Monthly Archives: October 2010

New content added in past few weeks

Here’s an overview of the new content added in the past few weeks. Two collections are of particular note: the Lloyd’s List for 1812, via 1812Privateers.org, and the Dyal Ship Collection. One man, Michael Dun, has digitized and indexed all of the issues of Lloyd’s List for the entire year of 1812. It’s quite a feat. He’s indexed all of the ships and all of the masters for that time, adding up to nearly 26,000 ship citations in all the issues of Lloyd’s List for 1812. He kindly shared his index with me, so I could include links to his resources. Mr. Dun hosts the pages on his servers, and they are accessible to all via that site. While working through the index of ship names that he provided to me, I was able to identify a number of corrections, and I incorporated those into the file I imported.

Working through this file was also an interesting reminder about the challenges we face in trying to make the most of these primary sources. Clearly, the folks who were putting together each issue of Lloyd’s List (it usually came out twice a week, and was published in London) were trying to get information out as quickly as possible, and weren’t too concerned with absolute accuracy, to say nothing of how researchers two centuries later would like them to present information.

To give a few examples, the spellings in each of the following groups most likely refer to the same ship: Misletoe, Misseltoe, and Missletoe (there’s no Mistletoe listed in this year of Lloyd’s!). Or, Nymph, Nymphe, and Nymphen. Or Powhatan, Powahattan, and Powhatton. Or Zenophon and Zenophen, when the proper spelling is Xenophon. Or Tinmouth Castle, most likely meaning Teignmouth Castle. Or simple errors, like Hepsa instead of Hespa.

Of course, if you’re reading this at a London coffee shop one morning in 1812, you can easily overlook these minor errors and figure out what the editors intended. But for researchers two centuries later, who are trying to mine large amounts of data to see what they can find, these errors cause a problem. So how do we address them? That’s an issue for an upcoming blog post. But, needless to say, we at ShipIndex.org have a solution…
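In the meantime, here is a minimal sketch of one generic way to group likely spelling variants, using the fuzzy string matching in Python's standard library. This is not the ShipIndex.org solution, which isn't described in this post. The function names and the 0.8 similarity threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """True when two names are close enough to be possible variants.

    The 0.8 threshold is an illustrative assumption, not a tuned value.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_variants(names, threshold=0.8):
    """Greedily place each name in the first group whose first member it resembles."""
    groups = []
    for name in names:
        for group in groups:
            if similar(name, group[0], threshold):
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


names = [
    "Misletoe", "Misseltoe", "Missletoe",
    "Nymph", "Nymphe", "Nymphen",
    "Powhatan", "Powahattan", "Powhatton",
    "Zenophon", "Zenophen",
]
for group in group_variants(names):
    print(group)
```

A sketch like this only proposes candidates. Pure string similarity would happily merge two genuinely different ships with similar names, and the greedy grouping depends on the order of the input, so a person still needs to review the results.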

Another interesting addition is the Dyal Ship Collection, but for very different reasons. This is a collection of images and data compiled by a researcher (in this case, a librarian) and added to his institution’s “institutional repository” (IR). An IR is a site, usually maintained by an academic library, where content generated by the institution’s faculty, staff, and students is made available for free. It is, in a large sense, a reaction to the high cost of many academic journals, where an institution’s researchers spend time and money doing and compiling research, then pay to have that published in a scholarly journal, then the institution pays to buy the results back, through a subscription to the journal. The whole discussion is beyond the scope of this blog post, but the point is that IRs are places where interesting and useful information can be stored — but it’s most often quite hidden, unless there’s some effective way of indexing the content.

So, with the encouragement and assistance of the compiler, we’ve created links into the collection of files and images that are stored in Texas Tech University’s institutional repository. Recently, we’ve heard from others who have data they’d like us to include, and we’re looking at ways of doing that effectively. This is just one example of that.

Other items we’ve added are mostly more standard print or online collections. The total list is as follows:

If you have maritime content that you’d like to get online, or that is already online but needs broader publicity, please let us know. We’d love to find a way to help.

Full text links from within ShipIndex

ShipIndex.org links to the full text for nearly 85% of its citations! Before Mike ran the numbers, I guessed that a conservative estimate for links to full text would be about 70%, so the 85% figure was quite a surprise, but it’s true.

How did we do this? First, we’re linking to lots and lots of content online. There are so many free online resources with information about ships out there, and I feel like I find another one every week. But other than ShipIndex, there’s no place that brings all these resources together, and no way to search all of them at once. With ShipIndex, that’s exactly what you’re doing. But that alone doesn’t get us to 85%.

Recently, we started looking for resources in Google Books. The next time you’re searching in ShipIndex and you see a hotlinked page number, try clicking on that page number. It should take you right to the page of the book within Google’s Book Search project.

Here are two examples from freely-available resources:

  • The citation for Aroostook, from Paul Calore’s Naval Campaigns of the Civil War, has a link to page 128, and the vessel is mentioned near the start of the last paragraph.
  • The citation for City of Pekin, from Arthur Clark’s The Clipper Ship Era, has a link to page 86, and the ship is mentioned about 2/3 of the way down the page.

This was an interesting experience, and I learned a lot in the process. The goal was to link directly to the page that cites a specific ship. I discovered four different levels of Google Books linking:

  • No content: The book just can’t be found, or it’s listed but offers no view into it at all.
  • Snippet view: With snippet view, you really do get just a touch of the book, and it’s hard to know how much or what you’ll get. Most importantly, you can only search by terms; you can’t ask Google to show you all of a specific page.
  • Preview: With preview, Google offers most of the pages of a book. This is common for recently-published works, and Google works with the publisher to figure out what they’ll show. The idea, obviously, is to show enough that someone wants to go out and buy the full book.
  • Full view: For these books, Google shows the entire thing. These are primarily books that are out of copyright protection – so, published before 1923.

We only activate links for books that are available via Full View and Preview — and we only do the Preview if it appears that most links will get to the page in question. We’ve found a few titles that are available in Preview, but so many links go to pages that aren’t visible, perhaps because the publisher only allows 10-20% of the book to be shown via Google Books, that it seems misleading to offer those links.

Links to Snippet views don’t work because there’s no way to get to a specific page. You could try to search for the ship name, but if the ship name is something like “Elizabeth”, then you’ll get every mention of “Elizabeth” in the book – including names of people, not just ships. Also, the searches just don’t work as well. This could be a result of problems in OCR work, too – if the OCR work isn’t very good, then Google won’t find specific phrases. With page linking, by contrast, we’re going to a specific page, not searching for a ship name in the book’s text, so OCR quality matters much less.

So, as a result, you’ll most likely find linking to Google for very old books (via Full View) and very new books (via Preview).
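The linking rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code ShipIndex runs: the function name, the view labels, the `reachable_share` estimate, and the 0.5 cutoff for "most links" are all hypothetical, and the `id` plus `pg=PA<page>` URL pattern is assumed from how Google Books page links commonly appear.

```python
from urllib.parse import urlencode

# Assumed URL pattern for jumping to a printed page in Google Books.
BASE_URL = "https://books.google.com/books"


def page_link(volume_id, page, view, reachable_share=1.0, min_share=0.5):
    """Return a page-level link, or None when the link should stay inactive.

    view: "none", "snippet", "preview", or "full" (labels are assumptions).
    reachable_share: hypothetical estimate of how many of a Preview book's
        cited pages are actually visible to readers.
    """
    if view == "full":
        linkable = True
    elif view == "preview":
        # Only link Preview books when most cited pages can be reached.
        linkable = reachable_share > min_share
    else:
        # "none" and "snippet" cannot be addressed by page.
        linkable = False

    if not linkable:
        return None
    return BASE_URL + "?" + urlencode({"id": volume_id, "pg": f"PA{page}"})


# HYPOTHETICAL_ID stands in for a real Google Books volume identifier.
print(page_link("HYPOTHETICAL_ID", 128, "full"))
print(page_link("HYPOTHETICAL_ID", 86, "preview", reachable_share=0.15))
```

The second call returns None: a Preview book where only a small share of cited pages is visible gets no links at all, rather than links that usually lead nowhere.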

The horror stories about metadata in Google Books are very true. It’s a mess for any slightly complicated title, such as multi-volume sets. So, finding Navy Records Society volumes — especially multi-volume works that weren’t published consecutively — was sometimes quite a challenge. And, in some cases, volumes that should be available just aren’t. I found one book that was completely upside down. Others have lousy scan quality. But the fact is that an enormous amount of content is available from anyone’s computer now, and it will only improve.

Try it out; see what you think.