Category Archives: Data Correction

Unique Vessel Identifiers Added to ShipIndex.org – the short version

 

I wrote a very long blog post about our new use of Wikidata identifiers, and how it changes the ShipIndex database. It might be too long for some. Here, I hope to limit my overview of these changes to just four paragraphs (not including this one).

The ShipIndex database has grown a lot over the past decade, and now has over 3.5 million citations in it. For very common ship names, this makes it too hard to find information about a specific ship. How do you find information about the right America, when there are 2,378 different citations to work through?

The solution is unique identifiers – basically a specific identifier for each hull. This also allows us to bring together citations for the same ship as it changes names. To make this work, we’re using identifiers from Wikidata, plus local identifiers when Wikidata doesn’t yet have one. Wikidata makes it easy to use Linked Data, so that we can uniquely identify and share items and concepts: the identifier Q82925, represented at https://www.wikidata.org/wiki/Q82925, specifically refers to the author Joseph Conrad, while Q1278752, at https://www.wikidata.org/wiki/Q1278752, specifically refers to the ship named after Conrad, now at Mystic Seaport Museum. It also refers to the original name of that ship, Georg Stage, but differentiates it from the current ship with that name. Similarly, the identifier Q838125 refers to USS Hornet (CV-8), and Q1141355 refers to USS Hornet (CV-12).

When you look at the ShipIndex page for Hornet now, you’ll see eleven different ‘cards’, each one referring to a specific hull, or vessel. Click on any of those ship names, and you’ll see only the citations that have been associated with that unique identifier. And note the URL for each card – it includes the Q-identifier, so others can easily link to it as well, without needing to create some new URL. In addition, many citations have not been associated with any card, because we cannot determine to which vessel the citation refers – at least, not without going back to the original source.

In the future, I hope to offer ways for individuals to help grow this resource, and maybe create a way that people can share their own information – images, reminiscences, comments, online links – about specific vessels. Until then, I’ll be working away at associating as many citations as possible to specific vessels. I hope that this improves your experience with ShipIndex, and helps everyone do more and better maritime history research.

For more information about what we’ve done here, check out the much-longer blog post.

All the Gory Details about Adding Unique Vessel Identifiers to ShipIndex.org

TL;DR: The ShipIndex.org database can now differentiate between disparate ships of the same name, and combine citations using different names for the same ship. We use Wikidata Q-identifiers to do this, and hope that we can help apply Linked Data to maritime history.


This is a very long post about some significant enhancements that we’ve added to ShipIndex.org. For a shorter post about the same thing, that doesn’t go into quite as much detail, see here.


Greetings, from ShipIndex central. It’s been a while since our last blog post, and I referenced some upcoming changes back then. Of course, things have taken a bit more time than I’d expected, but I’m ready to start sharing some of these changes.

Right now, ShipIndex.org has over 3.5 million citations from over 900 resources. If you’re looking for a specific ship with a common name, you’re gonna have a hard time. Looking for a specific “Eagle”? As of today, there are 2,677 citations for ships named Eagle. “America”? There are 2,378 citations. The most common ship names are, in increasing order, “Hope”, “Anna”, and “Maria”, with “Elizabeth” having 3,818 citations, and “Mary” leading by a lot, with 5,072 citations.

Researchers have a big problem in trying to work through those common ship names. And they have a problem when a ship changes its name – it’s still the ship they want to research, but ShipIndex.org doesn’t really connect a user with the ship’s previous or subsequent names. (Well, it did, a bit: if you look at the “America” entry, you can see “related ships” – but how do you know which of the 2,378 citations also refer to Italis, West Point, or Australis?)

What I always wanted was a “unique vessel identifier”. I could bring together citations that refer to the same ship, and differentiate between citations for different ships with the same name. I wasn’t sure how to make that work, until a colleague at my day job suggested using Wikidata identifiers. This was such a great idea, for many reasons.

You’ve all seen the xkcd comic about standards, right? No? OK, here:

 

Same thing with identifiers. Many already exist – naval hull numbers, IMO numbers, national registration numbers, and others – but none covers all ships, obviously. I didn’t want to create a new identifier, especially since most of the world wouldn’t use it. Wikidata, however, addresses all of these issues, and most importantly it can make maritime history research easier by using these identifiers across the web. For example, a vessel at a maritime museum, like Mystic Seaport’s Joseph Conrad, has the Wikidata identifier Q1278752. This is a “Q-number”: an arbitrary number with no meaning of its own, unique to this specific item. The Wikidata entry contains other pieces of factual data about the vessel, including its current location, its builder, and some of its dimensions. Anyone can add to the record about any entry.

Every item or entry (including non-physical concepts) in Wikipedia has a Q-number. Look in the left column of any Wikipedia entry (like the Conrad’s), and you’ll see an entry under “Tools” that’s labelled “Wikidata item”. That links you to the Wikidata entry for the item in question. Information from Wikipedia is incorporated into Wikidata, and all of the information is available for sharing and using on the web. Look on the right of the Wikidata entry, and you’ll see a list of entries on that subject, in numerous different languages.

(As another aside, note the difference here between Wikipedia and Wikidata. Wikipedia contains textual information and discussion about a topic or item. One subject may have multiple entries in different languages. [The German entry about Joseph Conrad is not just a straight translation of the English one. Nor is the Farsi entry.] Wikidata, on the other hand, is a collection of data – just the facts, ma’am – about the topic. Wikidata does not have foreign-language versions of each data page.)

This Wikidata entry for “Joseph Conrad” is different from the entry for the Polish author, even though the ship is named after him; from the entry for a French army officer with the same name; from the one for a US army officer; and more. By using linked data in this way, online systems can better distinguish the person from the ship, making it easier for researchers to find what they’re looking for, more quickly.

Over time, I’ll be able to do lots more with the Wikidata that is available to us, as the Wikidata database grows. Hopefully, the ShipIndex.org data will be easier to find online, plus it will be easier to use, because ships with common names will be better sorted, and ship name changes will be better represented.

Last December, my son and I visited the Cradle of Aviation museum in Garden City, NY. While there, we saw this plaque describing the many different ships in the US Navy named “Hornet”:

This is a great example of what we’re trying to do here in ShipIndex.org – sorting and dividing the many very different ships with the same name. (Note here, though, that this plaque differentiates between the last three versions of USS Hornet, saying that CV-12, CVA-12, and CVS-12 were different ships. They really weren’t; they were the same hull, even if they were refitted for different uses over the last decades of service. I don’t know, but I’d be surprised if Navy veterans who served on CV-12 would feel that they served on a different ship than those who served on CVS-12, for instance.)

Anyway, when you go to the entry for Hornet in ShipIndex.org now, you’ll see multiple ‘cards’ at the top of the page – each one represents a different vessel, or hull. We pull publicly available data from Wikidata into the cards we create, and we’ll be able to do more there over time. Right now, the images come from Wikidata; when we don’t have one, we have to put in a placeholder. (Imagine a place where you could post your own information, be it pictures, remembrances, links to vessel-specific sites, etc., about a specific vessel. Interested? Let me know.) We organize citations that are specifically about a particular vessel under the appropriate card.

Of course, not every ship in ShipIndex.org has a Wikidata identifier. Right now, we’re using local identifiers when a Wikidata identifier doesn’t exist, or we haven’t found it yet. Since anyone can create a new entry in Wikidata, we can also create identifiers there, and share our knowledge with the rest of the world.

We’ll never get all, or most, or even many, citations associated with cards. “Hornet” has 820 citations from 221 resources. We have 11 cards, for specific vessels, and each card has between 2 and 66 citations associated with it. So, just 235 of 820 citations are associated with cards. But many entries have basically no descriptive information about the ship at all, and one would need to look at each resource to figure out if the Hornet in question is one of the ones for which we have a card. Even when there is information, it’s often not enough – there are nine citations with “aircraft carrier” in the description, but without looking at each resource, I don’t know if they’re referring to USS Hornet (CV-8) or USS Hornet (CV-12).
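The split between citations on cards and citations left unassigned can be pictured with a small sketch. This is only an illustration, not how ShipIndex.org stores its data: the record layout, the `vessel_id` field, and the sample records are assumptions made for the example. Only the two Q-identifiers and the 235-of-820 count come from this post.

```python
from collections import defaultdict

def group_into_cards(citations):
    """Split citations into per-vessel cards plus an unassigned remainder."""
    cards = defaultdict(list)
    unassigned = []
    for citation in citations:
        vessel_id = citation.get("vessel_id")  # hypothetical field name
        if vessel_id:
            cards[vessel_id].append(citation)
        else:
            # No identifier yet: we cannot tell which Hornet this refers to.
            unassigned.append(citation)
    return dict(cards), unassigned

# Made-up sample records; only the two Q-identifiers come from the post.
sample = [
    {"source": "Resource A", "vessel_id": "Q838125"},
    {"source": "Resource B", "vessel_id": "Q1141355"},
    {"source": "Resource C", "vessel_id": None},
    {"source": "Resource D", "vessel_id": "Q838125"},
]
cards, unassigned = group_into_cards(sample)
print({vid: len(items) for vid, items in cards.items()}, len(unassigned))

# Share of Hornet citations on cards, using the counts quoted above.
print(f"{235 / 820:.1%}")
```

With the Hornet numbers, that works out to a bit under 29% of citations sitting on a card.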

But for the time being, it’s very much a start toward doing better research in maritime history. Look at the two Hornet links above, for instance. The URL for Hornet CV-8 is https://www.shipindex.org/vessels/Q838125, and the URL for Hornet CV-12 is https://www.shipindex.org/vessels/Q1141355. There’s that Q-identifier again, right in our URL, so it’s easy to find, easy to use, and easy to link to. This is the basis of Linked Data, and of making online research easier to do, and easier to manage.
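Because the same Q-identifier appears in both the Wikidata URL and the ShipIndex.org URL, going from one to the other is simple string work. Here is a minimal sketch based only on the two URL patterns shown in this post. The function name and the validation rule are my own assumptions, and it does not handle local (non-Wikidata) identifiers, whose format isn’t described here.

```python
import re

WIKIDATA_BASE = "https://www.wikidata.org/wiki/"
SHIPINDEX_VESSEL_BASE = "https://www.shipindex.org/vessels/"

# A Q-identifier is the letter Q followed by a positive integer.
_QID = re.compile(r"Q[1-9][0-9]*")

def vessel_links(qid: str) -> dict[str, str]:
    """Return the Wikidata and ShipIndex URLs for a Q-identifier."""
    if not _QID.fullmatch(qid):
        raise ValueError(f"not a Q-identifier: {qid!r}")
    return {
        "wikidata": WIKIDATA_BASE + qid,
        "shipindex": SHIPINDEX_VESSEL_BASE + qid,
    }

hornets = {"USS Hornet (CV-8)": "Q838125", "USS Hornet (CV-12)": "Q1141355"}
for name, qid in hornets.items():
    print(name, vessel_links(qid))
```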

As of this writing, we have 1446 citations associated with 70 vessels. That’s about 0.04% of all the citations in the database. Admittedly, we have a long way to go! But it is a start, and getting the underlying work done to make this happen was a big chunk of 2019 – it took a lot of time and work and money.

My next goal, beyond expanding the number of citations associated with vessels, is to create a way for users to help grow this resource. Perhaps you have been researching Hornet, and you know that Albion’s Five Centuries of Famous Ships refers to CV-12, rather than CV-8. If you could share that information, to expand the database a bit, that would be huge.

Then, as mentioned above, maybe you have images, or remembrances, about CV-12 specifically, or you want to link to resources about it online (remember, after the current Coronavirus pandemic passes, you can actually visit USS Hornet in Alameda, California; until then, you can visit https://www.uss-hornet.org/) – what if ShipIndex provided a place where you could post those and share them with others interested in researching a specific vessel? That’d be pretty cool, I think.

I’d love to hear what you think about this enhancement. For me, it’s been a long time coming. Of course, there’s much more to do, but I’m very excited about this significant change.

Deleting data dilemma

Of course I hate to remove content from the ShipIndex.org database; I’m always working on trying to expand, not contract, the database. But bad data is worse than no data, and an online resource recently disappeared, so I had to delete its contents from the database. The truth is, I have waited too long to remove this content, because I had been really pleased to get to 3.4 million citations, and removing 380,000 will be a big hit in getting to three and a half million citations.

While online resources are certainly wonderful – you can get to your results without leaving your home – they are most certainly not permanent. They exist in one place and everywhere at the same time, but then when they disappear, they’re gone completely. This is, obviously, not the case for books.

I have contacted the creator of the missing database, and haven’t heard back from him, but perhaps I’ll find another way of getting in contact, and maybe, just maybe, we can find a way to get that content into ShipIndex separately.

One result of deleting these records is that there will be some of what we call “citationless ships” for a little while. These are entries for ships that now have no citations on them at all, because the only citation was from this one resource. I need to remove them from the database, but that will take a bit of time for some technical reasons. But I’m working on it, doing my best to keep the database clean and accurate.

Some good news is that I have scores (actually, four score, at present) of book files waiting to be imported. I’ve started adding those and have more to go. While they won’t add up to today’s lost 380,000 citations, they will get me back closer to that number, and since they’re all printed resources, they won’t disappear any time soon.

Updated OCLC WorldCat data – 20% more, and more accurate

I’ve updated an important resource, adding 20% to its contents, and improving the accuracy of all of the data in it. When we converted ShipIndex.org from a hobby to a business, we worked with OCLC to get a file of books by or about ships. For more about how these records are used, see the first of two posts about WorldCat records, here.

In any case, we agreed with OCLC that these records would remain in the free database, rather than the newly-created subscription database. There were about 40,000 records in that file. Last month, I had the opportunity to visit OCLC’s headquarters, in Dublin, Ohio. While there, I received an updated version of this file, which now contains over 50,000 authority records for ships.

I worked through the file, doing cleanup and corrections, and spent a few tries at loading the file into the ShipIndex.org database. It wasn’t as easy as other files, because the OCLC records are fully Unicode compliant. The database likes UTF-8, but Unicode is a bit beyond its abilities. (Actually, not in its abilities to display vessel names, but in its abilities to store them.) I replaced vessel names in Cyrillic, Japanese, Chinese, etc., with their transliterated names, and also removed a lot of the Unicode characters that were causing problems.

I also fixed a lot of names that I hadn’t fixed the first time around. Most of these were ship names with prefixes attached, like “USS Daffodil” or “HMS Daffodil” or “S/S Daffodil”. It’s always best to search without those prefixes. I have cleanup still to do on those leftover ship names, but the new records are live and I can do the cleanup later.
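For the curious, prefix cleanup of this kind can be sketched in a few lines. This is not the actual cleanup script. The prefix list holds only the three examples mentioned above (a real list would be much longer), and the function name is made up for illustration.

```python
import re

# Illustrative prefix list: just the three from this post.
PREFIXES = ("USS", "HMS", "S/S")

# Match one prefix at the start of the name, followed by whitespace.
_PREFIX_RE = re.compile(
    r"^(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\s+",
    flags=re.IGNORECASE,
)

def strip_prefix(name: str) -> str:
    """Drop one leading prefix such as 'USS ' from a ship name."""
    return _PREFIX_RE.sub("", name.strip(), count=1)

for raw in ("USS Daffodil", "HMS Daffodil", "S/S Daffodil", "Daffodil"):
    print(raw, "->", strip_prefix(raw))
```

Requiring whitespace after the prefix keeps names that merely begin with the same letters, like “Ussuri”, from being truncated.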

So now, as a result, the OCLC WorldCat resource has grown from about 40,000 to about 50,000 citations, and the metadata is much improved. All of these citations are in the free database. This is a big improvement all around. Thanks again to OCLC for creating this file for me!

ShipIndex as a Vessel Name Authority File

[This entry was written long ago, but not posted, because I was having problems with uploading images. As you’ll see, images are a critical part of this post! Now that I’ve gotten that problem resolved, I will add a few more posts soon. PMc]

Last May, I finally completed one very large file for import. This file was incredibly tough to process, but I learned a lot about how one can use the database, and I thought I’d share that information here.

The database is Mariners and Ships in Australian Waters, and it is a collection of transcribed passenger lists for thousands of voyages to Australia, primarily in the 2nd half of the 19th century. Because most records were handwritten, and then transcribed by volunteers, many, many errors crept into the database.

The database has 58,311 records in it. (I believe more are always being added to the website itself, as transcribers complete their work.) One major difference between this and every other resource is that each voyage has a separate entry. In the Ellis Island Database, a user searches by ship name, then goes in deeper by voyage date. In this case, the collection is organized by arrival year, then arrival month, then ship name – so I had to create a separate entry for each voyage, to be able to link to each transcription.

I quickly realized that there were many, many, many errors in the transcription of vessel names. Just looking over the ship names as they appeared in the spreadsheet, it was easy to spot typos – especially with the additional information I had about masters and tonnage, which helped connect a misspelling to a correct spelling.

After correcting numerous such misspellings, I did a test import of the file and found 1707 new ship names would be added to the database. I started to investigate each of those, and found that many were not actually new ship names – they were simply additional mistranscriptions of the passenger lists. As the ShipIndex.org database grows, it’s important to try and minimize the introduction of incorrect ship names.

For example, I saw this entry, which the transcriber recorded as “Maealsar”. The master’s name had been transcribed as “C M de Boer”, and the vessel size as 305 tons.

I thought it looked a bit like “Macassar”, but there were no other “Macassar”s in that file. I searched ShipIndex.org for Macassar (http://www.shipindex.org/ships/macassar) and found an entry from the American Lloyd’s Register of American and Foreign Shipping for the same year. It listed a Macassar with a captain C. M. De Boor and a tonnage of 306. Obviously, these are the same ship.

I corrected the vessel name, but kept the mis-transcription, too, just in case I was wrong. So the entry now looks like this: “Macassar (corrected; listed as “Maealsar”) (of Amsterdam, C M de Boer, Master, 305 tons, from the port of Balaves to Sydney, New South Wales, 23 Mar 1861)”.
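The comparison I did by eye here (similar master’s name, nearly the same tonnage) could be roughed out in code. The sketch below is an assumption-laden illustration, not a tool I actually used: the thresholds, field names, and function names are invented, and the “unrelated” record is made up to show a non-match. It uses Python’s standard `difflib` to score how alike two master names are.

```python
from difflib import SequenceMatcher

def normalize_master(name: str) -> str:
    """Lowercase, drop periods, and collapse whitespace in a master's name."""
    return " ".join(name.replace(".", " ").lower().split())

def plausible_same_ship(a, b, max_ton_diff=2, min_name_ratio=0.8):
    """Guess whether two records describe one ship (thresholds are arbitrary)."""
    if abs(a["tons"] - b["tons"]) > max_ton_diff:
        return False
    ratio = SequenceMatcher(
        None, normalize_master(a["master"]), normalize_master(b["master"])
    ).ratio()
    return ratio >= min_name_ratio

transcribed = {"name": "Maealsar", "master": "C M de Boer", "tons": 305}
register = {"name": "Macassar", "master": "C. M. De Boor", "tons": 306}
unrelated = {"name": "Example", "master": "J Smith", "tons": 900}  # made up

print(plausible_same_ship(transcribed, register))
print(plausible_same_ship(transcribed, unrelated))
```

A check like this only suggests candidates; the final call still needs a look at the original page.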

Another example was this name, which had been transcribed as “Magport”:

I thought it looked like it started with an “N”, but found no “Nagport” already in the database. However, a search for “nagp*” turned up “Nagpore”, among others, and a link to the entry of Record of American and Foreign Shipping for the same year returned these two ships:

One has the same master and tonnage as the one in the transcription. It then becomes clear that there’s an “e” hiding behind the bar on the page, rather than a “t”.

 

I felt like it became a combination of genealogy and authority record work. I tried to find sufficient documentation to prove that my analysis was more accurate than the original. And because I had both the entire set of metadata from the source, and the 2.3 million citations already in the ShipIndex.org database, I could more easily determine that various transcriptions were incorrect.

I recognized that ShipIndex.org is beginning to serve as an authority file for vessels. It is certainly my goal to improve the database along those lines, and I will use another blog post to discuss this further.

 

I found many occasions to do this sort of research, and while it took a very long time, it was actually quite fun to nail down a correction. Some were surprising – I guess I can see why one might read this as “Princess of Water”:

 But why in the world would you not recognize that “Princess of Wales” makes infinitely more sense for a ship name?

 

I’ll provide two last examples here. This first one shows how I used the existing metadata for the resource itself to determine the correct ship name.

The beautiful handwriting on this one made it easy to read, and it’s not surprising that it was transcribed as “Oasby”. But there was only one entry in the entire file for “Oasby”, and none in the existing ShipIndex.org database, so it made me wonder.

A search through the metadata for the captain’s name, however, found 17 entries with Kennedy as captain (as had been noted in the transcription for this entry), for ship “Easby”, and the full resource has at least 70 other entries for “Easby”. Tonnage data is the same, and after learning of the existence of “Easby”, it’s easy to see that that’s what the ship name was; and the top of the dramatic ‘E’ was lost in the digitizing process.

This made the next new ship name, “Oaton Hall”, easy to resolve to “Eaton Hall”.

Finally, I dealt with this challenging entry by using the existing ShipIndex.org database:

I tried searching for “waurego”, but that returned no ships. By searching for “*rego”, I found all the citations that had a word in the ship name that ends in “rego”. I could easily locate “Warrego”, and confirm that’s the right ship.

There’s other searching that could be done here, too. If I change the search to “*rego$” it returns only the ship names that actually end in “rego”, deleting several, like “Trego Renneger” or “Effrego Ventus”, from the result list.
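To make the difference between “*rego” and “*rego$” concrete, here is one way such patterns could be turned into regular expressions. This is a guess at the behaviour described above, not the actual ShipIndex.org search code: I’m assuming “*” stands for any run of non-space characters within a word, and that a trailing “$” pins the match to the end of the whole ship name. The function names are hypothetical.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard search like '*rego' or '*rego$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text; '*' becomes "any run of non-space characters".
    body = r"\S*".join(re.escape(part) for part in pattern.split("*"))
    # Either the name must end here, or just the word must end here.
    tail = r"$" if anchored else r"(?!\S)"
    return re.compile(r"(?<!\S)" + body + tail, re.IGNORECASE)

def search(pattern, names):
    """Return the names containing a word that matches the pattern."""
    rx = pattern_to_regex(pattern)
    return [n for n in names if rx.search(n)]

names = ["Warrego", "Trego Renneger", "Effrego Ventus", "Nagpore"]
print(search("waurego", names))
print(search("*rego", names))
print(search("*rego$", names))
print(search("nagp*", names))
```

On this small list, “*rego” finds all three names containing a word ending in “rego”, while “*rego$” keeps only “Warrego”.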

I’ll put together another post in the next few weeks with more examples of changes and corrections I was able to make, along with a discussion of the importance of authority data for ship names.

 

Deleting data – sometimes it must be done

I had to delete content from the database this morning. I’ve delayed doing it for a long time, but it had to be done. The “Property Management & Archive Record System” database, created by the US Department of Transportation’s Maritime Administration, was actually a very useful database, but was removed temporarily – and then permanently – so I really had no choice but to remove its contents from the ShipIndex.org database.

I had written the following description of the database:

This resource, called “PMARS”, is the official repository of records about vessels that are or were part of the US Maritime Administration’s National Defense Reserve Fleet (NDRF). As a result, it focuses on ships from World War II to the present. Only a few hundred vessels are still in the NDRF, but PMARS contains information about nearly all ships (over 7000) that were included in the NDRF at some point.

While the database contains “basic ship data” about each vessel, the “Custody Cards” and “Disposal Cards” are of particular interest. These are images of the printed, typed, or handwritten notes regarding disposition of each vessel.

I had a great experience at a library conference once, using the PMARS database. A special collections librarian from Occidental College, in California, wanted to learn more about a Victory ship called “Occidental Victory”, named after her institution. (Victory ships were slightly larger and more powerful than Liberty ships; both were quickly-built cargo ships used extensively during World War II, and critical to Allied success in the war.) We looked up “Occidental Victory” in the ShipIndex.org database, and found a record from PMARS. It included digitized images of the ship’s Disposal Card, which showed the history of the ship and its final outcome.

The database also showed that the Maritime Administration still owned the binnacle for the ship, and was willing to loan it to museums and libraries for exhibits! She was thrilled to discover this, and said she wanted to create an exhibit about the ship, and of course borrow the binnacle for the exhibit. I don’t know that this ever happened, but to discover the binnacle was available was, I thought, really neat.

The digitized Disposal Cards and Custody Cards were great items, too, and it’s such a shame that these things are no longer available online. One might think that in our digital environment, such items wouldn’t be lost or taken off-line. But when it happens (and it happens more often than one might think), the data is lost for good, because it wasn’t backed up elsewhere, such as in the form of multiple physical copies in many different libraries.

For a while, the PMARS links redirected you to a page that said something to the effect of, “for more information, contact ____.” So I did. A little over a year ago I contacted people at the US Maritime Administration to ask what had happened to PMARS, and if it was coming back. I got a nice, quick response, and was told that PMARS had been taken off-line “due to security concerns”, that great bugaboo of meaninglessness. It was expected to return in mid-2012, in the form of two different databases, but that didn’t happen.

Now, the links are simply dead, and take you nowhere. If PMARS does come back, in whatever form, I’ll quickly return it to the ShipIndex.org database. Until then, I feel the proper thing to do is to remove the content from the database.

But I do anticipate adding a lot of new content in the very near future; I have a project going on that should, if all goes well, add lots of great new content in the next ten days. It won’t replace the content lost from the loss of the PMARS database, but perhaps that will, in fact, come back some day.

On Naming Ships and Representing them in ShipIndex

At present, ShipIndex.org has one point of access: the vessel name. You’d think that would be fairly easy, at least in the case of extant vessels: just look at the stern or the bow, and see what’s written there. Alas, it’s not that simple. There are many reasons for this, and a lot of them are completely understandable. Others can lead to surprisingly interesting stories.

While working through the index to the first 50 years of Steamboat Bill, and its successor, PowerShips, I came across many, many mentions of the Queen Elizabeth 2. Most of these are listed under the very common, abbreviated name, “QE2”. In the ShipIndex database, however, one also finds many entries for a different version of the name, “Queen Elizabeth II”. I read a bit about the ship on its Wikipedia page, and learned some interesting stories about how the name came about. According to the contributors, the name of the ship was not announced before the launching. Cunard intended to name the ship “Queen Elizabeth”, but the Queen, when she launched the ship, stated “I name this ship Queen Elizabeth the Second.”

The next day, newspapers announced the name as “Queen Elizabeth II”, though when the ship was delivered its name read “Queen Elizabeth 2”. According to Wikipedia, “From at least 2002 the official Cunard website stated that ‘The new ship is not named after the Queen but is simply the second ship to bear the name – hence the use of the Arabic 2 in her name, rather than the Roman II used by the Queen’, however, in a change in 2007 this information had been removed.”

In addition, there’s confusion about who the ship is named after. Multiple sources provide multiple suggestions. Some feel the ship is named after the current Queen, and that, in fact, she made that change when she announced its name. Others state that it is named after her mother, the wife of King George VI. Others state it’s named after the previous Cunard ship named Queen Elizabeth.

We need to make it possible for people to find ship names however they might be represented, and so we’ve created functionality that allows one to link between variant names for specific ships. So, for example, when you search for “QE2”, you find entries that cite “QE2”, but you also find a link at the top taking you to entries for other variant names for this ship, specifically “Queen Elizabeth 2” and “Queen Elizabeth II”.

We also have the ability to ‘normalize’ ship names, and in that case, one goes directly from a misspelling of a ship name to the correctly spelled entry. So, by rights, we should ‘normalize’ “QE2” and “Queen Elizabeth II” to “Queen Elizabeth 2”. But I think that, in this case, for this very famous ship, it’s worth maintaining the separate entries and linking them together via the “alternate spelling” links. Maybe I’m wrong; should I just normalize them all together? What do you think?

We also show links for previous and subsequent names of ships. So, if you search for “Euterpe”, you’ll see a “subsequent name” link to “Star of India.” It is important to remember that if there are multiple ships with the name “Euterpe,” the link appears, but doesn’t apply to all of them. Creating a system that separates out all these ships is a big project, but one that we will tackle.

One great thing about the Steamboat Bill files is that they include many previous and subsequent vessel names. Unfortunately, they don’t exactly indicate the order in which vessel names appeared; you’ll see both “Liberte; a) Brasil; b) Volendam; c) Monarch Sun; d) Volendam; e) Island Sun; g) Canada Star h) Queen of Bermuda” and “Queen of Bermuda; a) Brasil; b) Volendam; c) Monarch Sun; d) Volendam; e) Island Sun; f) Liberte; g) Canada Star”, as well as “Island Sun; a) Volendam”. So, some research is needed to figure out the order in which the ship names appeared. Then, I still have a question about whether I should include all of the previous and subsequent names in each entry. In the above example, if I determine that the actual path of ship name changes was Queen of Bermuda, then Brasil, then Volendam, then Monarch Sun, then Volendam (again), then Island Sun, then Liberte and finally Canada Star, do I include ‘subsequent name’ links from Brasil to Volendam, Monarch Sun, Island Sun, Liberte, and Canada Star? That creates a lot of links. Or do I just have a link from Queen of Bermuda to Brasil, and on Brasil a link to Volendam?

And if I list all previous or subsequent names for a ship that had the same name twice, then in this case the entry for Brasil (and Queen of Bermuda, and others) will have multiple ‘subsequent name’ links to Volendam. The page for Volendam could conceivably have a link back to itself!
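To see how quickly the links pile up, here is a small sketch comparing the two approaches. It uses the hypothetical order of names suggested above (I haven’t verified that order), and the function names are invented for the example; none of this is the actual ShipIndex.org code.

```python
# Hypothetical order of names for one hull, as proposed in this post.
chain = [
    "Queen of Bermuda", "Brasil", "Volendam", "Monarch Sun",
    "Volendam", "Island Sun", "Liberte", "Canada Star",
]

def adjacent_links(names):
    """Link each name only to the name that directly followed it."""
    return list(zip(names, names[1:]))

def all_subsequent_links(names):
    """Link each name to every name that came later."""
    links = []
    for i, earlier in enumerate(names):
        for later in names[i + 1:]:
            links.append((earlier, later))
    return links

def dedupe(links):
    """Drop repeated links and self-links, keeping first-seen order."""
    seen, result = set(), []
    for link in links:
        if link[0] != link[1] and link not in seen:
            seen.add(link)
            result.append(link)
    return result

everything = all_subsequent_links(chain)
print(len(adjacent_links(chain)), len(everything), len(dedupe(everything)))
print(("Volendam", "Volendam") in everything)
```

For this eight-name chain, linking only neighbours gives 7 links, while linking every later name gives 28, including the Volendam-to-Volendam self-link and duplicates such as Brasil to Volendam. Even after removing those, 22 links remain.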

What do you think? What’s the best way to represent this important data?

How variant editions can screw up Google Books links

As we’ve mentioned in the blog before, you can link to the full text of many, many resources cited in ShipIndex.org. In fact, with a recent addition of a file containing tens of thousands of online ship images, nearly 90% of the citations provide full-text linking. Much of the linking comes through links to online resources, but others are available via links to books in Google Book Search.

A few weeks ago, several of us at ShipIndex were using some of these links, and found that many links for Sherry Sontag’s book Blind Man’s Bluff didn’t seem to work. While the links took one to the page cited in the index, the vessel mentioned in the index wasn’t listed on the page that we ended up at in Google Books. So today I picked up a copy of Blind Man’s Bluff from my local public library, to see if I’d made a lot of mistakes in working through the index.

I found that, in fact, I hadn’t made any mistakes – the page numbers in ShipIndex were the same as the page numbers listed in the back of the book. So then I re-tried some of the Google Book links we offer. Once again, a link to page 57 took me to page 57, but USS Halibut wasn’t mentioned on page 57 in Google. So I checked the copy I’d gotten from the library. That’s where I discovered the problem.

The copy from my public library, and the copy I’d originally used when creating the file to add to ShipIndex, came from the first edition of the book, published in 1998 by Public Affairs, a division of Perseus Books. But the copy on Google Books is the paperback edition, published by HarperCollins in 1999, and the pagination, layout, and nearly every other aspect differ completely between the two. The HarperCollins version has 432 pages, while the Perseus version has 352. While the content may be exactly the same, the pagination is obviously different, so linking doesn’t work the way it should.

So now it seems that, in order to make the Google Books linking continue to work, I need to find an index to the HarperCollins edition of the book, and replace the index I’d compiled from the Public Affairs edition. It’s likely not a big deal to get done, but I thought it was an interesting problem that we may come up against more and more in the future.

New content added in past few weeks

Here’s an overview of the new content added in the past few weeks. Two collections are of particular note: the Lloyd’s List for 1812, via 1812Privateers.org, and the Dyal Ship Collection. One man, Michael Dun, has digitized and indexed all of the issues of Lloyd’s List for the entire year of 1812. It’s quite a feat. He’s indexed all of the ships and all of the masters for that time, adding up to nearly 26,000 ship citations in all the issues of Lloyd’s List for 1812. He kindly shared his index with me, so I could include links to his resources. Mr. Dun hosts the pages on his servers, and they are accessible to all via that site. While working through the index of ship names that he provided to me, I was able to identify a number of corrections, and I incorporated those into the file I imported.

Working through this file was also an interesting reminder about the challenges we face in trying to make the most of these primary sources. Clearly, the folks who were putting together each issue of Lloyd’s List (it usually came out twice a week, and was published in London) were trying to get information out as quickly as possible, and weren’t too concerned with absolute accuracy, to say nothing of how researchers two centuries later would like them to present information.

As a few examples, the slight spelling variations in each of the following sets likely refer to the same ship: Misletoe, Misseltoe, and Missletoe (there’s no Mistletoe listed in this year of Lloyd’s!). Or, Nymph, Nymphe, and Nymphen. Or Powhatan, Powahattan, and Powhatton. Or Zenophon and Zenophen, when the proper spelling is Xenophon. Or Tinmouth Castle, most likely meaning Teignmouth Castle. Or simple errors, like Hepsa instead of Hespa.

Of course, if you’re reading this at a London coffee shop one morning in 1812, you can easily look over these minor errors, and figure out what the editors’ intent was. But for researchers two centuries later, who are trying to mine large amounts of data to see what they can find, these errors cause a problem. So how do we address them? That’s an issue for an upcoming blog post. But, needless to say, we at ShipIndex.org have a solution…
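To make the problem concrete, here is a minimal sketch of one generic way to flag likely spelling variants, using Python’s standard difflib module. This is not the ShipIndex.org solution, which is left for that later post. The function names and the 0.8 similarity threshold are assumptions made up for illustration, and a real file would need tuning and human review, since similar-looking names can belong to different ships.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Return True when two names are close enough to be possible variants.

    The 0.8 threshold is an arbitrary starting point, not a tested value.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_variants(names, threshold=0.8):
    """Greedily group names with the first earlier name they resemble."""
    groups = []
    for name in names:
        for group in groups:
            # Compare against the first name in the group only.
            if similar(name, group[0], threshold):
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


names = ["Misletoe", "Misseltoe", "Missletoe", "Nymph", "Nymphe", "Nymphen"]
for group in group_variants(names):
    print(", ".join(group))
```

A grouping like this only suggests candidates. Someone still has to decide whether Nymph and Nymphen really are the same ship.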

Another interesting addition is the Dyal Ship Collection, but for very different reasons. This is a collection of images and data compiled by a researcher (in this case, a librarian) and added to his institution’s “institutional repository” (IR). An IR is a site, usually maintained by an academic library, where content generated by the institution’s faculty, staff, and students is made available for free. It is, in a large sense, a reaction to the high cost of many academic journals, where an institution’s researchers spend time and money doing and compiling research, then pay to have that published in a scholarly journal, then the institution pays to buy the results back, through a subscription to the journal. The whole discussion is beyond the scope of this blog post, but the point is that IRs are places where interesting and useful information can be stored — but it’s most often quite hidden, unless there’s some effective way of indexing the content.

So, with the encouragement and assistance of the compiler, we’ve created links into the collection of files and images that are stored in Texas Tech University’s institutional repository. Recently, we’ve heard from others who have data they’d like us to include, and we’re looking at ways of doing that effectively. This is just one example of that.

Other items we’ve added are mostly more standard print or online collections. The total list is as follows:

If you have maritime content that you’d like to get online, or is online but needs broader publicity, please let us know. We’d love to find a way to help.

Is there a better way to present this data?

I’ve been working on a big file that’s going to be very useful to ShipIndex.org subscribers, especially those interested in World War II vessels. H.T. Lenton’s tome, British and Imperial Warships of the Second World War, is an incredible resource. Its 750+ pages are absolutely jam-packed with useful content, but it has presented me with a few challenging issues about how to manage this data. I thought I’d describe some of it here, explain what my plan is, and see if the greater good has any better suggestions. There’s still time to modify how this resource is managed. I’ve probably invested at least 30 full hours in preparing this file – and that doesn’t include a significant amount of work done by another person before me – and I still have a long way to go. But that’s what it takes, sometimes, to get a resource like this one ready to add to the database.

The first part of this remarkable volume looks at larger, named vessels, organized by vessel type and class. As one example, the “Corvettes and Frigates” section is divided into entries on the “Flower” class, the “River” class, the “Kil-” class, and four more classes. (The introduction has several fascinating paragraphs about the peregrinations of naming vessels, and shows how complicated the whole process was. A fair bit of background knowledge is required just to understand this section!) After some commentary on the design and development of the class, Lenton provides tables showing brief history information for every vessel in a class. Information may be quite extensive, or it might consist of as little as an indication of the intended builder and the approximate cancellation date (for example, for vessels ordered but not begun before the war ended).

This works fine for named vessels, but creates a conundrum for unnamed vessels. In the LCM (Landing Craft Mechanised) section, for example, the index notes that “LCM.21-118” appear on pg 490, “LCM.119-220” on pg 491, “LCM.221-334” on pg 492, etc. Of the 100+ ships on each page, though, just two to three dozen have any information at all about the vessel, and that information is slight, at best. For the LCMs, most have no Building or Completion information. Of the ones that have “Fate” information, it usually reads something like “Lost cause unknown Algiers ../11/42.” (Meaning it was lost in November 1942, but the exact date and cause are not known.)

To me, this information might be useful to someone, and I don’t want to leave out the entry for that vessel. But for each one like that, there are several where no information at all is included, and I believe that adding an entry to ShipIndex.org should imply that at least SOMETHING is available in the resource. So I’ve decided that what I’ll do is expand entries like “LCM.21-118” to be “LCM.21”, “LCM.22”, “LCM.23”, etc., up to “LCM.118”. Then I’ll compare my list with the book itself. If there’s any information at all about the vessel, I’ll keep the entry. If there is no information beyond its listing on the page – nothing about where it was built, or how it was lost, for instance – then I’ll delete it. My thought is that if the volume offers even one piece of information, I’ll include the vessel name in the index.
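The expand-then-prune plan can be sketched in a few lines of Python. This is only an illustration, not the actual process used to prepare the file: the function names are made up, the range pattern assumes entries look exactly like “LCM.21-118”, and the details dictionary stands in for whatever is recorded when checking each vessel against the book. The sample values in it are placeholders, not data from Lenton.

```python
import re


def expand_range(entry):
    """Expand an index entry like 'LCM.21-118' into individual names."""
    match = re.fullmatch(r"([A-Z]+)\.(\d+)-(\d+)", entry)
    if match is None:
        # Not a range, so leave the entry as it is.
        return [entry]
    prefix = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    return [f"{prefix}.{number}" for number in range(first, last + 1)]


def keep_documented(names, details):
    """Keep only names that have at least one non-empty piece of information."""
    return [name for name in names if any(details.get(name, {}).values())]


# Placeholder notes, invented for this example only.
details = {
    "LCM.22": {"built": "", "fate": "placeholder fate note"},
    "LCM.23": {"built": "", "fate": ""},
}

names = expand_range("LCM.21-118")
print(len(names))                       # number of individual entries
print(keep_documented(names, details))  # entries worth keeping
```

Here an entry survives if any single field is filled in, which matches the “one piece of information” rule. A stricter rule, such as requiring a fate or a builder, would only change the test inside keep_documented.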

Still, it’s worth noting that for people who are working on an unlisted LCM, the volume may contain information about the LCM class that might be relevant. And if you’re looking for an image of a specific auxiliary vessel, it may be that an image of a different vessel in the same class will do. It appears that the vessel type to which this most often applies will be the LCMs, of which several thousand were built, but it will be interesting to see how it actually turns out.

Am I doing the right thing? Should I be handling this in some other way? Is there some other way that I should note the amount of information presented? I’d welcome your comments – if there’s a better way of doing it, now’s the time for me to hear about it.