Category Archives: Data Correction

Specific resources that have been removed from the ShipIndex database

Written by: Peter McCracken
Published: January 16, 2022
Categories: Data Correction

In the past I haven’t kept a running tally of content that has been removed from the database, but I have mentioned it. The database takes a huge hit when I have to remove 380,000 citations in one go! As mentioned in the prior post, we’ve looked over all online resources, just to make sure they’re working. Many are not. I am going to make a note of the ones that I’m deleting from the database in the table below; I’ll be updating this over the next month or two as I work through all of the problematic online resources.

RESOURCE NAME	NOTES ABOUT THE RESOURCE
National Small Boat Register	As noted here, the database has been taken offline for an unknown period. Let’s just hope it does, eventually, return.
Blue World Web Museum	Google still shows the underlying data, but it’s not available. This collection had great links to images of ships from artwork.
Containership-info.com	Just not there anymore…
NOAA History: NOAA Coast and Geodetic Ships	I could not find any of the NOAA history pages anymore. A few pages may exist on the NOAA pages, but they’re not organized in the same way as in the past.
Union List of Historic Vessels in North America	This one threw me for a loop,

I’ll keep adding to this list as I work through our data.

More resource additions, and a few deletions, too

I provided a list of new resources added to the ShipIndex.org database, back in November. We’re always adding new content, so I’ll include a list of the new stuff ~~at the bottom of this post~~ in an upcoming post. But I also need to address the fact that we have to remove some stuff, as well.

Monographs, or books, are great as resources, because once they’re added, we know they’re not going anywhere. Those books are in libraries and collections around the world. You may not be able to access them right away, but eventually, you’ll be able to do so. Online resources are great because you can link to them RIGHT NOW. Boom, click, done. Except, when that doesn’t work.

Online resources are great for convenience, but not for reliability. They change and disappear all the time. For some reason, website publishers still don’t realize that if they’re going to change their URLs, they’re going to break access for repeat users. They can include redirects, but rarely do. Too often, website publishers switch from a straightforward linking and searching structure to some fancy search tool that removes prior direct links, and makes new direct links impossible. Tim Berners-Lee, the creator of the World Wide Web, defined five stars for Open Data. One of those is making sure that people can point to your stuff. That is, make sure they can link to it easily. If you are required to do a search to get data, rather than also having a direct link that would get a person to your content, then you’re doing it wrong.

As one example, there’s a brand new “Royal Navy Loss List searchable database” at https://thisismast.org/research/royal-navy-loss-list-search.html. It’s nice that this data is here, and you can do a search for, say, “Indefatigable”, and find a record. But you cannot provide a direct link to the “Indefatigable” results, without going through that search page, which is really annoying, at least for those who care about open data.

Unfortunately, in this case, the MAST Loss List database only meets one of Sir Tim’s five stars toward Open Data. They could — and should — do much better.

But even worse is the total disappearance of online resources. Our data team recently reviewed all online resources in the database, and found quite a few which have disappeared, or are currently offline. We discovered a lot of problems that we’ll need to address. In some cases, the fix is pretty easy because there’s an obvious change to the URLs in the database. This was the case for the Bremen Passenger Lists; we fixed them, and they’re accessible again.

For others, though, we see bigger problems. Take the UK’s “National Small Boat Register”, for instance, which was hosted by the National Maritime Museum in Cornwall. At https://nmmc.co.uk/explore/databases/, you can see that the museum reports in an undated note, “The NSBR is currently offline whilst we create a new and improved website. We will have it up and running again as soon as possible. Please check back for further updates.” WHAT???

I’m all for thoughtful and improved websites, but why take down the old one when you’re building the new one??? Why not just keep it up until the new one is live and working?? The old one worked, didn’t it?? (It obviously did, at one point, when we added it to the database.)

There’s nothing to do but delete the National Small Boat Register contents from the ShipIndex database, and hope we’ll discover the replacement database when — if — it is ever put back online.

The Blue World Web Museum recently disappeared, as did other smaller resources. If you manage a vessel database that you can no longer keep online, please, please, please, contact me at comments (at) shipindex (dot) org, and give me a chance to see if we can save that resource for you.

I think I’ll start a separate blog post that lists the online databases that have disappeared; if you know of new sites for any of these, or contacts for folks who might be willing to offload that work to ShipIndex.org, please do let me know.

This got quite long, so I’ll create a separate post that lists the recently-added new content, in a day or three. (After making the post about lost databases, I suppose.)

Unique Vessel Identifiers Added to ShipIndex.org – the short version

Written by: Peter McCracken
Published: April 12, 2020
Categories: Data Correction, Website Improvements

I wrote a very long blog post about our new use of Wikidata identifiers, and how it changes the ShipIndex database. It might be too long for some. Here, I hope to limit my overview of these changes to just four paragraphs (not including this one).

The ShipIndex database has grown a lot over the past decade, and now has over 3.5 million citations in it. For very common ship names, this makes it too hard to find information about a specific ship. How do you find information about the right America, when there are 2,378 different citations to work through?

The solution is unique identifiers – basically a specific identifier for each hull. This also allows us to bring together citations for the same ship as it changes names. To make this work, we’re using identifiers from Wikidata, plus local identifiers when Wikidata doesn’t yet have one. Wikidata makes it easy to use Linked Data, so that we can uniquely identify and share items and concepts: the identifier Q82925, represented at https://www.wikidata.org/wiki/Q82925, specifically refers to the author Joseph Conrad, while Q1278752, at https://www.wikidata.org/wiki/Q1278752, specifically refers to the ship named after Conrad, now at Mystic Seaport Museum. It also refers to the original name of that ship, Georg Stage, but differentiates it from the current ship with that name. Similarly, the identifier Q838125 refers to USS Hornet (CV-8), and Q1141355 refers to USS Hornet (CV-12).

When you look at the ShipIndex page for Hornet now, you’ll see eleven different ‘cards’, each one referring to a specific hull, or vessel. Click on any of those ship names, and you’ll see only the citations that have been associated with that unique identifier. And note the URL for each card – it includes the Q-identifier, so others can easily link to it as well, without needing to create some new URL. In addition, many citations have not been associated with any card, because we cannot determine to which vessel the citation refers – at least, not without going back to the original source.

In the future, I hope to offer ways for individuals to help grow this resource, and maybe create a way that people can share their own information – images, reminiscences, comments, online links – about specific vessels. Until then, I’ll be working away at associating as many citations as possible to specific vessels. I hope that this improves your experience with ShipIndex, and helps everyone do more and better maritime history research.

For more information about what we’ve done here, check out the much-longer blog post.

All the Gory Details about Adding Unique Vessel Identifiers to ShipIndex.org

Written by: Peter McCracken
Published: April 12, 2020
Categories: Data Correction, Website Improvements

TL;DR: The ShipIndex.org database can now differentiate between disparate ships of the same name, and combine citations using different names for the same ship. We use Wikidata Q-identifiers to do this, and hope that we can help apply Linked Data to maritime history.

This is a very long post about some significant enhancements that we’ve added to ShipIndex.org. For a shorter post about the same thing, that doesn’t go into quite as much detail, see here.

Greetings, from ShipIndex central. It’s been a while since our last blog post, and I referenced some upcoming changes back then. Of course, things have taken a bit more time than I’d expected, but I’m ready to start sharing some of these changes.

Right now, ShipIndex.org has over 3.5 million citations from over 900 resources. If you’re looking for a specific ship with a common name, you’re gonna have a hard time. Looking for a specific “Eagle”? As of today, there are 2,677 citations for ships named Eagle. “America”? There are 2,378 citations. The most common ship names are, in increasing order, “Hope”, “Anna”, and “Maria”, with “Elizabeth” having 3,818 citations, and “Mary” leading by a lot, with 5,072 citations.

Researchers have a big problem in trying to work through those common ship names. And they have a problem when a ship changes its name – it’s still the ship they want to research, but ShipIndex.org doesn’t really connect a user with the ship’s previous or subsequent names. (Well, it did, a bit: if you look at the “America” entry, you can see “related ships” – but how do you know which of the 2,378 citations also refer to Italis, West Point, or Australis?)

What I always wanted was a “unique vessel identifier”. I could bring together citations that refer to the same ship, and differentiate between citations for different ships with the same name. I wasn’t sure how to make that work, until a colleague at my day job suggested using Wikidata identifiers. This was such a great idea, for many reasons.

You’ve all seen the xkcd comic about standards, right? No? OK, here:

Same thing with identifiers. Many already exist – naval hull numbers, IMO numbers, national registration numbers, and others – but none refer to all ships, obviously. I didn’t want to create a new identifier, especially since most of the world wouldn’t use it. Wikidata, however, addresses all of these issues, and most importantly it can make maritime history research easier by using these identifiers across the web. For example, a vessel at a maritime museum, like Mystic Seaport’s Joseph Conrad, has the Wikidata identifier Q1278752. This is a “Q-number”; the number itself is arbitrary and carries no meaning, but it is unique to this specific item. The Wikidata entry contains other pieces of factual data about the vessel, including its current location, its builder, and some of its dimensions. Anyone can add to the record about any entry.

Every item or entry (including non-physical concepts) in Wikipedia has a Q-number. Look on the left column for any Wikipedia entry (like the Conrad‘s), and you’ll see an entry under “Tools” that’s labelled “Wikidata item”. That links you to the Wikidata entry for the item in question. Information from Wikipedia is incorporated into Wikidata, and all of the information is available for sharing and using on the web. Look on the right of the Wikidata entry, and you’ll see a list of entries on that subject, in numerous different languages.

(As another aside, note the difference here between Wikipedia and Wikidata. Wikipedia contains textual information and discussion about a topic or item. One subject may have multiple entries in different languages. [The German entry about Joseph Conrad is not just a straight translation of the English one. Nor is the Farsi entry.] Wikidata, on the other hand, is a collection of data – just the facts, ma’am – about the topic. Wikidata does not have foreign-language versions of each data page.)

This Wikidata entry for the ship “Joseph Conrad” is different from the entry for the Polish author, even though the ship is named after him; it is also different from the entries for a French army officer with the same name, a US army officer, and more. By using linked data in this way, online systems can better distinguish the person from the ship, making it easier for researchers to find what they’re looking for, quicker.

Over time, I’ll be able to do lots more with the Wikidata that is available to us, as the Wikidata database grows. Hopefully, the ShipIndex.org data will be easier to find online, plus it will be easier to use, because ships with common names will be better sorted, and ship name changes will be better represented.

Last December, my son and I visited the Cradle of Aviation museum in Garden City, NY. While there, we saw this plaque describing the many different ships in the US Navy named “Hornet”:

This is a great example of what we’re trying to do here in ShipIndex.org – sorting and dividing the many very different ships with the same name. (Note here, though, that this plaque differentiates between the last three versions of USS Hornet, saying that CV-12, CVA-12, and CVS-12 were different ships. They really weren’t; they were the same hull, even if they were refitted for different uses over the last decades of service. I don’t know, but I’d be surprised if Navy veterans who served on CV-12 would feel that they served on a different ship than those who served on CVS-12, for instance.)

Anyway, when you go to the entry for Hornet in ShipIndex.org now, you’ll see multiple ‘cards’ at the top of the page – each one represents a different vessel, or hull. We pull publicly available data from Wikidata into the cards we create, and we’ll be able to do more there over time. Right now, the images come from Wikidata; when we don’t have one, we have to put in a placeholder. (Imagine a place where you could post your own information, be it pictures, remembrances, links to vessel-specific sites, etc., about a specific vessel. Interested? Let me know.) We organize citations that are specifically about a particular vessel under the appropriate card.

Of course, not every ship in ShipIndex.org has a Wikidata identifier. Right now, we’re using local identifiers when a Wikidata identifier doesn’t exist, or we haven’t found it yet. Since anyone can create a new entry in Wikidata, we can also create identifiers there, and share our knowledge with the rest of the world.

We’ll never get all, or most, or even many, citations associated with cards. “Hornet” has 820 citations from 221 resources. We have 11 cards, for specific vessels, and each card has between 2 and 66 citations associated with it. So, just 235 of 820 citations are associated with cards. But many entries have basically no descriptive information about the ship at all, and one would need to look at each resource to figure out if the Hornet in question is one of the ones for which we have a card. Even when there is information, it’s often not enough – there are nine citations with “aircraft carrier” in the description, but without looking at each resource, I don’t know if they’re referring to USS Hornet (CV-8) or USS Hornet (CV-12).

But for the time being, it’s very much a start toward doing better research in maritime history. Look at the two Hornet links above, for instance. The URL for Hornet CV-8 is https://www.shipindex.org/vessels/Q838125, and the URL for Hornet CV-12 is https://www.shipindex.org/vessels/Q1141355. There’s that Q-identifier again, right in our URL, so it’s easy to find, easy to use, and easy to link to. This is the basis of Linked Data, and of making online research easier to do, and easier to manage.
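
To make that URL pattern concrete, here is a minimal Python sketch that builds both links from a Q-identifier. The two URL prefixes are the ones shown in this post; the function name and the validation rule (a capital "Q" followed by digits) are assumptions made for illustration, not the site's actual code.

```python
import re

# URL patterns as they appear in this post.
SHIPINDEX_VESSEL_URL = "https://www.shipindex.org/vessels/{qid}"
WIKIDATA_ITEM_URL = "https://www.wikidata.org/wiki/{qid}"

# Assumption: a Q-identifier is a capital "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def vessel_links(qid: str) -> dict:
    """Return the ShipIndex and Wikidata URLs for one Q-identifier."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Q-identifier: {qid!r}")
    return {
        "shipindex": SHIPINDEX_VESSEL_URL.format(qid=qid),
        "wikidata": WIKIDATA_ITEM_URL.format(qid=qid),
    }


if __name__ == "__main__":
    for name, qid in [("USS Hornet (CV-8)", "Q838125"),
                      ("USS Hornet (CV-12)", "Q1141355")]:
        print(name, vessel_links(qid))
```

Because the same identifier appears in both URLs, anyone who knows the Q-number can link to either site without looking anything up first.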

As of this writing, we have 1446 citations associated with 70 vessels. That’s about 0.04% of all the citations in the database. Admittedly, we have a long way to go! But it is a start, and getting the underlying work done to make this happen was a big chunk of 2019 – it took a lot of time and work and money.
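
For anyone who wants to check that proportion, here is a small Python sketch. The total of 3,500,000 is an assumption (the post only says "over 3.5 million"), so the result is approximate: a fraction of roughly 0.0004, which is about 0.04 when written as a percentage.

```python
def share_percent(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    return part / total * 100


if __name__ == "__main__":
    associated = 1446          # citations attached to a vessel card (from the post)
    total = 3_500_000          # assumption: the post says only "over 3.5 million"
    print(f"fraction: {associated / total:.6f}")
    print(f"percent:  {share_percent(associated, total):.4f}%")
```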

My next goal, beyond expanding the number of citations associated with vessels, is to make a way that users can help grow this resource. Perhaps you have been researching Hornet, and you know that Albion’s Five Centuries of Famous Ships refers to CV-12, rather than CV-8. If you could share that information, to expand the database a bit, that would be huge.

Then, as mentioned above, maybe you have images, or remembrances, about CV-12 specifically, or you want to link to resources about it online (remember, after the current Coronavirus pandemic passes, you can actually visit USS Hornet in Alameda, California; until then, you can visit https://www.uss-hornet.org/) – what if ShipIndex provided a place where you could post those and share them with others interested in researching a specific vessel? That’d be pretty cool, I think.

I’d love to hear what you think about this enhancement. For me, it’s been a long time coming. Of course, there’s much more to do, but I’m very excited about this significant change.

Deleting data dilemma

Written by: Peter McCracken
Published: November 23, 2015
Categories: Data Correction

Of course I hate to remove content from the ShipIndex.org database; I’m always working on trying to expand, not contract, the database. But bad data is worse than no data, and an online resource recently disappeared, so I had to delete its contents from the database. The truth is, I have waited too long to remove this content, because I had been really pleased to get to 3.4 million citations, and removing 380,000 will be a big hit in getting to three and a half million citations.

While online resources are certainly wonderful – you can get to your results without leaving your home – they are most certainly not permanent. They exist in one place and everywhere at the same time, but then when they disappear, they’re gone completely. This is, obviously, not the case for books.

I have contacted the creator of the missing database, and haven’t heard back from him, but perhaps I’ll find another way of getting in contact, and maybe, just maybe, we can find a way to get that content into ShipIndex separately.

One result of deleting these records is that there will be some of what we call “citationless ships” for a little while. These are entries for ships that now have no citations on them at all, because the only citation was from this one resource. I need to remove them from the database, but that will take a bit of time for some technical reasons. But I’m working on it, doing my best to keep the database clean and accurate.
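
The idea of a "citationless ship" can be sketched in a few lines of Python. The data layout here (a list of records with "ship" and "resource" keys) and the names `remove_resource` and `lost-db` are hypothetical, chosen only to illustrate the logic; the real database is organized differently.

```python
def remove_resource(citations, resource_id):
    """Drop every citation from one resource.

    Returns the remaining citations and the sorted list of ship names
    that no longer have any citation at all.
    """
    kept = [c for c in citations if c["resource"] != resource_id]
    ships_before = {c["ship"] for c in citations}
    ships_after = {c["ship"] for c in kept}
    citationless = sorted(ships_before - ships_after)
    return kept, citationless


if __name__ == "__main__":
    # Hypothetical sample data.
    citations = [
        {"ship": "Hornet", "resource": "book-a"},
        {"ship": "Hornet", "resource": "lost-db"},
        {"ship": "Daffodil", "resource": "lost-db"},
        {"ship": "Easby", "resource": "book-b"},
    ]
    kept, citationless = remove_resource(citations, "lost-db")
    print(len(kept), "citations kept; citationless ships:", citationless)
```

A ship that also appears in some other resource keeps its entry; only ships whose sole citation came from the removed resource end up on the cleanup list.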

Some good news is that I have scores (actually, four score, at present) of book files waiting to be imported. I’ve started adding those and have more to go. While they won’t add up to today’s lost 380,000 citations, they will get me back closer to that number, and since they’re all printed resources, they won’t disappear any time soon.

Updated OCLC WorldCat data – 20% more, and more accurate

Written by: Peter McCracken
Published: May 31, 2014
Categories: Data Correction, New Content

I’ve updated an important resource, adding 20% to its contents, and improving the accuracy of all of the data in it. When we converted ShipIndex.org from a hobby to a business, we worked with OCLC to get a file of books by or about ships. For more about how these records are used, see the first of two posts about WorldCat records, here.

In any case, we agreed with OCLC that these records would remain in the free database, rather than the newly-created subscription database. There were about 40,000 records in that file. Last month, I had the opportunity to visit OCLC’s headquarters, in Dublin, Ohio. While there, I received an updated version of this file, which now contains over 50,000 authority records for ships.

I worked through the file, doing cleanup and corrections, and spent a few tries at loading the file into the ShipIndex.org database. It wasn’t as easy as other files, because the OCLC records are fully Unicode compliant. The database likes UTF-8, but Unicode is a bit beyond its abilities. (Actually, not in its abilities to display vessel names, but in its abilities to store them.) I replaced vessel names in Cyrillic, Japanese, Chinese, etc., with their transliterated names, and also removed a lot of the Unicode characters that were causing problems.
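
One way to triage names like these, sketched in Python with only the standard library: fold accented Latin letters down to plain ASCII, and flag anything else (Cyrillic, Japanese, Chinese, and so on) for manual transliteration. This is an illustration under stated assumptions, not the process actually used for the OCLC file, and the function name `fold_to_ascii` is hypothetical.

```python
import unicodedata


def fold_to_ascii(name):
    """Return (ascii_name, needs_review).

    Accents are stripped by decomposing characters (NFKD) and dropping
    the combining marks. Any character that still is not ASCII is
    removed, and needs_review is set so a person can transliterate it.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    ascii_name = without_marks.encode("ascii", "ignore").decode("ascii")
    needs_review = ascii_name != without_marks
    return ascii_name, needs_review


if __name__ == "__main__":
    for name in ["Fürst Bismarck", "Аврора", "Daffodil"]:
        print(repr(name), "->", fold_to_ascii(name))
```

Automatic folding only handles accents; a name written entirely in another script comes back empty and flagged, which is the cue to replace it by hand with its transliterated form.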

I also fixed a lot of names that I hadn’t fixed the first time around. Most of these were ship names with prefixes attached, like “USS Daffodil” or “HMS Daffodil” or “S/S Daffodil”. It’s always best to search without those prefixes. I have cleanup still to do on those leftover ship names, but the new records are live and I can do the cleanup later.
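
Stripping those prefixes is simple to sketch in Python. The prefix list below contains only the three examples from this post; a real cleanup list would be longer, and the function name `strip_prefix` is an assumption for illustration.

```python
import re

# Assumption: only the three prefixes mentioned in the post.
PREFIXES = ("USS", "HMS", "S/S")

# Match one prefix at the start of the name, followed by whitespace,
# so that a name such as "Ussuri" is left alone.
_PREFIX_RE = re.compile(
    r"^(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\s+",
    re.IGNORECASE,
)


def strip_prefix(name):
    """Remove a leading ship prefix, if there is one."""
    return _PREFIX_RE.sub("", name.strip(), count=1)


if __name__ == "__main__":
    for name in ["USS Daffodil", "HMS Daffodil", "S/S Daffodil", "Ussuri"]:
        print(name, "->", strip_prefix(name))
```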

So now, as a result, the OCLC WorldCat resource has grown from about 40,000 to about 50,000 citations, and the metadata is much improved. All of these citations are in the free database. This is a big improvement all around. Thanks again to OCLC for creating this file for me!

ShipIndex as a Vessel Name Authority File

Written by: Peter McCracken
Published: December 12, 2013
Categories: Books, Data Correction, Genealogy

[This entry was written long ago, but not posted, because I was having problems with uploading images. As you’ll see, images are a critical part of this post! Now that I’ve gotten that problem resolved, I will add a few more posts soon. PMc]

Last May, I finally completed one very large file for import. This file was incredibly tough to process, but I learned a lot about how one can use the database, and I thought I’d share that information here.

The database is Mariners and Ships in Australian Waters, and it is a collection of transcribed passenger lists for thousands of voyages to Australia, primarily in the second half of the 19th century. Because most records were handwritten, and then transcribed by volunteers, many, many errors crept into the database.

The database has 58,311 records in it. (I believe more are always being added to the website itself, as transcribers complete their work.) One major difference between this and every other resource is that each voyage has a separate entry. In the Ellis Island Database, a user searches by ship name, then goes in deeper by voyage date. In this case, the collection is organized by arrival year, then arrival month, then ship name – so I had to create a separate entry for each voyage, to be able to link to each transcription.

I quickly realized that there were many, many, many errors in the transcription of vessel names. Just looking over the ship names as they appeared in the spreadsheet, it was easy to spot typos – especially with the additional information I had about masters and tonnage, which helped connect a misspelling to a correct spelling.

After correcting numerous such misspellings, I did a test import of the file and found 1707 new ship names would be added to the database. I started to investigate each of those, and found that many were not actually new ship names – they were simply additional mistranscriptions of the passenger lists. As the ShipIndex.org database grows, it’s important to try and minimize the introduction of incorrect ship names.

For example, I saw this entry, which the transcriber recorded as “Maealsar”. The master’s name had been transcribed as “C M de Boer”, and the vessel size as 305 tons.

I thought it looked a bit like “Macassar”, but there were no other “Macassar”s in that file. I did a search in ShipIndex.org for Macassar (http://www.shipindex.org/ships/macassar), and found an entry from the American Lloyd’s Register of American and Foreign Shipping for the same year. It listed a Macassar with a captain C. M. De Boor and a tonnage of 306. Obviously, these are the same ship.

I corrected the vessel name, but kept the mis-transcription, too, just in case I was wrong. So the entry now looks like this: “Macassar (corrected; listed as “Maealsar”) (of Amsterdam, C M de Boer, Master, 305 tons, from the port of Balaves to Sydney, New South Wales, 23 Mar 1861)”.
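
The reasoning in the Macassar example (a similar-looking name, backed up by a near-identical master and tonnage) can be sketched in Python with the standard library's difflib. The corrections described here were made by eye, not by code; the thresholds, field names, and the function `likely_same_vessel` are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between two strings, from 0.0 to 1.0, ignoring case and periods."""
    def clean(s):
        return s.lower().replace(".", "").strip()
    return SequenceMatcher(None, clean(a), clean(b)).ratio()


def likely_same_vessel(transcribed, candidate,
                       name_min=0.6, master_min=0.8, tonnage_tol=5):
    """True when name, master, and tonnage all corroborate each other.

    The three thresholds are arbitrary starting points, not tuned values.
    """
    name_ok = similarity(transcribed["name"], candidate["name"]) >= name_min
    master_ok = similarity(transcribed["master"], candidate["master"]) >= master_min
    tonnage_ok = abs(transcribed["tons"] - candidate["tons"]) <= tonnage_tol
    return name_ok and master_ok and tonnage_ok


if __name__ == "__main__":
    transcribed = {"name": "Maealsar", "master": "C M de Boer", "tons": 305}
    register = {"name": "Macassar", "master": "C. M. De Boor", "tons": 306}
    print(likely_same_vessel(transcribed, register))
```

A check like this can only suggest candidates. Keeping the original transcription alongside the correction, as in the entry above, is still the safeguard against a wrong match.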

Another example was this name, which had been transcribed as “Magport”:

I thought it looked like it started with an “N”, but found no “Nagport” already in the database. However, a search for “nagp*” turned up “Nagpore”, among others, and a link to the Record of American and Foreign Shipping entry for the same year returned these two ships:

One has the same master and tonnage as the one in the transcription. It then becomes clear that there’s an “e” hiding behind the bar on the page, rather than a “t”.

I felt like it became a combination of genealogy and authority record work. I tried to find sufficient documentation to prove that my analysis was more accurate than the original. And because I had both the entire set of metadata from the source, and the 2.3 million citations already in the ShipIndex.org database, I could more easily determine that various transcriptions were incorrect.

I recognized that ShipIndex.org is beginning to serve as an authority file for vessels. It is certainly my goal to improve the database along those lines, and I will use another blog post to discuss this further.

I found many instances of doing this sort of research, and while it took a very long time, it was actually quite fun to nail down a correction. Some were surprising – I guess I can see why one might read this as “Princess of Water”:

But why in the world would you not recognize that “Princess of Wales” makes infinitely more sense for a ship name?

I’ll provide two last examples here. This first one shows how I used the existing metadata for the resource itself to determine the correct ship name.

The beautiful handwriting on this one made it easy to read, and it’s not surprising that it was transcribed as “Oasby”. But there was only one entry in the entire file for “Oasby”, and none in the existing ShipIndex.org database, so it made me wonder.

A search through the metadata for the captain’s name, however, found 17 entries with Kennedy as captain (as had been noted in the transcription for this entry), for ship “Easby”, and the full resource has at least 70 other entries for “Easby”. Tonnage data is the same, and after learning of the existence of “Easby”, it’s easy to see that that’s what the ship name was; and the top of the dramatic ‘E’ was lost in the digitizing process.

This made the next new ship name, “Oaton Hall”, easy to resolve to “Eaton Hall”.

Finally, I dealt with this challenging entry by using the existing ShipIndex.org database:

I tried searching for “waurego”, but that returned no ships. By searching for “*rego”, I found all the citations that had a word in the ship name that ends in “rego”. I could easily locate “Warrego”, and confirm that’s the right ship.

There’s other searching that could be done here, too. If I change the search to “*rego$” it returns only the ship names that actually end in “rego”, deleting several, like “Trego Renneger” or “Effrego Ventus”, from the result list.
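
Here is a rough Python sketch of how those two searches behave, using regular expressions. It mimics only the behavior described above ("*" stands for any run of letters within a word, and a trailing "$" pins the match to the end of the ship name); it is not the search code ShipIndex.org actually runs, and the function names are hypothetical.

```python
import re


def compile_query(query):
    """Turn a wildcard query such as '*rego' or '*rego$' into a regex."""
    anchored = query.endswith("$")
    if anchored:
        query = query[:-1]
    # "*" becomes "any word characters"; everything else is literal.
    body = "".join(r"\w*" if ch == "*" else re.escape(ch) for ch in query)
    # Match a whole word; with "$", that word must end the ship name.
    pattern = r"\b" + body + (r"$" if anchored else r"\b")
    return re.compile(pattern, re.IGNORECASE)


def search(names, query):
    """Return the names that match the wildcard query."""
    rx = compile_query(query)
    return [n for n in names if rx.search(n)]


if __name__ == "__main__":
    names = ["Warrego", "Trego Renneger", "Effrego Ventus", "Regolith"]
    for q in ["waurego", "*rego", "*rego$"]:
        print(q, "->", search(names, q))
```

With this sample list, the unanchored query keeps every name containing a word that ends in "rego", while the anchored one keeps only "Warrego".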

I’ll put together another post in the next few weeks with more examples of changes and corrections I was able to make, along with a discussion of the importance of authority data for ship names.

Deleting data – sometimes it must be done

Written by: Peter McCracken
Published: February 18, 2013
Categories: Data Correction

I had to delete content from the database this morning. I’ve delayed doing it for a long time, but it had to be done. The “Property Management & Archive Record System” database, created by the US Department of Transportation’s Maritime Administration, was actually a very useful database, but was removed temporarily – and then permanently – so I really had no choice but to remove its contents from the ShipIndex.org database.

I had written the following description of the database:

This resource, called “PMARS”, is the official repository of records about vessels that are or were parts of US Maritime Administration’s National Defense Reserve Fleet. As a result, it focuses on ships from World War II to the present. Only a few hundred vessels are still in NDRF, but PMARS contains information about nearly all ships (over 7000) that were included in NDRF at some point.

While the database contains “basic ship data” about each vessel, the “Custody Cards” and “Disposal Cards” are of particular interest. These are images of the printed, typed, or handwritten notes regarding disposition of each vessel.

I had a great experience at a library conference once, using the PMARS database. A special collections librarian from Occidental College, in California, wanted to learn more about a Victory ship called “Occidental Victory”, named after her institution. (Victory ships were slightly larger and more powerful than Liberty ships; both were quickly-built cargo ships used extensively during World War II, and critical to Allied success in the war.) We looked up “Occidental Victory” in the ShipIndex.org database, and found a record from PMARS. It included digitized images of the ship’s Disposal Card, which showed the history of the ship and its final outcome.

The database also showed that the Maritime Administration still owned the binnacle for the ship, and was willing to loan it to museums and libraries for exhibits! She was thrilled to discover this, and said she wanted to create an exhibit about the ship, and of course borrow the binnacle for the exhibit. I don’t know that this ever happened, but to discover the binnacle was available was, I thought, really neat.

The digitized Disposal Cards and Custody Cards were great items, too, and it’s such a shame that these things are no longer available online. One might think that in our digital environment, such items wouldn’t be lost or taken off-line. But when it happens (and it happens more often than one might think), the data is lost for good, because it wasn’t backed up elsewhere, such as in the form of multiple physical copies in many different libraries.

For a while, the PMARS links redirected you to a page that said something to the effect of, “for more information, contact ____.” So I did. A little over a year ago I contacted people at the US Maritime Administration to ask what had happened to PMARS, and if it was coming back. I got a nice, quick response, and was told that PMARS had been taken off-line “due to security concerns”, that great bugaboo of meaninglessness. It was expected to return in mid-2012, in the form of two different databases, but that didn’t happen.

Now, the links are simply dead, and take you nowhere. If PMARS does come back, in whatever form, I’ll quickly return it to the ShipIndex.org database. Until then, I feel the proper thing to do is to remove the content from the database.

But I do anticipate adding a lot of new content in the very near future; I have a project going on that should, if all goes well, add lots of great new content in the next ten days. It won’t replace the content lost from the loss of the PMARS database, but perhaps that will, in fact, come back some day.

On Naming Ships and Representing them in ShipIndex

Written by: Peter McCracken
Published: February 14, 2011
Categories: Data Correction

At present, ShipIndex.org has one point of access: the vessel name. You’d think that would be fairly easy, at least in the case of extant vessels: just look at the stern or the bow, and see what’s written there. Alas, it’s not that simple. There are many reasons for this, and a lot of them are completely understandable. Others can lead to surprisingly interesting stories.

While working through the index to the first 50 years of Steamboat Bill, and its successor, PowerShips, I came across many, many mentions of the Queen Elizabeth 2. Most of these are listed under the very common, abbreviated name, “QE2”. In the ShipIndex database, however, one also finds many entries for a different version of the name, “Queen Elizabeth II”. I read a bit about the ship on its Wikipedia page, and learned some interesting stories about how the name came about. According to the contributors, the name of the ship was not announced before the launching. Cunard intended to name the ship “Queen Elizabeth”, but the Queen, when she launched the ship, stated “I name this ship Queen Elizabeth the Second.”

The next day, newspapers announced the name as “Queen Elizabeth II”, though when the ship was delivered its name read “Queen Elizabeth 2”. According to Wikipedia, “From at least 2002 the official Cunard website stated that ‘The new ship is not named after the Queen but is simply the second ship to bear the name – hence the use of the Arabic 2 in her name, rather than the Roman II used by the Queen’, however, in a change in 2007 this information had been removed.”

In addition, there’s confusion about who the ship is named after. Multiple sources provide multiple suggestions. Some feel the ship is named after the current Queen, and that, in fact, she made that change when she announced its name. Others state that it is named after her mother, the wife of King George VI. Others state it’s named after the previous Cunard ship named Queen Elizabeth.

We need to make it possible for people to find ship names however they might be represented, and so we’ve created functionality that allows one to link between variant names for specific ships. So, for example, when you search for “QE2”, you find entries that cite “QE2”, but you also find a link at the top taking you to entries for other variant names for this ship, specifically “Queen Elizabeth 2” and “Queen Elizabeth II”.

We also have the ability to ‘normalize’ ship names, and in that case, one goes directly from a misspelling of a ship name to the correctly spelled entry. So, by rights, we should ‘normalize’ “QE2” and “Queen Elizabeth II” to “Queen Elizabeth 2”. But I think that, in this case, for this very famous ship, it’s worth maintaining the separate entries and linking them together via the “alternate spelling” links. Maybe I’m wrong; should I just normalize them all together? What do you think?
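To make the difference between the two approaches concrete, here is a minimal sketch in Python. It is not how ShipIndex.org is actually built; the table names, the `lookup` function, and the misspelling “Queen Elisabeth 2” are all hypothetical, invented purely for illustration. Normalizing silently redirects a name to another entry, while variant linking keeps each entry and simply offers links to the others.

```python
# Hypothetical data for illustration only; not the real ShipIndex.org schema.
# Normalization table: a misspelling redirects straight to the correct entry.
NORMALIZE = {"Queen Elisabeth 2": "Queen Elizabeth 2"}

# Variant groups: each name keeps its own entry, but entries link to each other.
VARIANT_GROUPS = [{"QE2", "Queen Elizabeth 2", "Queen Elizabeth II"}]


def lookup(name):
    """Return (entry to display, sorted list of variant names to link to)."""
    entry = NORMALIZE.get(name, name)  # redirect only if the name is normalized
    for group in VARIANT_GROUPS:
        if entry in group:
            return entry, sorted(group - {entry})
    return entry, []


print(lookup("QE2"))
print(lookup("Queen Elisabeth 2"))
```

In this sketch, searching for “QE2” stays on the “QE2” entry and offers links to the other two names, while the made-up misspelling lands directly on “Queen Elizabeth 2”. Normalizing all three names together would amount to moving “QE2” and “Queen Elizabeth II” from the second table into the first.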

We also show links for previous and subsequent names of ships. So, if you search for “Euterpe”, you’ll see a “subsequent name” link to “Star of India.” It is important to remember that if there are multiple ships with the name “Euterpe,” the link appears, but doesn’t apply to all of them. Creating a system that separates out all these ships is a big project, but one that we will tackle.

One great thing about the Steamboat Bill files is that they include many previous and subsequent vessel names. Unfortunately, they don’t exactly indicate the order in which vessel names appeared; you’ll see both “Liberte; a) Brasil; b) Volendam; c) Monarch Sun; d) Volendam; e) Island Sun; g) Canada Star h) Queen of Bermuda” and “Queen of Bermuda; a) Brasil; b) Volendam; c) Monarch Sun; d) Volendam; e) Island Sun; f) Liberte; g) Canada Star”, as well as “Island Sun; a) Volendam”. So, some research is needed to figure out the order in which the ship names appeared. Then, I still have a question about whether I should include all of the previous and subsequent names in each entry. In the above example, if I determine that the actual path of ship name changes was Queen of Bermuda, then Brasil, then Volendam, then Monarch Sun, then Volendam (again), then Island Sun, then Liberte, and finally Canada Star, do I include ‘subsequent name’ links from Brasil to Volendam, Monarch Sun, Island Sun, Liberte, and Canada Star? That creates a lot of links. Or do I just have a link from Queen of Bermuda to Brasil, and on Brasil a link to Volendam?

And if I list all previous or subsequent names for a ship that had the same name twice, then in this case the entry for Brasil (and Queen of Bermuda, and others) will have multiple ‘subsequent name’ links to Volendam. The page for Volendam could conceivably have a link back to itself!
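The two options can be compared with a small Python sketch. It assumes the name order suggested above, which is only a hypothetical sequence that still needs research, and the function names and the `skip_self` flag are made up for this illustration rather than taken from the ShipIndex.org code.

```python
# The order proposed in this post as one possible history; not yet verified.
HISTORY = ["Queen of Bermuda", "Brasil", "Volendam", "Monarch Sun",
           "Volendam", "Island Sun", "Liberte", "Canada Star"]


def next_name_links(history):
    """Option 1: each name links only to the name that directly followed it."""
    links = {}
    for current, following in zip(history, history[1:]):
        targets = links.setdefault(current, [])
        if following not in targets:
            targets.append(following)
    return links


def all_later_name_links(history, skip_self=True):
    """Option 2: each name links to every distinct later name, in order."""
    links = {}
    for i, current in enumerate(history):
        targets = links.setdefault(current, [])
        for name in history[i + 1:]:
            if skip_self and name == current:
                continue  # avoid a page linking back to itself
            if name not in targets:
                targets.append(name)  # list a repeated name only once
    return links


print(next_name_links(HISTORY)["Volendam"])
print(all_later_name_links(HISTORY)["Brasil"])
```

Under these assumptions, the first option gives Volendam two ‘subsequent name’ links (Monarch Sun and Island Sun), because the name was used twice. The second option gives Brasil five links, with Volendam listed once rather than twice, and the `skip_self` flag is one way to keep the Volendam page from linking to itself.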

What do you think? What’s the best way to represent this important data?

How variant editions can screw up Google Books links

Written by: Peter McCracken
Published: December 20, 2010
Categories: Data Correction

As we’ve mentioned in the blog before, you can link to the full text of many, many resources cited in ShipIndex.org. In fact, with a recent addition of a file containing tens of thousands of online ship images, nearly 90% of the citations provide full-text linking. Much of the linking comes through links to online resources, but others are available via links to books in Google Book Search.

A few weeks ago, several of us at ShipIndex were using some of these links, and found that many links for Sherry Sontag’s book Blind Man’s Bluff didn’t seem to work. While the links took one to the page cited in the index, the vessel mentioned in the index wasn’t listed on the page that we ended up at in Google Books. So today I picked up a copy of Blind Man’s Bluff from my local public library, to see if I’d made a lot of mistakes in working through the index.

I found that, in fact, I hadn’t made any mistakes – the page numbers in ShipIndex were the same as the page numbers listed in the back of the book. So then I re-tried some of the Google Book links we offer. Once again, a link to page 57 took me to page 57, but USS Halibut wasn’t mentioned on page 57 in Google. So I checked the copy I’d gotten from the library. That’s where I discovered the problem.

The copy from my public library, and the copy I’d originally used when creating the file to add to ShipIndex, came from the first publication of the book, issued in 1998 by Public Affairs, a division of Perseus Books. But the copy on Google Books is the paperback edition, published by HarperCollins in 1999, and the pagination, layout, and nearly every other aspect differ between the two. The HarperCollins version has 432 pages, while the Public Affairs version has 352. While the content may be exactly the same, the pagination is obviously different, so linking doesn’t work the way it should.
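One way to guard against this kind of mismatch is to record which edition an index was compiled from, and only offer a page-level link when the online copy is the same edition. The Python sketch below is just that idea in miniature; the `Edition` record and the `page_link_target` function are hypothetical names, and nothing here reflects how ShipIndex.org or Google Books actually store edition data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    publisher: str
    year: int
    pages: int


# Edition details as described in this post.
INDEXED = Edition("Public Affairs", 1998, 352)  # edition the index came from
ONLINE = Edition("HarperCollins", 1999, 432)    # edition available online


def page_link_target(page, indexed, online):
    """Return the page to link to, or None if the editions differ.

    None means: fall back to a book-level link, since the page numbers
    in the index cannot be trusted for a differently paginated edition.
    """
    return page if indexed == online else None


print(page_link_target(57, INDEXED, ONLINE))   # editions differ, so None
print(page_link_target(57, INDEXED, INDEXED))  # same edition, so 57
```

A check like this wouldn’t fix the links, but it would at least avoid sending people to a page where the ship isn’t mentioned.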

So now it seems that, in order to make the Google Books linking continue to work, I need to find an index to the HarperCollins edition of the book, and replace the index I’d compiled from the Public Affairs edition. It’s likely not a big deal to get done, but I thought it was an interesting problem that we may come up against more and more in the future.
