All posts by Peter McCracken

Unique Vessel Identifiers Added to ShipIndex.org – the short version

 

I wrote a very long blog post about our new use of Wikidata identifiers, and how it changes the ShipIndex database. It might be too long for some. Here, I hope to limit my overview of these changes to just four paragraphs (not including this one).

The ShipIndex database has grown a lot over the past decade, and now has over 3.5 million citations in it. For very common ship names, this makes it too hard to find information about a specific ship. How do you find information about the right America, when there are 2,378 different citations to work through?

The solution is unique identifiers – basically a specific identifier for each hull. This also allows us to bring together citations for the same ship as it changes names. To make this work, we’re using identifiers from Wikidata, plus local identifiers when Wikidata doesn’t yet have one. Wikidata makes it easy to use Linked Data, so that we can uniquely identify and share items and concepts: the identifier Q82925, represented at https://www.wikidata.org/wiki/Q82925, specifically refers to the author Joseph Conrad, while Q1278752, at https://www.wikidata.org/wiki/Q1278752, specifically refers to the ship named after Conrad, now at Mystic Seaport Museum. It also refers to the original name of that ship, Georg Stage, but differentiates it from the current ship with that name. Similarly, the identifier Q838125 refers to USS Hornet (CV-8), and Q1141355 refers to USS Hornet (CV-12).

When you look at the ShipIndex page for Hornet now, you’ll see eleven different ‘cards’, each one referring to a specific hull, or vessel. Click on any of those ship names, and you’ll see only the citations that have been associated with that unique identifier. And note the URL for each card – it includes the Q-identifier, so others can easily link to it as well, without needing to create some new URL. In addition, many citations have not been associated with any card, because we cannot determine to which vessel the citation refers – at least, not without going back to the original source.

In the future, I hope to offer ways for individuals to help grow this resource, and maybe create a way that people can share their own information – images, reminiscences, comments, online links – about specific vessels. Until then, I’ll be working away at associating as many citations as possible to specific vessels. I hope that this improves your experience with ShipIndex, and helps everyone do more and better maritime history research.

For more information about what we’ve done here, check out the much-longer blog post.

All the Gory Details about Adding Unique Vessel Identifiers to ShipIndex.org

TL;DR: The ShipIndex.org database can now differentiate between disparate ships of the same name, and combine citations using different names for the same ship. We use Wikidata Q-identifiers to do this, and hope that we can help apply Linked Data to maritime history.


This is a very long post about some significant enhancements that we’ve added to ShipIndex.org. For a shorter post about the same thing, that doesn’t go into quite as much detail, see here.


Greetings, from ShipIndex central. It’s been a while since our last blog post, and I referenced some upcoming changes back then. Of course, things have taken a bit more time than I’d expected, but I’m ready to start sharing some of these changes.

Right now, ShipIndex.org has over 3.5 million citations from over 900 resources. If you’re looking for a specific ship with a common name, you’re gonna have a hard time. Looking for a specific “Eagle”? As of today, there are 2,677 citations for ships named Eagle. “America”? There are 2,378 citations. The most common ship names are, in increasing order, “Hope”, “Anna”, and “Maria”, with “Elizabeth” having 3,818 citations, and “Mary” leading by a lot, with 5,072 citations.

Researchers have a big problem in trying to work through those common ship names. And they have a problem when a ship changes its name – it’s still the ship they want to research, but ShipIndex.org doesn’t really connect a user with the ship’s previous or subsequent names. (Well, it did, a bit: if you look at the “America” entry, you can see “related ships” – but how do you know which of the 2,378 citations also refer to Italis, West Point, or Australis?)

What I always wanted was a “unique vessel identifier”. I could bring together citations that refer to the same ship, and differentiate between citations for different ships with the same name. I wasn’t sure how to make that work, until a colleague at my day job suggested using Wikidata identifiers. This was such a great idea, for many reasons.

You’ve all seen the xkcd comic about standards, right? No? OK, here:

 

Same thing with identifiers. Many already exist – naval hull numbers, IMO numbers, national registration numbers, and others – but none refers to all ships, obviously. I didn’t want to create a new identifier, especially since most of the world wouldn’t use it. Wikidata, however, addresses all of these issues, and most importantly it can make maritime history research easier by using these identifiers across the web. For example, a vessel at a maritime museum, like Mystic Seaport’s Joseph Conrad, has the Wikidata identifier Q1278752. This is a “Q-number”, an arbitrary number that carries no meaning of its own. It is unique to this specific item. The Wikidata entry contains other pieces of factual data about the vessel, including its current location, its builder, and some of its dimensions. Anyone can add to the record about any entry.

Every item or entry (including non-physical concepts) in Wikipedia has a Q-number. Look on the left column for any Wikipedia entry (like the Conrad‘s), and you’ll see an entry under “Tools” that’s labelled “Wikidata item”. That links you to the Wikidata entry for the item in question. Information from Wikipedia is incorporated into Wikidata, and all of the information is available for sharing and using on the web. Look on the right of the Wikidata entry, and you’ll see a list of entries on that subject, in numerous different languages.

(As another aside, note the difference here between Wikipedia and Wikidata. Wikipedia contains textual information and discussion about a topic or item. One subject may have multiple entries in different languages. [The German entry about Joseph Conrad is not just a straight translation of the English one. Nor is the Farsi entry.] Wikidata, on the other hand, is a collection of data – just the facts, ma’am – about the topic. Wikidata does not have foreign-language versions of each data page.)

This Wikidata entry for “Joseph Conrad” is distinct from the entry for the Polish author (even though the ship is named after him), from the entry for a French army officer with the same name, from the entry for a US army officer, and more. By using linked data in this way, online systems can better distinguish the person from the ship, making it easier for researchers to find what they’re looking for, more quickly.

Over time, I’ll be able to do lots more with the Wikidata that is available to us, as the Wikidata database grows. Hopefully, the ShipIndex.org data will be easier to find online, plus it will be easier to use, because ships with common names will be better sorted, and ship name changes will be better represented.
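The two problems described here (one name shared by several hulls, and one hull carrying several names) can be sketched as a small lookup table. This is only an illustration, not how ShipIndex.org is actually built: the `Vessel` class, the `build_name_index` function, and the local identifier `LOCAL-EXAMPLE-1` are all made up for the example. Only Q1278752, and the fact that the ship now called Joseph Conrad was originally Georg Stage, come from these posts.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vessel:
    """One hull, known by one identifier and any number of names."""
    identifier: str
    names: list[str] = field(default_factory=list)


def build_name_index(vessels):
    """Map each lower-cased ship name to every identifier that has carried it."""
    index = defaultdict(list)
    for vessel in vessels:
        for name in vessel.names:
            index[name.lower()].append(vessel.identifier)
    return dict(index)


vessels = [
    # Q1278752 is the Wikidata identifier given in this post for the ship
    # now called Joseph Conrad, originally named Georg Stage.
    Vessel("Q1278752", ["Joseph Conrad", "Georg Stage"]),
    # Hypothetical local identifier standing in for the current Georg Stage;
    # its real identifier is not given in this post.
    Vessel("LOCAL-EXAMPLE-1", ["Georg Stage"]),
]

name_index = build_name_index(vessels)
print(name_index["georg stage"])    # two different hulls share this name
print(name_index["joseph conrad"])  # one hull, found under its later name
```

Searching by name returns every hull that has ever carried that name, while the identifier always points at exactly one hull.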

Last December, my son and I visited the Cradle of Aviation museum in Garden City, NY. While there, we saw this plaque describing the many different ships in the US Navy named “Hornet”:

This is a great example of what we’re trying to do here in ShipIndex.org – sorting and dividing the many very different ships with the same name. (Note here, though, that this plaque differentiates between the last three versions of USS Hornet, saying that CV-12, CVA-12, and CVS-12 were different ships. They really weren’t; they were the same hull, even if they were refitted for different uses over the last decades of service. I don’t know, but I’d be surprised if Navy veterans who served on CV-12 would feel that they served on a different ship than those who served on CVS-12, for instance.)

Anyway, when you go to the entry for Hornet in ShipIndex.org now, you’ll see multiple ‘cards’ at the top of the page – each one represents a different vessel, or hull. We pull publicly available data from Wikidata into the cards we create, and we’ll be able to do more there over time. Right now, the images come from Wikidata; when we don’t have one, we have to put in a placeholder. (Imagine a place where you could post your own information, be it pictures, remembrances, links to vessel-specific sites, etc., about a specific vessel. Interested? Let me know.) We organize citations that are specifically about a particular vessel under the appropriate card.

Of course, not every ship in ShipIndex.org has a Wikidata identifier. Right now, we’re using local identifiers when a Wikidata identifier doesn’t exist, or we haven’t found it yet. Since anyone can create a new entry in Wikidata, we can also create identifiers there, and share our knowledge with the rest of the world.

We’ll never get all, or most, or even many, citations associated with cards. “Hornet” has 820 citations from 221 resources. We have 11 cards, for specific vessels, and each card has between 2 and 66 citations associated with it. So, just 235 of 820 citations are associated with cards. But many entries have basically no descriptive information about the ship at all, and one would need to look at each resource to figure out if the Hornet in question is one of the ones for which we have a card. Even when there is information, it’s often not enough – there are nine citations with “aircraft carrier” in the description, but without looking at each resource, I don’t know if they’re referring to USS Hornet (CV-8) or USS Hornet (CV-12).
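To make the split between "on a card" and "not yet on a card" concrete, here is a minimal sketch. The citation records and the `group_by_card` function are invented for illustration; the identifiers Q838125 (CV-8) and Q1141355 (CV-12) and the Hornet totals (235 of 820) are the ones quoted in these posts.

```python
from collections import defaultdict

# Hypothetical sample records: (source title, vessel identifier or None).
# None means the vessel cannot be determined without checking the source.
citations = [
    ("Example source A", "Q838125"),
    ("Example source B", "Q1141355"),
    ("Example source C", None),
    ("Example source D", "Q1141355"),
    ("Example source E", None),
]


def group_by_card(records):
    """Split citations into per-vessel cards and an unassigned remainder."""
    cards = defaultdict(list)
    unassigned = []
    for title, vessel_id in records:
        if vessel_id is None:
            unassigned.append(title)
        else:
            cards[vessel_id].append(title)
    return dict(cards), unassigned


cards, unassigned = group_by_card(citations)
print(cards)
print(unassigned)

# Share of Hornet citations currently on a card, using the post's figures.
hornet_coverage = 235 / 820
print(f"{hornet_coverage:.1%}")
```

With the Hornet figures above, a little under 29% of the citations sit on a card; the rest stay in the unassigned pile until someone checks the original source.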

But for the time being, it’s very much a start toward doing better research in maritime history. Look at the two Hornet links above, for instance. The URL for Hornet CV-8 is https://www.shipindex.org/vessels/Q838125, and the URL for Hornet CV-12 is https://www.shipindex.org/vessels/Q1141355. There’s that Q-identifier again, right in our URL, so it’s easy to find, easy to use, and easy to link to. This is the basis of Linked Data, and of making online research easier to do, and easier to manage.
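Because the Q-identifier is the only part of the URL that changes, building links from it is mechanical. The sketch below assumes the two URL patterns visible in the links quoted in these posts; the function names and the validation rule (a capital Q followed by a positive integer) are my own assumptions, not a documented ShipIndex.org API. Vessels with only a local identifier are not covered here.

```python
import re

# URL patterns copied from the links quoted in this post.
SHIPINDEX_VESSEL_URL = "https://www.shipindex.org/vessels/{qid}"
WIKIDATA_ITEM_URL = "https://www.wikidata.org/wiki/{qid}"

# Assumed shape of a Q-identifier: "Q" followed by a positive integer.
Q_ID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_q_identifier(value: str) -> bool:
    """Return True if the value looks like a Wikidata Q-identifier."""
    return Q_ID_PATTERN.fullmatch(value) is not None


def vessel_links(qid: str) -> dict[str, str]:
    """Build the ShipIndex and Wikidata URLs for one Q-identifier."""
    if not is_q_identifier(qid):
        raise ValueError(f"not a Wikidata Q-identifier: {qid!r}")
    return {
        "shipindex": SHIPINDEX_VESSEL_URL.format(qid=qid),
        "wikidata": WIKIDATA_ITEM_URL.format(qid=qid),
    }


for qid in ("Q838125", "Q1141355"):  # USS Hornet CV-8 and CV-12
    print(vessel_links(qid))
```

Anyone who knows a vessel's Q-identifier can construct both links without looking anything up, which is the practical benefit of putting the identifier in the URL.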

As of this writing, we have 1446 citations associated with 70 vessels. That’s about 0.04% of all the citations in the database. Admittedly, we have a long way to go! But it is a start, and getting the underlying work done to make this happen was a big chunk of 2019 – it took a lot of time and work and money.
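For anyone who wants to check the arithmetic, here is a quick calculation. The total of 3,500,000 is an assumption, rounded down from the "over 3.5 million" quoted earlier, so the result is approximate.

```python
associated = 1446      # citations linked to a vessel, from this post
vessels = 70           # vessels with at least one citation, from this post
total = 3_500_000      # assumption: "over 3.5 million", rounded down

share = associated / total
per_vessel = associated / vessels

print(f"{share:.6f}")       # share as a plain fraction
print(f"{share:.4%}")       # the same share as a percentage
print(f"{per_vessel:.1f}")  # average citations per vessel
```

Under that assumption the share is roughly 0.0004 as a fraction, or about 0.04% as a percentage, with an average of about 21 citations per vessel.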

My next goal, beyond expanding the number of citations associated with vessels, is to create a way for users to help grow this resource. Perhaps you have been researching Hornet, and you know that Albion’s Five Centuries of Famous Ships refers to CV-12, rather than CV-8. If you could share that information, to expand the database a bit, that would be huge.

Then, as mentioned above, maybe you have images, or remembrances, about CV-12 specifically, or you want to link to resources about it online (remember, after the current Coronavirus pandemic passes, you can actually visit USS Hornet in Alameda, California; until then, you can visit https://www.uss-hornet.org/) – what if ShipIndex provided a place where you could post those and share them with others interested in researching a specific vessel? That’d be pretty cool, I think.

I’d love to hear what you think about this enhancement. For me, it’s been a long time coming. Of course, there’s much more to do, but I’m very excited about this significant change.

More new content, and other new stuff coming soon…

It’s time for yet another list of new content. It has been a while since I’ve added to the list here, and to be honest our speed of importing new data has slowed a bit. But we’re still working at it, and we still welcome suggestions of content to be added. Content work continues day in and day out.

On the other side of things, 2019 was actually a year of a lot of development. We are just about to see that come to fruition, in the database itself. I will explain more about that after it has been released, and implemented a bit. I hope that will be very, very soon.

Until then, here’s a list of content added since the last time I posted here — which was, admittedly, quite a while ago, back in November.

New content:

This list includes five additional Roebuck Society volumes, for those interested in Australian history. These volumes are really tough to work through, and take a lot more time than most volumes. For more about them, read my Roebuck Society blog post from September.

We always have more to add, and we’re working through it as quickly as time allows. If you have suggestions, please do let us know. And watch for more big news very soon!

More new content; more new Roebuck volumes

The addition of new content to ShipIndex has slowed, but it hasn’t stopped. Here’s the list of resources added since my last posting:

I’ve previously written about the Roebuck Society volumes, and all the content that appears in them. They vary a lot, but there are a number in this import as well. If your ship is mentioned in a Roebuck Society volume, look at the entire volume closely, as it can contain a lot of data in a small amount of space. See the blog post cited above for more information on these publications.

Though the addition of new content is slowing down a bit, we’re working on some cool new functionality on the website itself, which you might see if you poke around enough. I’ll write more about that soon.

Now with 3.5 million citations! (Almost.) And more content.

The ShipIndex database continues to grow: according to this screen grab from our home page, we’re just 126 citations away from 3.5 million citations!

This is certainly a new record for ShipIndex.org content, but it has taken a long time to get here. Several years ago, I had to remove some 380,000 citations from the database, because the online resource containing those citations disappeared. But we’ve been adding lots more content since then, and we’ve recovered and gotten beyond where we’d been.

Here’s content that has been added since the last list I posted. Lots more is in process, as always.

Several titles are worth particular attention. The four volumes published by the Roebuck Society are especially valuable for southern Pacific research, but they’re tricky to use. I wrote a blog post about just those volumes last week, and more titles from the Roebuck Society will be added over time.

Ward’s collection of notes from newspapers, about American activities in the central Pacific, is also interesting — the 7 volume set is remarkable in its own right, printed on heavy paper and with a volume of illustrations and maps, if I remember correctly. It’s organized geographically, which makes finding the entries a bit of a challenge. It’s also probably not a particularly common title, but if it mentions the ship you’re researching, those citations from contemporary newspapers are going to be pretty valuable!

I plan to write a brief blog post about the effects of low technology on this data, regarding the Naval Marine Archive, in the next week or two.

As always, let us know if you have titles to suggest we add.

Adding Roebuck Society volumes

Over the past year, we have been adding a TON of new content to ShipIndex. This should come as no surprise to anyone who’s looked at the blog – pretty much every entry has been just a listing of all the new content we’ve added since the last blog post about new content! Most of that has come from indexes to books, but some has been from online databases and websites.

Recently, we’ve started adding content from a special set of publications. If you’re interested in early Australian history, or Pacific exploration, these will be of particular interest. But they are challenging to search, and challenging to process and add to the ShipIndex database.

They’re published by the Roebuck Society, an Australia-based organization that has published many records about the arrival and departure of ships through Australia’s history. The books themselves are an amalgamation of entries from numerous sources. The content looks like this:

Title page from one of the Roebuck volumes.

A content page from the same Roebuck volume.

It’s tough reading! There’s a lot of information crammed on each of these pages. Luckily, there’s an index to all this madness, but it’s often not much easier to read. Consider this example of an index from the book above:

An index page from the above volume.

 

Processing these indexes has been very hard work, and has taken a lot of time and money to complete. Because of the complexity of the indexes and the associated text, understanding these indexes and how to use the information in them takes some work. If a ship of interest to you is mentioned in a Roebuck book, then your best bet is to track down the book itself. Unlike other titles, it just doesn’t make sense to ask for individual pages, based on the index citations.

Remember that you can almost always get almost any book through interlibrary loan from your local public library. It will take them some time, and it will cost them (and possibly you) some money – so be patient and don’t forget to thank them, and support them financially – but in most cases, other libraries will loan these books to your local library, which will then loan them to you.

Once you have the book in hand, find the ship in the index, and then see where and how often the ship is mentioned within the body of the text. The entry in the text gives a summary of the ship’s movements, and provides information about the sources (usually newspapers) from which the data is drawn. Many ships are mentioned dozens and dozens of times. Many entries contain data from multiple sources, so – especially for tonnage – several data points may appear for each ship in the index. The printed index notes sources for some of this data, but we have not preserved those notations here.

The Roebuck Society has published over sixty volumes, but not all of them relate to vessel information. We have identified about a dozen relevant volumes to add. Some have already been added, and others will soon join the database.

Here’s a list of what’s been added to the database, and what’s in process. Live, in the database, as of publication of this blog post:

In processing, but headed for the database:

  • Broxam, Graeme, and Ian Nicholson. Shipping Arrivals and Departures: Sydney. Vol III: 1841-1844 and Gazetteer.
  • Broxam, Graeme. Shipping Arrivals and Departures: Tasmania. Vol III: 1843-1850.
  • Cumpston, J.S. Shipping Arrivals and Departures: Sydney. Vol I: 1788-1825.
  • Jones, A.G.E. Ships Employed in the South Seas Trade: Vol I.
  • Jones, A.G.E. Ships Employed in the South Seas Trade: Vol II.
  • Nicholson, Ian. Shipping Arrivals and Departures: Sydney. Vol II: 1826-1840.
  • Sexton, R.T. Shipping Arrivals and Departures: South Australia, 1627-1850.
  • Syme, Marten A. Shipping Arrivals and Departures: Victorian Ports. Vol III: 1856-1860.

 

If you have questions, suggestions for additional Roebuck volumes or other titles to add, or thoughts on how best to use Roebuck volumes, please don’t hesitate to share them here, or send them to comments (at) shipindex dot org.

Happy searching!

Yet another list of new content

The ShipIndex data team has been hard at work over the past six to eight weeks, and we’ve added a lot more data. A full list of all content that’s been added since the last update appears below.

Some are short books, or brief websites, but they’ve got unique content you won’t find elsewhere. Some, like the Conway’s volumes, are much longer and have thousands of entries in them. All kinds of content has been added, but we always welcome suggestions for more!

Two weeks ago, we went to the National Library of Scotland, and collected content there that we couldn’t find elsewhere. That’s always a thrill. That content still needs to be processed, so it’s not in the database yet, but will be, eventually. There’s a benefit in knowing that a resource has some information that might be useful to you, even if it’s hard to get, because then you at least know that it’s out there, and you can request it through interlibrary loan. Or, if you travel often, you can use WorldCat to determine which libraries own it, and then when you go near one of those libraries, you have a reason to visit. I, for one, was thrilled to have a reason to add a new library card, from the NLS, to my collection!

Now, here’s a list of the content added since the last update:

As always, send a note to comments (at) shipindex (dot) org if you have titles you think we should add!

 

 

More new content

It’s been over six weeks since I last posted a list of recently-added content, so of course there’s tons more waiting to be listed here. In mid-May, the ShipIndex team met up in Washington, DC, and visited the Library of Congress. We collected indexes to a ton of titles that we hadn’t found elsewhere, and we’ve processed some of those titles already. A lot more are still waiting to go through the whole process, and will be added over the next few months.

As always, we welcome recommendations and suggestions for titles that should be added to the ShipIndex database.

The following content has been added to the ShipIndex.org database since my last update:

As mentioned above, lots more is still to come!

 

More new content

I’m kind of astounded by the amount of new content we’re adding here, but here comes another list. There’s some new stuff in this list: through some work with the folks at Findmypast, the genealogy research company, we’re creating links to the passenger and crew lists that are available on that site. If you’re already a subscriber to Findmypast, you’ll be able to get to the resource easily. If you’re not, you might want to consider joining, to take a look at what you’ll find there. Of course, we can’t guarantee how much or what sort of information will be in the Findmypast databases, but it might be worth an investigation. You just never know what you’ll find, and that’s really the whole point of ShipIndex.org. More Findmypast content will be coming soon, as will many more monographs, and even some journal indexes that I found recently.

Without further ado, here’s what’s been added in the past few weeks:

More files will be uploaded soon, and many more are being processed. I’m working on the Roebuck Society volumes, which will be incredibly valuable for researchers interested in early Australian and New Zealand history. We now have nearly all Navy Records Society volumes in the database. In early May, we venture to the Library of Congress, to collect still more content! Now’s the time to tell me about other titles you’d like to see added — send us a note at comments (at) shipindex (dot) org.

The new content just keeps rolling in

I cannot believe how much additional content we’ve gotten into the database in the last several weeks. Here’s a list of content added since my last post:

As always, lots more will follow soon. And always feel free to let us know if you have titles you think we should add. Send them to comments (at) shipindex (dot) org.