Category Archives: Website Improvements

Unique Vessel Identifiers Added to ShipIndex.org – the short version

 

I wrote a very long blog post about our new use of Wikidata identifiers, and how it changes the ShipIndex database. It might be too long for some. Here, I hope to limit my overview of these changes to just four paragraphs (not including this one).

The ShipIndex database has grown a lot over the past decade, and now has over 3.5 million citations in it. For very common ship names, this makes it too hard to find information about a specific ship. How do you find information about the right America, when there are 2,378 different citations to work through?

The solution is unique identifiers – basically a specific identifier for each hull. This also allows us to bring together citations for the same ship as it changes names. To make this work, we’re using identifiers from Wikidata, plus local identifiers when Wikidata doesn’t yet have one. Wikidata makes it easy to use Linked Data, so that we can uniquely identify and share items and concepts: the identifier Q82925, represented at https://www.wikidata.org/wiki/Q82925, specifically refers to the author Joseph Conrad, while Q1278752, at https://www.wikidata.org/wiki/Q1278752, specifically refers to the ship named after Conrad, now at Mystic Seaport Museum. It also refers to the original name of that ship, Georg Stage, but differentiates it from the current ship with that name. Similarly, the identifier Q838125 refers to USS Hornet (CV-8), and Q1141355 refers to USS Hornet (CV-12).

When you look at the ShipIndex page for Hornet now, you’ll see eleven different ‘cards’, each one referring to a specific hull, or vessel. Click on any of those ship names, and you’ll see only the citations that have been associated with that unique identifier. And note the URL for each card – it includes the Q-identifier, so others can easily link to it as well, without needing to create some new URL. In addition, many citations have not been associated with any card, because we cannot determine to which vessel the citation refers – at least, not without going back to the original source.

In the future, I hope to offer ways for individuals to help grow this resource, and maybe create a way that people can share their own information – images, reminiscences, comments, online links – about specific vessels. Until then, I’ll be working away at associating as many citations as possible to specific vessels. I hope that this improves your experience with ShipIndex, and helps everyone do more and better maritime history research.

For more information about what we’ve done here, check out the much-longer blog post.

All the Gory Details about Adding Unique Vessel Identifiers to ShipIndex.org

TL;DR: The ShipIndex.org database can now differentiate between disparate ships of the same name, and combine citations using different names for the same ship. We use Wikidata Q-identifiers to do this, and hope that we can help apply Linked Data to maritime history.


This is a very long post about some significant enhancements that we’ve added to ShipIndex.org. For a shorter post about the same thing, that doesn’t go into quite as much detail, see here.


Greetings, from ShipIndex central. It’s been a while since our last blog post, and I referenced some upcoming changes back then. Of course, things have taken a bit more time than I’d expected, but I’m ready to start sharing some of these changes.

Right now, ShipIndex.org has over 3.5 million citations from over 900 resources. If you’re looking for a specific ship with a common name, you’re gonna have a hard time. Looking for a specific “Eagle”? As of today, there are 2,677 citations for ships named Eagle. “America”? There are 2,378 citations. The most common ship names are, in increasing order, “Hope”, “Anna”, and “Maria”, with “Elizabeth” having 3,818 citations, and “Mary” leading by a lot, with 5,072 citations.

Researchers have a big problem in trying to work through those common ship names. And they have a problem when a ship changes its name – it’s still the ship they want to research, but ShipIndex.org doesn’t really connect a user with the ship’s previous or subsequent names. (Well, it did, a bit: if you look at the “America” entry, you can see “related ships” – but how do you know which of the 2,378 citations also refer to Italis, West Point, or Australis?)

What I always wanted was a “unique vessel identifier”. I could bring together citations that refer to the same ship, and differentiate between citations for different ships with the same name. I wasn’t sure how to make that work, until a colleague at my day job suggested using Wikidata identifiers. This was such a great idea, for many reasons.

You’ve all seen the xkcd comic about standards, right? No? OK, here:

 

Same thing with identifiers. Many already exist – naval hull numbers, IMO numbers, national registration numbers, and others – but none refer to all ships, obviously. I didn’t want to create a new identifier, especially since most of the world wouldn’t use it. Wikidata, however, addresses all of these issues, and most importantly it can make maritime history research easier by using these identifiers across the web. For example, a vessel at a maritime museum, like Mystic Seaport’s Joseph Conrad, has the Wikidata identifier Q1278752. This is a “Q-number”, assigned at random. It is unique to this specific item. The Wikidata entry contains other pieces of factual data about the vessel, including its current location, its builder, and some of its dimensions. Anyone can add to the record about any entry.

Every item or entry (including non-physical concepts) in Wikipedia has a Q-number. Look on the left column for any Wikipedia entry (like the Conrad‘s), and you’ll see an entry under “Tools” that’s labelled “Wikidata item”. That links you to the Wikidata entry for the item in question. Information from Wikipedia is incorporated into Wikidata, and all of the information is available for sharing and using on the web. Look on the right of the Wikidata entry, and you’ll see a list of entries on that subject, in numerous different languages.

(As another aside, note the difference here between Wikipedia and Wikidata. Wikipedia contains textual information and discussion about a topic or item. One subject may have multiple entries in different languages. [The German entry about Joseph Conrad is not just a straight translation of the English one. Nor is the Farsi entry.] Wikidata, on the other hand, is a collection of data – just the facts, ma’am – about the topic. Wikidata does not have foreign-language versions of each data page.)

This Wikidata entry for “Joseph Conrad” is different from the entry for the Polish author, even though the ship is named after him; from the entry for a French army officer with the same name; from the one for a US army officer; and more. By using linked data in this way, online systems can better distinguish the person from the ship, making it easier for researchers to find what they’re looking for, more quickly.

Over time, I’ll be able to do lots more with the Wikidata that is available to us, as the Wikidata database grows. Hopefully, the ShipIndex.org data will be easier to find online, plus it will be easier to use, because ships with common names will be better sorted, and ship name changes will be better represented.

Last December, my son and I visited the Cradle of Aviation museum in Garden City, NY. While there, we saw this plaque describing the many different ships in the US Navy named “Hornet”:

This is a great example of what we’re trying to do here in ShipIndex.org – sorting and dividing the many very different ships with the same name. (Note here, though, that this plaque differentiates between the last three versions of USS Hornet, saying that CV-12, CVA-12, and CVS-12 were different ships. They really weren’t; they were the same hull, even if they were refitted for different uses over the last decades of service. I don’t know, but I’d be surprised if Navy veterans who served on CV-12 would feel that they served on a different ship than those who served on CVS-12, for instance.)

Anyway, when you go to the entry for Hornet in ShipIndex.org now, you’ll see multiple ‘cards’ at the top of the page – each one represents a different vessel, or hull. We pull publicly available data from Wikidata into the cards we create, and we’ll be able to do more there over time. Right now, the images come from Wikidata; when we don’t have one, we have to put in a placeholder. (Imagine a place where you could post your own information, be it pictures, remembrances, links to vessel-specific sites, etc., about a specific vessel. Interested? Let me know.) We organize citations that are specifically about a particular vessel under the appropriate card.

Of course, not every ship in ShipIndex.org has a Wikidata identifier. Right now, we’re using local identifiers when a Wikidata identifier doesn’t exist, or we haven’t found it yet. Since anyone can create a new entry in Wikidata, we can also create identifiers there, and share our knowledge with the rest of the world.

We’ll never get all, or most, or even many, citations associated with cards. “Hornet” has 820 citations from 221 resources. We have 11 cards, for specific vessels, and each card has between 2 and 66 citations associated with it. So, just 235 of 820 citations are associated with cards. But many entries have basically no descriptive information about the ship at all, and one would need to look at each resource to figure out if the Hornet in question is one of the ones for which we have a card. Even when there is information, it’s often not enough – there are nine citations with “aircraft carrier” in the description, but without looking at each resource, I don’t know if they’re referring to USS Hornet (CV-8) or USS Hornet (CV-12).

But for the time being, it’s very much a start toward doing better research in maritime history. Look at the two Hornet links above, for instance. The URL for Hornet CV-8 is https://www.shipindex.org/vessels/Q838125, and the URL for Hornet CV-12 is https://www.shipindex.org/vessels/Q1141355. There’s that Q-identifier again, right in our URL, so it’s easy to find, easy to use, and easy to link to. This is the basis of Linked Data, and of making online research easier to do, and easier to manage.
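The URL pattern is simple enough to sketch in a few lines of Python. The two URL prefixes below come straight from the links quoted in this post; the function names, and the assumption that a Q-identifier is always "Q" followed by digits, are illustrative only. Local (non-Wikidata) identifiers are not handled here.

```python
import re

# URL prefixes as they appear in the links quoted in this post.
SHIPINDEX_VESSEL_PREFIX = "https://www.shipindex.org/vessels/"
WIKIDATA_ITEM_PREFIX = "https://www.wikidata.org/wiki/"

# Assumption: a Wikidata Q-identifier is "Q" followed by a positive integer.
Q_ID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_q_identifier(value):
    """Return True if the whole string looks like a Q-identifier."""
    return Q_ID_PATTERN.fullmatch(value) is not None


def vessel_urls(q_id):
    """Build the ShipIndex and Wikidata URLs for one Q-identifier."""
    if not is_q_identifier(q_id):
        raise ValueError(f"not a Wikidata Q-identifier: {q_id!r}")
    return {
        "shipindex": SHIPINDEX_VESSEL_PREFIX + q_id,
        "wikidata": WIKIDATA_ITEM_PREFIX + q_id,
    }


# USS Hornet (CV-8) and USS Hornet (CV-12), as identified in this post.
for q_id in ("Q838125", "Q1141355"):
    print(vessel_urls(q_id))
```

Because the identifier is the only variable part of both URLs, anyone who knows a vessel's Q-number can link to either site without looking anything up.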

As of this writing, we have 1,446 citations associated with 70 vessels. That’s about 0.04% of all the citations in the database (a fraction of roughly 0.0004). Admittedly, we have a long way to go! But it is a start, and getting the underlying work done to make this happen was a big chunk of 2019 – it took a lot of time and work and money.
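For anyone who wants to check that arithmetic, here is a minimal sketch. The total of 3,500,000 is an assumption standing in for "over 3.5 million"; the exact database count is not given in this post. With that total, the share comes out as a fraction of about 0.0004, which is roughly 0.04 percent.

```python
def share_of_citations(linked, total):
    """Return the linked citations as a percentage of all citations."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * linked / total


linked_citations = 1446      # citations associated with vessels (from the post)
assumed_total = 3_500_000    # assumption: stands in for "over 3.5 million"

percent = share_of_citations(linked_citations, assumed_total)
print(f"{percent:.3f}% of citations are linked to a specific vessel")
```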

My next goal, beyond expanding the number of citations associated with vessels, is to make a way that users can help grow this resource. Perhaps you have been researching Hornet, and you know that Albion’s Five Centuries of Famous Ships refers to CV-12, rather than CV-8. If you could share that information, to expand the database a bit, that would be huge.

Then, as mentioned above, maybe you have images, or remembrances, about CV-12 specifically, or you want to link to resources about it online (remember, after the current Coronavirus pandemic passes, you can actually visit USS Hornet in Alameda, California; until then, you can visit https://www.uss-hornet.org/) – what if ShipIndex provided a place where you could post those and share them with others interested in researching a specific vessel? That’d be pretty cool, I think.

I’d love to hear what you think about this enhancement. For me, it’s been a long time coming. Of course, there’s much more to do, but I’m very excited about this significant change.

Tons of new content; site upgrades

It’s been quite a while since I’ve posted anything to the blog here. It might seem like nothing has been happening, but that’s actually not true. A quick look at the ShipIndex webpage will show that; we have a new, much-improved page that responds to the size of the screen, meaning it works as well on a smartphone as it does on a laptop or a desktop.

Getting that done was a pretty big project, and it took a long time. I’m thrilled that it’s finally done. We may spot a thing or two to change, but for the time being I think we’re pretty happy with how it has turned out. A friend and colleague took over the process of carrying the project across the finish line, and he did a great job on that.

Another change has been to bring on someone to pick up the data work that I hadn’t been able to get to. I love doing that stuff, but just knew it wasn’t going to happen any time soon, and there was (and remains) a ton of work to be done there. And if we’re not adding content to the database, then we’re not adding much value. So our new data expert is working through backfiles as quickly as possible, and will soon be moving on to new files and to websites, as well. We have been working through that backlog of content that was waiting to be finished and loaded into the database, and we’ve been loading a ton of it.

Here’s a list of content that has been added in the past few weeks:

That’s a lot of new content! We also updated links to some online resources, where the webpage had totally changed its structure and didn’t have updated links, and improved some others, as well.

Each new resource has a “new” note on it, on our Resources page. If you’re following a particular ship name, then you’ll get an email when new content is added for that ship name. Remember, you don’t need to be an active subscriber to get those emails. You need an account (because we need to know how to contact you), but that’s it. If you see a new citation that looks interesting, you can either subscribe to access the new content, or access it through your local library, if they offer access.

New content will keep being added over the next few weeks, and we have a plan for collecting even more monographic content at a major research university, then at New York Public Library, then at the Library of Congress, and then beyond. If there’s a title you think we should add, please do let me know, either here, or by email to comments (at) shipindex (dot) org!

 

 

New feature: Introducing stopwords

One of the neat things about having an online database is that one can study data to figure out how to make the system work better. This wouldn’t be the case if this were, say, a CD-ROM product.

I can look at all the searches that have been done on the site in the past year or so. In doing this, it’s clear that a lot of people include terms like “USS”, “HMS”, “USCGC” and other descriptive terms in front of the ship name. Others include vessel descriptors, such as “schooner” or “steamer”. For a long time, I’ve wanted to have a way of ignoring those terms, because it will get users to the content they really want more quickly. However, as with most things, it’s not as easy as it seems.

It’s easy to have a list of stopwords — words that are ignored in searches. Many search tools do this, so when you include “the” or “an” in a book title, Amazon doesn’t bother to search for these words. Of course, they still need to make exceptions to deal with searches for the band “The The”, and the like. And in the case of ShipIndex.org, one still needs to be able to search for “HMS” in a name, since some ships do have that as a legitimate part of their name – though none of them are part of the British Royal Navy.

So anyway, I reviewed the list of search terms, and came up with specific words or phrases that need to be ignored. Then we (and by “we” I don’t mean me – I mean the excellent development team that turns these ideas into reality) created the tools to ignore these words, and also show a results message that says, basically, “We ignored this term, but you can repeat the search without ignoring it if you like.”
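To give a feel for the idea, here is a minimal sketch in Python. It is not the site's actual code: the stopword list is a small made-up sample built from the terms mentioned above, the function name is hypothetical, and it drops the terms wherever they appear rather than only in front of the name.

```python
# Assumption: a tiny sample list; the real list came from reviewing site searches.
STOPWORDS = {"uss", "hms", "uscgc", "schooner", "steamer"}


def strip_stopwords(query, stopwords=STOPWORDS):
    """Split a search into the terms to search for and the terms ignored.

    Returns (cleaned_query, ignored_terms). If every term is a stopword,
    the query is left alone, so a ship whose name really is "HMS" can
    still be found.
    """
    terms = query.split()
    kept = [t for t in terms if t.lower() not in stopwords]
    ignored = [t for t in terms if t.lower() in stopwords]
    if not kept:
        return " ".join(terms), []
    return " ".join(kept), ignored


cleaned, ignored = strip_stopwords("USS Hornet")
if ignored:
    print(f"Searching for {cleaned!r}; ignored {', '.join(ignored)}. "
          "Repeat the search without ignoring these terms if you like.")
```

Returning the ignored terms alongside the cleaned query is what makes the results message possible: the page can say what was dropped and offer the unfiltered search.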

The result will be a significant improvement in the results that people see when doing their searches. Let me know if you like it or if you don’t.

Some more challenging website problems

I discovered yesterday that some people are having problems creating accounts and then subscribing to the database. This appears to be a result of a restructuring of our backend, which was incredibly valuable, but had some unintended badness. Testing before release didn’t uncover these problems, and in fact they continue to be difficult to nail down – though there is no question at all that they are happening.

The technical team is working on it and I hope we’ll have a solution live as quickly as possible. If you have information to share about what didn’t work for you, that could help us troubleshoot the problem, and I’ll also be sure to let you know when everything is working again.

Having these problems right after returning from the genealogy show at Olympia, in London, is incredibly frustrating, but I suppose there’s never a good time for these problems to crop up. We’ll get them straightened out as soon as we can.

New subscription options; new backend; new currencies

Lots of big changes are now live at ShipIndex.org. The site has just been significantly upgraded, and has much more power than before. Most of this isn’t visible; it’s primarily back-end work, but it will make importing data much quicker, and will also allow for much more flexible access to the world. If (or rather, when) we are mentioned on NPR or in the New York Times (God willing), we should be able to handle the rush.

There are some significant changes for users, though. We now offer fixed-length subscriptions: you can buy access for just two weeks, for three months, for six months, or for a year. You can still subscribe on a monthly basis, and that price has been slightly lowered.

Also, in a big development, you can now pay for access in multiple currencies! If you want to pay in Pounds Sterling, Euros, Australian Dollars, or Canadian Dollars, you can now do that. What this really means is that I absorb the cost of the foreign transaction fee rather than you, but it also means you can feel more comfortable about the cost of the database, particularly if you’re not too familiar with the value of the American dollar.

The new pricing is as follows:

Monthly recurring:     $8 per month

Time-limited subscriptions are as follows:
Two weeks:     $ 6
Three months:  $22
Six months:    $35
One year:      $65

At the moment, I know there are some bits of webcopy that need to be updated, particularly more information up front about the pricing changes. I’ll get to those as quickly as I can.

Please tell me what you think about these changes. What other changes do you think would be helpful?

Full text links from within ShipIndex

ShipIndex.org links to the full text for nearly 85% of its citations! Before Mike ran the numbers, I guessed that a conservative estimate on links to full text would be at about 70%, so the 85% number was quite a surprise, but it’s true.

How did we do this? First, we’re linking to lots and lots of content online. There are so many free online resources with information about ships out there, and I feel like I find another one every week. But other than ShipIndex, there’s no place that brings all these resources to one place, and no way to search all of them at once. However, with ShipIndex, that’s what you’re doing. But that doesn’t get one to 85%.

Recently, we started looking for resources in Google Books. The next time you’re searching in ShipIndex and you see a hotlinked page number, try clicking on that page number. It should take you right to the page of the book within Google’s Book Search project.

Here are two examples from freely-available resources:

  • The citation for Aroostook, from Paul Calore’s Naval Campaigns of the Civil War, has a link to page 128, and the vessel is mentioned near the start of the last paragraph.
  • The citation for City of Pekin, from Arthur Clark’s The Clipper Ship Era, has a link to page 86, and the ship is mentioned about 2/3 of the way down the page.

This was an interesting experience, and I learned a lot when we did it. The goal was to try and link directly to the page that cited a specific ship. I discovered four different levels of Google Books linking:

  • No content: The book just can’t be found, or it’s cited but offers no view into it at all.
  • Snippet view: With snippet view, you really only get a touch of the book, and it’s hard to know how much or what you’ll get. Most importantly, you can only search by terms; you can’t ask Google to show you all of a specific page.
  • Preview: With preview, Google offers most of the pages of a book. This is common for recently-published works, and Google works with the publisher to figure out what they’ll show. The idea, obviously, is to show enough that someone wants to go out and buy the full book.
  • Full view: For these books, Google shows the entire thing. These are primarily books that are out of copyright protection – so, published before 1923.

We only activate links for books that are available via Full View and Preview — and we only do the Preview if it appears that most links will get to the page in question. We’ve found a few titles that are available in Preview, but so many links go to pages that aren’t visible, perhaps because the publisher only allows 10-20% of the book to be shown via Google Books, that it seems misleading to offer those links.
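That rule can be written down as a small decision function. This is only a sketch of the policy as described here, not the code behind the site: the names are hypothetical, and reading "most links" as "more than half" is an assumption, since no exact cutoff is stated.

```python
from enum import Enum


class ViewLevel(Enum):
    """The four levels of Google Books access described above."""
    NO_CONTENT = "no content"
    SNIPPET = "snippet view"
    PREVIEW = "preview"
    FULL_VIEW = "full view"


# Assumption: "most links will get to the page" is read as "more than half".
PREVIEW_THRESHOLD = 0.5


def should_activate_links(level, visible_share=0.0, threshold=PREVIEW_THRESHOLD):
    """Decide whether page links for a book should be switched on.

    visible_share is the fraction (0.0 to 1.0) of cited pages that a
    reader can actually see; it only matters for Preview books.
    """
    if level is ViewLevel.FULL_VIEW:
        return True
    if level is ViewLevel.PREVIEW:
        return visible_share > threshold
    # Snippet view and missing books cannot be linked by page at all.
    return False


print(should_activate_links(ViewLevel.FULL_VIEW))
print(should_activate_links(ViewLevel.PREVIEW, visible_share=0.15))
```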

Links to Snippet views don’t work because there’s no way to get to a specific page. You could try to search for the ship name, but if the ship name is something like “Elizabeth”, then you’ll get every mention of “Elizabeth” in the book – including names of people, not just ships. Also, the searches just don’t work as well. This could be a result of problems in OCR work, too – if the OCR work isn’t very good, then Google won’t find specific phrases, and with the page linking, we’re going to a specific page, not searching for a ship name in the book’s text.

So, as a result, you’ll most likely find linking to Google for very old books (via Full View) and very new books (via Preview).

The horror stories about metadata in Google Books are very true. It’s a mess for any slightly complicated title, such as multi-volume sets. So, finding Navy Records Society volumes — especially multi-volume works that weren’t published consecutively — was sometimes quite a challenge. And, in some cases, volumes that should be available just aren’t. I found one book that was completely upside down. Others have lousy scan quality. But the fact is that an enormous amount of content is available from anyone’s computer now, and it will only improve.

Try it out; see what you think.

New feature: tracking ship updates

Here’s the third blog post for the morning. It’s definitely the most exciting. We’ve just released the mostest coolestest feature of ShipIndex since starting the site. (OK, so that’s admittedly my personal opinion, but I think it’s also a fact.)

Effective immediately, anyone with an account (that is, anyone who has created a username – you don’t need to be a subscriber) can be notified whenever a ship page is updated with new information. So, if you’re particularly interested in a vessel named Unanimity, you can go to that page, click on the button near the top that reads “NOTIFY ME when this page is updated”, and then whenever new content is added, you’ll get an email telling you so!

If you’re a subscriber, you’ll see what resource the content is from. You can go to the page directly, and check out the new citation.

If you’re not a subscriber, you’ll be notified that new citations have been added. You may decide it’s finally time to gain access to everything that’s available on the site. Or, perhaps you use ShipIndex.org through a subscription provided by your local public or academic library. Go to your ship’s page and locate the new citations, which are always marked by a “new” icon for 45 days from the addition of the resource.

You’ll get just one email containing updates for all the ships you’re tracking, not a separate email for each ship, or each citation. Emails are sent in batches, several times per week, reflecting all the data added since the last update.
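For the curious, the "one email per person" idea can be sketched roughly like this. It is an illustration only, with made-up names and data shapes, and it says nothing about how the real system stores accounts or sends mail.

```python
from collections import defaultdict


def build_digests(tracking, new_citations):
    """Group new citations into one digest per user.

    tracking:      {user: set of ship names the user follows}
    new_citations: list of (ship_name, resource_title) added since the last batch
    Returns {user: {ship_name: [resource_title, ...]}}, leaving out users
    with nothing new.
    """
    by_ship = defaultdict(list)
    for ship, resource in new_citations:
        by_ship[ship].append(resource)

    digests = {}
    for user, ships in tracking.items():
        updates = {ship: by_ship[ship] for ship in sorted(ships) if ship in by_ship}
        if updates:
            digests[user] = updates
    return digests


# Hypothetical sample data.
tracking = {"ann": {"Unanimity", "Hornet"}, "bob": {"Eagle"}}
new_citations = [
    ("Hornet", "Resource A"),
    ("Unanimity", "Resource B"),
    ("Hornet", "Resource C"),
]
for user, updates in build_digests(tracking, new_citations).items():
    print(user, updates)
```

Collecting everything per user before sending is what keeps a busy import from turning into a flood of separate messages.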

When you’re done following a vessel, you can just go to the ship page, click on the button that reads “CANCEL NOTIFICATIONS for this ship”, and the emails will stop.

You need to be logged in, so that we can keep track of how to notify you when a page is updated. But, as mentioned above, you DON’T need to be a subscriber. Also, from your profile page, you can see all the vessels you’re tracking, and clear all your notifications, or go to each page and modify them individually.

I truly believe this is an enormous step forward in what we’re offering via ShipIndex.org. You no longer need to come to the site to check on updates regarding the ships that interest you; we’ll take care of that for you. Now, when new citations are added for the ships that matter to you, you’ll be the first to know.

Please try it out, and let us know what you think. Remember: you do need to have an account, but you don’t need to be a subscriber.

I hope you’re as excited about this as I am.

New feature: Passenger and Crew Lists icons

When we exhibited at the National Genealogical Society conference recently, we quickly learned that lots of genealogists are looking for passenger and crew lists. We knew we had some of them in the ShipIndex.org database, but they weren’t identified. I’m pleased to report that we now have an icon to indicate which citations describe passenger or crew lists.

Mike built the functionality a little while ago, but I hadn’t activated it until today. If you currently have access to the premium database, check out the following searches to see it in action:

  • Admiral Lyons
  • Loreto – results are way down at the bottom; they come from Mystic Seaport’s New London Crew Lists database
  • Lady Amherst – this result is from a database that I just discovered and loaded today, of immigration lists for vessels headed to Australia in the 19th century. It offers links to digitized versions of microfilmed versions of hand-written passenger lists.
  • Acropolis

You do need to be logged in to see these icons, at least at the moment.

Let me know what you think!

Content, Conferences, and Enhancements

Oh, man. I’m so far behind in updating the world on what ShipIndex.org is up to. A few important points:

New content. I uploaded several files today. So far, they’ve included:

The first fills a brief gap; I had already imported volumes 2 and 3, but had had a problem with volume 1, which I’ve since fixed. The last resource, H. T. Lenton’s volume, is a really big, important one. It’s got just over 23,000 citations in it. Many of these are for unnamed vessels, such as Landing Crafts, with names like “LCM.21” or “LCM.234”. I think this is important content for those doing research on these rarely-known vessels. I wrote a lot about the processing I did on this one on the resource’s information page here.

I’m very pleased to get this one imported; it adds immeasurably to the World War II content for those doing in-depth research into naval movements during the war.

With these additions, we’re now just 24 citations short of 1,325,000 citations. Perhaps I’ll find a small set to add some time today.

Past Conference. Two weeks ago (man, time flies!), we went to Salt Lake City for the National Genealogical Society conference. That was a great event, and we had a super time talking with genealogists and learning how we can improve the product we provide for them. We also had a fine time talking with folks from other companies who we can partner with, to the benefit of all involved.

There’s so much to do as a followup on that, and we’re working away on it. That’s a good problem to have, but wow, what a pile of work on our plates. On top of all of that, I’m still working on adding content, and Mike is plugging away at enhancements and new features. Both of us are also working on some neat possible partnerships, plus adding institutional subscribers here and there.

Ship Normalization. Speaking of new enhancements, Mike has built a really valuable new tool that will have a huge impact on a lot of the data that we have from a few major resources. One drawback of projects where a print resource (especially a 19th-century one) is digitized and put online is that the print-specific space-saving conventions are applied to an online environment. For example, the schooner Abbot Lawrence is represented in different volumes as “Abbot L’wr’nce”, “Abbott Law’nce”, “Abbott Lawr’nce”, and (obviously) “Abbot Lawrence”. All of them describe the same vessel, and in a print volume, that’s easily discerned. But online, the computer doesn’t know that when you search for “Abbott Lawrence,” you’d also like to see the other variations above. That is, unless you have Mike on your side, who has created a tool so that we can bring them all together (that is, ‘normalize’ them). And that’s what we’re doing. The process is quick and accurate, though there are enough entries that it’ll take quite a while.

But, we’re doing it, and we’re making all those other entries available, despite the proliferation of apostrophes.
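Mike's tool isn't described in detail here, so the following is not how it works; it is just one way the apostrophe problem could be approached, sketched in Python with hypothetical function names. The idea is to treat each apostrophe as "some letters were left out here" and test whether an abbreviated entry could stand for a given full name.

```python
import re

# Both the plain and the typographic apostrophe mark omitted letters.
APOSTROPHES = "'\u2019"


def abbreviation_pattern(abbrev):
    """Turn an abbreviated name into a regex where each apostrophe
    may stand for zero or more omitted letters."""
    parts = re.split("[" + APOSTROPHES + "]", abbrev.strip().lower())
    return re.compile("[a-z]*".join(re.escape(part) for part in parts))


def abbreviation_matches(abbrev, full_name):
    """Return True if abbrev could be a shortened form of full_name."""
    pattern = abbreviation_pattern(abbrev)
    return pattern.fullmatch(full_name.strip().lower()) is not None


print(abbreviation_matches("Abbot L\u2019wr\u2019nce", "Abbot Lawrence"))
print(abbreviation_matches("Abbott Lawr\u2019nce", "Abbott Lawrence"))
print(abbreviation_matches("Abbott Law\u2019nce", "Abbot Lawrence"))
```

Note the limit of this approach: it handles the dropped letters, but "Abbot" and "Abbott" still count as different spellings, so the last check fails. Bringing those together would need something more, such as a curated list of alternate spellings.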

Next Conference. Finally, I’m headed to a conference at Mystic Seaport tomorrow – it’s a joint conference for a number of organizations, including the Council of American Maritime Museums, the North American Society for Oceanic History, the Steamship Historical Society of America, the National Maritime Historical Society, and the Society for Nautical Research. What a group!

I’m looking forward to telling folks there about ShipIndex.org, and I hope I won’t run out of brochures. If you’re going, and would like to get together at some point, please drop me a line.