All posts by Peter McCracken

WorldCat (April) Fools

This is the first of a few new blog posts. It’s April 1, April Fools Day, but there is, alas, no foolin’ around here. It’s just bad news, start to finish, with the WorldCat subject entity links that have been in the free ShipIndex database since 2009. Read on, to learn more.

When ShipIndex switched from a personal project to a real company, back in 2009, I put all of the citations that had been in the “project” database into the free database. Anything new was going to go into the subscription database. I had been in contact with researchers at OCLC, the very large library cooperative that ostensibly helps libraries manage their resources and shares those holdings via its publicly available database, WorldCat. I worked with several remarkable people there, who through the years generated a list of all of the “identities” for ships in WorldCat.

This meant we could find books or manuscripts that were by or about ships. A book about a ship is easy enough to imagine – the book The Royal Yacht Britannia: The Official History is clearly about that vessel. Having a specific subject heading about that specific yacht makes it easier to differentiate between vessels with the same name. It also created links to books by ships, which often meant logbooks or individually-kept personal journals by people who were on board a vessel. It was a great way of uncovering a lot of useful content about ships that wouldn’t be found otherwise.

But the folks at OCLC said this content needed to be in the free database, not in the then-nascent subscription database. That was fine with me; it was worth including that content and keeping it freely available. The file has been updated occasionally over the past few years, and has always been in the completely free database.

Two or three weeks ago, I was doing some searching, and looked at WorldCat records. I saw notices indicating that the OCLC Identities project, on which these links were based, was going away. This past week, all the links to WorldCat failed. OCLC has ended this project, and with it, links to lots of content that used to be in the database. They’ve also removed linking by Library of Congress Control Number. You’re just searching by phrase now – this seems like the total antithesis of the ideals behind Linked Data.

I have figured out a way to make these links mostly work. The links are now searching by subject headings, rather than by control numbers or identities. As a result, in many cases, they won’t work effectively. In the old file, there was a search to an identity for a ship named “104”, and it went to the entry for the specific ship with that name. Now, the search is for any entry that has both terms “104” and “ship” in a subject heading, so instead of one or two specific results, you get 38 results. Some refer to ‘cruise 104’ of a different vessel. It’s really too bad. Searches for ships like “Mary” are going to be terrible, because they’ll include ships named “Mary Rose”, “Mary Ellen”, “Mary & Frank”, “Mary Smith”, and any other ship that has ‘mary’ as just part of its name – instead of going directly to the ship you’re researching. A search for a single, common word ship name, like “Eagle” or “Union” or “James” or “Monitor” or “Wasp” is going to return any record that has that word anywhere in the list of subject headings, even if the term doesn’t have anything to do with a ship name. Connections we’ve made, between specific vessels represented in WorldCat and other citations for those specific vessels, are probably no longer relevant.
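
To make the difference concrete, here is a minimal sketch in Python of the two kinds of lookup. Everything in it is made up for illustration: the records, the "ship-0001" style identifiers, and the function names are my own assumptions, and this is not how WorldCat actually stores or searches its data.

```python
# Hypothetical records: the identifiers and subject headings are invented
# for illustration and do not come from WorldCat.
records = [
    {"id": "ship-0001", "subjects": ["Mary (Ship)"]},
    {"id": "ship-0002", "subjects": ["Mary Rose (Ship)"]},
    {"id": "ship-0003", "subjects": ["Mary Ellen (Ship)"]},
    {"id": "ship-0004", "subjects": ["Mary & Frank (Ship)"]},
]


def search_by_terms(records, *terms):
    """Return records whose subject headings contain every term (case-insensitive)."""
    wanted = [term.lower() for term in terms]
    hits = []
    for record in records:
        text = " ".join(record["subjects"]).lower()
        if all(term in text for term in wanted):
            hits.append(record)
    return hits


def lookup_by_id(records, identifier):
    """Return only the records carrying exactly this identifier."""
    return [record for record in records if record["id"] == identifier]


# Term search: every ship with "mary" somewhere in its heading comes back.
term_hits = search_by_terms(records, "mary", "ship")

# Identifier lookup: only the one ship we actually meant.
id_hits = lookup_by_id(records, "ship-0001")

print(len(term_hits), len(id_hits))
```

With these four sample records, the term search returns all four ships, while the identifier lookup returns only the one you wanted.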

OCLC did some work in creating Virtual International Authority File (VIAF) records for some ships, as well. Again, this was great in differentiating between ships with the same name. But as far as I can tell, that is also all wiped out.

I’m disappointed and frustrated by this change, as I am with most of what OCLC has done to WorldCat over the past few years.

I’ll leave with this image I collected from WorldCat a few weeks ago, telling me that a copy of a book I wanted was at the State Library of South Australia, but that library is further than the distance to the moon:

My frustration with WorldCat – and OCLC – is ancient news, but it does just keep getting worse. This is really unfortunate. This is NOT a good April Fools joke.

ONE THOUSAND RESOURCES!

Wow – it’s been quite a while since I last posted anything to the blog. One would be forgiven for thinking we’d disappeared. But we haven’t. In fact, we’ve been working away, adding new content, adding new (mostly backend) functionality, trying new marketing work, and more. But first, we’ve hit a big milestone — yesterday, we loaded our ONE THOUSANDTH resource! I’ll admit, I’m pretty astounded at that. Here’s a list of what we’ve added since the last blog post:

So, we’ve been working hard at adding new content. Getting to ONE THOUSAND resources is HUGE, in my book! We’ve got more to go, I assure you.

On the technology side, we have converted most of our subscription processing from PayPal to Stripe. We think that’s better for us, and better for our customers. If you have an opinion otherwise, I’d be glad to hear it; for now, we think it will make things easier for users. We have some more work coming soon, this time on the login process.

As always, if you have suggestions for content to add to the database, or questions about how it works, please contact me at comments (at) shipindex (dot) org, and share your thoughts, suggestions, and ideas. Until then, fair winds!

Specific resources that have been removed from the ShipIndex database

In the past I haven’t kept a running tally of content that has been removed from the database, but I have mentioned it. The database takes a huge hit when I have to remove 380,000 citations in one go! As mentioned in the prior post, we’ve looked over all online resources, just to make sure they’re working. Many are not. I am going to make a note of the ones that I’m deleting from the database in the table below; I’ll be updating this over the next month or two as I work through all of the problematic online resources.

RESOURCE NAME | NOTES ABOUT THE RESOURCE
National Small Boat Register | As noted here, the database has been taken offline for an unknown period. Let’s just hope it does, eventually, return.
Blue World Web Museum | Google still shows the underlying data, but it’s not available. This collection had great links to images of ships from artwork.
Containership-info.com | Just not there anymore…
NOAA History: NOAA Coast and Geodetic Ships | I could not find any of the NOAA history pages anymore. A few pages may exist on the NOAA pages, but they’re not organized in the same way as in the past.
Union List of Historic Vessels in North America | This one threw me for a loop,

I’ll keep adding to this list as I work through our data.

More resource additions, and a few deletions, too

I provided a list of new resources added to the ShipIndex.org database, back in November. We’re always adding new content, so I’ll include a list of the new stuff in an upcoming post. But I also need to address the fact that we have to remove some stuff, as well.

Monographs, or books, are great as resources, because once they’re added, we know they’re not going anywhere. Those books are in libraries and collections around the world. You may not be able to access them right away, but eventually, you’ll be able to do so. Online resources are great because you can link to them RIGHT NOW. Boom, click, done. Except, when that doesn’t work.

Online resources are great for convenience, but not for reliability. They change and disappear all the time. For some reason, website publishers still don’t realize that if they’re going to change their URLs, they’re going to break access for repeat users. They can include redirects, but rarely do. Too often, website publishers switch from a straightforward linking and searching structure to some fancy search tool that removes prior direct links, and makes new direct links impossible. Tim Berners-Lee, the creator of the World Wide Web, defined five stars for Open Data. One of those is making sure that people can point to your stuff. That is, make sure they can link to it easily. If you are required to do a search to get data, rather than also having a direct link that would get a person to your content, then you’re doing it wrong.

As one example, there’s a brand new “Royal Navy Loss List searchable database” at https://thisismast.org/research/royal-navy-loss-list-search.html. It’s nice that this data is here, and you can do a search for, say, “Indefatigable”, and find a record. But you cannot provide a direct link to the “Indefatigable” results, without going through that search page, which is really annoying, at least for those who care about open data.

Unfortunately, in this case, the MAST Loss List database only meets one of Sir Tim’s five stars toward Open Data. They could — and should — do much better.

But even worse is the total disappearance of online resources. Our data team recently reviewed all online resources in the database, and found quite a few which have disappeared, or are currently offline. We discovered a lot of problems that we’ll need to address. In some cases, the fix is pretty easy because there’s an obvious change to the URLs in the database. This was the case for the Bremen Passenger Lists; we fixed them, and they’re accessible again.

For others, though, we see bigger problems. Take the UK’s “National Small Boat Register”, for instance, which was hosted by the National Maritime Museum in Cornwall. At https://nmmc.co.uk/explore/databases/, you can see that the museum reports in an undated note, “The NSBR is currently offline whilst we create a new and improved website. We will have it up and running again as soon as possible. Please check back for further updates.” WHAT???

I’m all for thoughtful and improved websites, but why take down the old one when you’re building the new one??? Why not just keep it up until the new one is live and working?? The old one worked, didn’t it?? (It obviously did, at one point, when we added it to the database.)

There’s nothing to do but delete the National Small Boat Register contents from the ShipIndex database, and hope we’ll discover the replacement database when — if — it is ever put back online.

The Blue World Web Museum recently disappeared, as did other smaller resources. If you manage a vessel database that you can no longer keep online, please, please, please, contact me at comments (at) shipindex (dot) org, and give me a chance to see if we can save that resource for you.

I think I’ll start a separate blog post that lists the online databases that have disappeared; if you know of new sites for any of these, or contacts for folks who might be willing to offload that work to ShipIndex.org, please do let me know.

This got quite long, so I’ll create a separate post that lists the recently-added new content, in a day or three. (After making the post about lost databases, I suppose.)

Last few months of new content!

Goodness, it’s been a while since I added a blog post. We do have a lot of new content that’s been added, and some new great functionality, as well — I need to write something about that, since it is its own big step forward.

But for now, let’s list the content that has been added to the ShipIndex database since May:

There’s more to add soon, and maybe enough to get us over 1000 resources in the database, before the end of the year. We’ll see!

We have a lot of files still left to process, and we’re going to be adding a lot of new files, too, soon. So, as always, there’s more content coming soon. If you know of a title whose index should be added to the database, please do let me know, at comments (at) shipindex (dot) org — now’s a great time to get some more titles on the list, so they’ll be processed soon!

Recently added content, May 2021

OK, so “recently” isn’t necessarily accurate here; I think that this list covers content added in the past year, actually. We are always adding content to ShipIndex.org, but sometimes it’s slow going. So here’s a list of the content that has been added since my last content post,

New content:

As you can see, it’s a lot of content, even if we haven’t been bragging about what we’ve added through the year. As always, please send a note to comments (at) shipindex.org if you know of a title that you think should be added to the database.

Most Popular Vessel Names in the US

I updated the Merchant Vessels of the United States database today. That’s a big file (~375k entries) and it serves as an interesting collection of personal and merchant vessels.

(There’s a minor error in the import, in that about 10% of the entries – in the Os through Rs – are duplicated. I’m working on correcting that problem. Also, apologies about the layout in this blog post, particularly with the tables. Not sure what the problem is, but I’ll try to correct it.)

Unfortunately, the US Coast Guard has changed their system, and NOAA has dropped their version of the database altogether, so you can no longer link directly to a specific ship. This is very frustrating, but I can’t control other sites’ setups. The URL will take you to the search page, and you can search again for the ship name that you’d found in ShipIndex.

The Coast Guard has also removed tons of personal information about owners of recreational vessels. The remaining information will still be useful to some.

MVUS also creates an interesting opportunity to look at a really large data set, and get a good sense of what vessel names are most appealing to the most people in the US.
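
If you want to try this kind of count yourself, here is a rough sketch in Python. The sample names are made up, and the clean-up step (trimming spaces and evening out capitalization) is just my assumption about what a file like this would need; the real MVUS data has roughly 375k entries and its own quirks.

```python
from collections import Counter

# Made-up sample names, standing in for the name column of the real file.
vessel_names = ["Serenity", "Liberty", "serenity ", "Osprey", "Liberty", "Serenity"]


def most_popular(names, top_n=3):
    """Count names after normalizing whitespace and capitalization."""
    # "serenity " and "Serenity" should count as the same name.
    cleaned = [name.strip().title() for name in names if name.strip()]
    return Counter(cleaned).most_common(top_n)


top_names = most_popular(vessel_names)
for name, count in top_names:
    print(name, count)
```

Here the three spellings of "Serenity" are counted together, so it comes out on top of this tiny sample.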

Unique Vessel Identifiers Added to ShipIndex.org – the short version

 

I wrote a very long blog post about our new use of Wikidata identifiers, and how it changes the ShipIndex database. It might be too long for some. Here, I hope to limit my overview of these changes to just four paragraphs (not including this one).

The ShipIndex database has grown a lot over the past decade, and now has over 3.5 million citations in it. For very common ship names, this makes it too hard to find information about a specific ship. How do you find information about the right America, when there are 2,378 different citations to work through?

The solution is unique identifiers – basically a specific identifier for each hull. This also allows us to bring together citations for the same ship as it changes names. To make this work, we’re using identifiers from Wikidata, plus local identifiers when Wikidata doesn’t yet have one. Wikidata makes it easy to use Linked Data, so that we can uniquely identify and share items and concepts: the identifier Q82925, represented at https://www.wikidata.org/wiki/Q82925, specifically refers to the author Joseph Conrad, while Q1278752, at https://www.wikidata.org/wiki/Q1278752, specifically refers to the ship named after Conrad, now at Mystic Seaport Museum. It also refers to the original name of that ship, Georg Stage, but differentiates it from the current ship with that name. Similarly, the identifier Q838125 refers to USS Hornet (CV-8), and Q1141355 refers to USS Hornet (CV-12).

When you look at the ShipIndex page for Hornet now, you’ll see eleven different ‘cards’, each one referring to a specific hull, or vessel. Click on any of those ship names, and you’ll see only the citations that have been associated with that unique identifier. And note the URL for each card – it includes the Q-identifier, so others can easily link to it as well, without needing to create some new URL. In addition, many citations have not been associated with any card, because we cannot determine to which vessel the citation refers – at least, not without going back to the original source.
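
For the technically curious, the idea behind the cards can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the field names, the placeholder resource titles, and the use of None for "vessel not yet determined" are hypothetical and do not describe the actual ShipIndex schema. The two Q-identifiers are the ones for the two USS Hornets mentioned above.

```python
from collections import defaultdict

# Hypothetical citation rows. The resource titles are placeholders, and
# vessel_id=None stands for "we cannot yet tell which Hornet this is".
citations = [
    {"name": "Hornet", "resource": "Resource A", "vessel_id": "Q838125"},
    {"name": "Hornet", "resource": "Resource B", "vessel_id": "Q1141355"},
    {"name": "Hornet", "resource": "Resource C", "vessel_id": "Q838125"},
    {"name": "Hornet", "resource": "Resource D", "vessel_id": None},
]


def build_cards(citations):
    """Group citations by vessel identifier; keep unidentified ones apart."""
    cards = defaultdict(list)
    unassigned = []
    for citation in citations:
        if citation["vessel_id"] is None:
            unassigned.append(citation)
        else:
            cards[citation["vessel_id"]].append(citation)
    return dict(cards), unassigned


cards, unassigned = build_cards(citations)
for vessel_id, rows in cards.items():
    print(vessel_id, [row["resource"] for row in rows])
print("not yet associated:", len(unassigned))
```

Each key in the result corresponds to one card, and the leftover list holds the citations that still need someone to go back to the original source.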

In the future, I hope to offer ways for individuals to help grow this resource, and maybe create a way that people can share their own information – images, reminiscences, comments, online links – about specific vessels. Until then, I’ll be working away at associating as many citations as possible to specific vessels. I hope that this improves your experience with ShipIndex, and helps everyone do more and better maritime history research.

For more information about what we’ve done here, check out the much-longer blog post.

All the Gory Details about Adding Unique Vessel Identifiers to ShipIndex.org

TL;DR: The ShipIndex.org database can now differentiate between disparate ships of the same name, and combine citations using different names for the same ship. We use Wikidata Q-identifiers to do this, and hope that we can help apply Linked Data to maritime history.


This is a very long post about some significant enhancements that we’ve added to ShipIndex.org. For a shorter post about the same thing, that doesn’t go into quite as much detail, see here.


Greetings, from ShipIndex central. It’s been a while since our last blog post, and I referenced some upcoming changes back then. Of course, things have taken a bit more time than I’d expected, but I’m ready to start sharing some of these changes.

Right now, ShipIndex.org has over 3.5 million citations from over 900 resources. If you’re looking for a specific ship with a common name, you’re gonna have a hard time. Looking for a specific “Eagle”? As of today, there are 2,677 citations for ships named Eagle. “America”? There are 2,378 citations. The most common ship names are, in increasing order, “Hope”, “Anna”, and “Maria”, with “Elizabeth” having 3,818 citations, and “Mary” leading by a lot, with 5,072 citations.

Researchers have a big problem in trying to work through those common ship names. And they have a problem when a ship changes its name – it’s still the ship they want to research, but ShipIndex.org doesn’t really connect a user with the ship’s previous or subsequent names. (Well, it did, a bit: if you look at the “America” entry, you can see “related ships” – but how do you know which of the 2,378 citations also refer to Italis, West Point, or Australis?)

What I always wanted was a “unique vessel identifier”. I could bring together citations that refer to the same ship, and differentiate between citations for different ships with the same name. I wasn’t sure how to make that work, until a colleague at my day job suggested using Wikidata identifiers. This was such a great idea, for many reasons.

You’ve all seen the xkcd comic about standards, right? No? OK, here:

 

Same thing with identifiers. Many already exist – naval hull numbers, IMO numbers, national registration numbers, and others – but none covers all ships, obviously. I didn’t want to create a new identifier, especially since most of the world wouldn’t use it. Wikidata, however, addresses all of these issues, and most importantly it can make maritime history research easier by using these identifiers across the web. For example, a vessel at a maritime museum, like Mystic Seaport’s Joseph Conrad, has the Wikidata identifier Q1278752. This is a “Q-number”; the number itself carries no meaning (Wikidata assigns them in sequence as items are created). It is unique to this specific item. The Wikidata entry contains other pieces of factual data about the vessel, including its current location, its builder, and some of its dimensions. Anyone can add to the record about any entry.

Every item or entry (including non-physical concepts) in Wikipedia has a Q-number. Look on the left column for any Wikipedia entry (like the Conrad‘s), and you’ll see an entry under “Tools” that’s labelled “Wikidata item”. That links you to the Wikidata entry for the item in question. Information from Wikipedia is incorporated into Wikidata, and all of the information is available for sharing and using on the web. Look on the right of the Wikidata entry, and you’ll see a list of entries on that subject, in numerous different languages.

(As another aside, note the difference here between Wikipedia and Wikidata. Wikipedia contains textual information and discussion about a topic or item. One subject may have multiple entries in different languages. [The German entry about Joseph Conrad is not just a straight translation of the English one. Nor is the Farsi entry.] Wikidata, on the other hand, is a collection of data – just the facts, ma’am – about the topic. Wikidata does not have foreign-language versions of each data page.)

This Wikidata entry for “Joseph Conrad” is different from the entries for the Polish author (even though the ship is named after him), for a French army officer with the same name, for a US army officer, and more. By using linked data in this way, online systems can better distinguish the person from the ship, making it easier for researchers to find what they’re looking for, more quickly.

Over time, I’ll be able to do lots more with the Wikidata that is available to us, as the Wikidata database grows. Hopefully, the ShipIndex.org data will be easier to find online, plus it will be easier to use, because ships with common names will be better sorted, and ship name changes will be better represented.

Last December, my son and I visited the Cradle of Aviation museum in Garden City, NY. While there, we saw this plaque describing the many different ships in the US Navy named “Hornet”:

This is a great example of what we’re trying to do here in ShipIndex.org – sorting and dividing the many very different ships with the same name. (Note here, though, that this plaque differentiates between the last three versions of USS Hornet, saying that CV-12, CVA-12, and CVS-12 were different ships. They really weren’t; they were the same hull, even if they were refitted for different uses over the last decades of service. I don’t know, but I’d be surprised if Navy veterans who served on CV-12 would feel that they served on a different ship than those who served on CVS-12, for instance.)

Anyway, when you go to the entry for Hornet in ShipIndex.org now, you’ll see multiple ‘cards’ at the top of the page – each one represents a different vessel, or hull. We pull publicly available data from Wikidata into the cards we create, and we’ll be able to do more there over time. Right now, the images come from Wikidata; when we don’t have one, we have to put in a placeholder. (Imagine a place where you could post your own information, be it pictures, remembrances, links to vessel-specific sites, etc., about a specific vessel. Interested? Let me know.) We organize citations that are specifically about a particular vessel under the appropriate card.

Of course, not every ship in ShipIndex.org has a Wikidata identifier. Right now, we’re using local identifiers when a Wikidata identifier doesn’t exist, or we haven’t found it yet. Since anyone can create a new entry in Wikidata, we can also create identifiers there, and share our knowledge with the rest of the world.

We’ll never get all, or most, or even many, citations associated with cards. “Hornet” has 820 citations from 221 resources. We have 11 cards, for specific vessels, and each card has between 2 and 66 citations associated with it. So, just 235 of 820 citations are associated with cards. But many entries have basically no descriptive information about the ship at all, and one would need to look at each resource to figure out if the Hornet in question is one of the ones for which we have a card. Even when there is information, it’s often not enough – there are nine citations with “aircraft carrier” in the description, but without looking at each resource, I don’t know if they’re referring to USS Hornet (CV-8) or USS Hornet (CV-12).

But for the time being, it’s very much a start toward doing better research in maritime history. Look at the two Hornet links above, for instance. The URL for Hornet CV-8 is https://www.shipindex.org/vessels/Q838125, and the URL for Hornet CV-12 is https://www.shipindex.org/vessels/Q1141355. There’s that Q-identifier again, right in our URL, so it’s easy to find, easy to use, and easy to link to. This is the basis of Linked Data, and of making online research easier to do, and easier to manage.
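
Because the Q-identifier is the only part of these URLs that varies, building a link is simple string work. Here is a small sketch in Python; the two URL patterns are the ones shown in this post, while the function names and the validation rule are my own assumptions. It only handles Wikidata Q-identifiers, not our local identifiers.

```python
import re

WIKIDATA_BASE = "https://www.wikidata.org/wiki/"
SHIPINDEX_BASE = "https://www.shipindex.org/vessels/"


def check_qid(qid):
    """Accept only strings shaped like a Wikidata Q-identifier, e.g. Q838125."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata Q-identifier: {qid!r}")
    return qid


def wikidata_url(qid):
    """Link to the Wikidata entry for this identifier."""
    return WIKIDATA_BASE + check_qid(qid)


def shipindex_url(qid):
    """Link to the ShipIndex vessel page for this identifier."""
    return SHIPINDEX_BASE + check_qid(qid)


for qid in ("Q838125", "Q1141355"):
    print(wikidata_url(qid), shipindex_url(qid))
```

The point is that anyone who knows the identifier can build the link themselves, without running a search first.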

As of this writing, we have 1446 citations associated with 70 vessels. That’s about 0.04% of all the citations in the database. Admittedly, we have a long way to go! But it is a start, and getting the underlying work done to make this happen was a big chunk of 2019 – it took a lot of time and work and money.
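
A quick check of that proportion, in Python. The 1446 figure is from this post; the total of 3,500,000 is an assumption, since I only say "over 3.5 million" above, so treat the result as approximate.

```python
associated = 1446        # citations associated with a vessel (from this post)
total = 3_500_000        # assumption: the post only says "over 3.5 million"

fraction = associated / total   # share as a plain fraction
percent = fraction * 100        # the same share expressed as a percentage

print(f"fraction: {fraction:.6f}")
print(f"percent:  {percent:.4f}%")
```

Note the difference between the two lines of output: the fraction is roughly 0.0004, which as a percentage is roughly 0.04%.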

My next goal, beyond expanding the number of citations associated with vessels, is to make a way that users can help grow this resource. Perhaps you have been researching Hornet, and you know that Albion’s Five Centuries of Famous Ships refers to CV-12, rather than CV-8. If you could share that information, to expand the database a bit, that would be huge.

Then, as mentioned above, maybe you have images, or remembrances, about CV-12 specifically, or you want to link to resources about it online (remember, after the current Coronavirus pandemic passes, you can actually visit USS Hornet in Alameda, California; until then, you can visit https://www.uss-hornet.org/) – what if ShipIndex provided a place where you could post those and share them with others interested in researching a specific vessel? That’d be pretty cool, I think.

I’d love to hear what you think about this enhancement. For me, it’s been a long time coming. Of course, there’s much more to do, but I’m very excited about this significant change.

More new content, and other new stuff coming soon…

It’s time for yet another list of new content. It has been a while since I’ve added to the list here, and to be honest our speed of importing new data has slowed a bit. But we’re still working at it, and we still welcome suggestions of content to be added. Content work continues day in and day out.

On the other side of things, 2019 was actually a year of a lot of development. We are just about to see that come to fruition, in the database itself. I will explain more about that after it has been released, and implemented a bit. I hope that will be very, very soon.

Until then, here’s a list of content added since the last time I posted here — which was, admittedly, quite a while ago, back in November.

New content:

This list includes five additional Roebuck Society volumes, for those interested in Australian history. These volumes are really tough to work through, and take a lot more time than most volumes. For more about them, read my Roebuck Society blog post from September.

We always have more to add, and we’re working through it as quickly as time allows. If you have suggestions, please do let us know. And watch for more big news very soon!