All posts by Peter McCracken

More resource additions, and a few deletions, too

I provided a list of new resources added to the ShipIndex.org database back in November. We’re always adding new content, so I’ll include a list of the new stuff in an upcoming post. But I also need to address the fact that we have to remove some stuff, as well.

Monographs, or books, are great as resources, because once they’re added, we know they’re not going anywhere. Those books are in libraries and collections around the world. You may not be able to access them right away, but eventually, you’ll be able to do so. Online resources are great because you can link to them RIGHT NOW. Boom, click, done. Except, when that doesn’t work.

Online resources are great for convenience, but not for reliability. They change and disappear all the time. For some reason, website publishers still don’t realize that if they’re going to change their URLs, they’re going to break access for repeat users. They can include redirects, but rarely do. Too often, website publishers switch from a straightforward linking and searching structure to some fancy search tool that removes prior direct links, and makes new direct links impossible. Tim Berners-Lee, the creator of the World Wide Web, defined five stars for Open Data. One of those is making sure that people can point to your stuff. That is, make sure they can link to it easily. If you are required to do a search to get data, rather than also having a direct link that would get a person to your content, then you’re doing it wrong.

As one example, there’s a brand new “Royal Navy Loss List searchable database” at https://thisismast.org/research/royal-navy-loss-list-search.html. It’s nice that this data is here, and you can do a search for, say, “Indefatigable”, and find a record. But you cannot provide a direct link to the “Indefatigable” results, without going through that search page, which is really annoying, at least for those who care about open data.

Unfortunately, in this case, the MAST Loss List database only meets one of Sir Tim’s five stars toward Open Data. They could — and should — do much better.
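To make the linking point concrete, here is a minimal sketch of how a site that changes its URL structure could keep old direct links working. Everything in it is hypothetical: the paths, the `REDIRECTS` table, and the `resolve` function are invented for illustration, and no real site's setup is being described.

```python
# Hypothetical sketch: keep old direct links alive after a URL change.
# The paths and the redirect table below are invented for illustration.

# Old path -> new path, recorded when the site is restructured.
REDIRECTS = {
    "/ships/indefatigable.html": "/vessels/indefatigable",
    "/ships/hornet.html": "/vessels/hornet",
}

# Paths that exist on the restructured site.
CURRENT_PAGES = {"/vessels/indefatigable", "/vessels/hornet"}


def resolve(path):
    """Return an (HTTP status, location) pair for a requested path."""
    if path in CURRENT_PAGES:
        return 200, path             # page exists at this address
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # permanent redirect to the new address
    return 404, None                 # nothing known about this address


if __name__ == "__main__":
    print(resolve("/ships/hornet.html"))
```

With a table like this, a repeat visitor following an old bookmark is sent on to the new page instead of hitting a dead end.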

But even worse is the total disappearance of online resources. Our data team recently reviewed all online resources in the database, and found quite a few which have disappeared, or are currently offline. We discovered a lot of problems that we’ll need to address. In some cases, the fix is pretty easy because there’s an obvious change to the URLs in the database. This was the case for the Bremen Passenger Lists; we fixed them, and they’re accessible again.

For others, though, we see bigger problems. Take the UK’s “National Small Boat Register”, for instance, which was hosted by the National Maritime Museum in Cornwall. At https://nmmc.co.uk/explore/databases/, you can see that the museum reports in an undated note, “The NSBR is currently offline whilst we create a new and improved website. We will have it up and running again as soon as possible. Please check back for further updates.” WHAT???

I’m all for thoughtful and improved websites, but why take down the old one when you’re building the new one??? Why not just keep it up until the new one is live and working?? The old one worked, didn’t it?? (It obviously did, at one point, when we added it to the database.)

There’s nothing to do but delete the National Small Boat Register contents from the ShipIndex database, and hope we’ll discover the replacement database when — if — it is ever put back online.

The Blue World Web Museum recently disappeared, as did other smaller resources. If you manage a vessel database that you can no longer keep online, please, please, please, contact me at comments (at) shipindex (dot) org, and give me a chance to see if we can save that resource for you.

I think I’ll start a separate blog post that lists the online databases that have disappeared; if you know of new sites for any of these, or contacts for folks who might be willing to offload that work to ShipIndex.org, please do let me know.

This got quite long, so I’ll create a separate post that lists the recently-added new content, in a day or three. (After making the post about lost databases, I suppose.)

Last few months of new content!

Goodness, it’s been a while since I added a blog post. We do have a lot of new content that’s been added, and some new great functionality, as well — I need to write something about that, since it is its own big step forward.

But for now, let’s list the content that has been added to the ShipIndex database since May:

There’s more to add soon, and maybe enough to get us over 1,000 resources in the database before the end of the year. We’ll see!

We have a lot of files still left to process, and we’re going to be adding a lot of new files, too, soon. So, as always, there’s more content coming soon. If you know of a title whose index should be added to the database, please do let me know, at comments (at) shipindex (dot) org — now’s a great time to get some more titles on the list, so they’ll be processed soon!

Recently added content, May 2021

OK, so “recently” isn’t necessarily accurate here; I think that this list covers content added in the past year, actually. We are always adding content to ShipIndex.org, but sometimes it’s slow going. So here’s a list of the content that has been added since my last content post.

New content:

As you can see, it’s a lot of content, even if we haven’t been bragging about what we’ve added through the year. As always, please send a note to comments (at) shipindex.org if you know of a title that you think should be added to the database.

Most Popular Vessel Names in the US

I updated the Merchant Vessels of the United States database today. That’s a big file (~375k entries) and it serves as an interesting collection of personal and merchant vessels.

(There’s a minor error in the import, in that about 10% of the entries – in the Os through Rs – are duplicated. I’m working on correcting that problem. Also, apologies about the layout in this blog post, particularly with the tables. Not sure what the problem is, but I’ll try to correct it.)

Unfortunately, the US Coast Guard has changed their system, and NOAA has dropped their version of the database altogether, so you can no longer link directly to a specific ship. This is very frustrating, but I can’t control other sites’ setups. The URL will take you to the search page, and you can search again for the ship name that you’d found in ShipIndex.

The Coast Guard has also removed tons of personal information about owners of recreational vessels. The remaining information will still be useful to some.

MVUS also creates an interesting opportunity to look at a really large data set, and get a good sense of what vessel names are most appealing to the most people in the US.
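Counting names in a list like this takes only a few lines. The following is a rough sketch of the idea, not the code used for ShipIndex.org; the sample names are made up, and `count_names` and its normalization rule (trim whitespace, ignore case) are assumptions for illustration.

```python
from collections import Counter


def count_names(names, top=3):
    """Return the `top` most frequent names as (name, count) pairs."""
    # Normalize so that "Serenity" and " SERENITY" count as the same name,
    # and skip empty entries.
    normalized = [n.strip().upper() for n in names if n.strip()]
    return Counter(normalized).most_common(top)


# Made-up sample, standing in for the vessel-name column of a large file.
sample = ["Serenity", "SERENITY", "Osprey", " Serenity", "Osprey", "Liberty", ""]

if __name__ == "__main__":
    for name, count in count_names(sample):
        print(name, count)
```

On a real file, the normalization step matters more than the counting: how punctuation, numerals, and spelling variants are handled will change which names come out on top.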


Unique Vessel Identifiers Added to ShipIndex.org – the short version

 

I wrote a very long blog post about our new use of Wikidata identifiers, and how it changes the ShipIndex database. It might be too long for some. Here, I hope to limit my overview of these changes to just four paragraphs (not including this one).

The ShipIndex database has grown a lot over the past decade, and now has over 3.5 million citations in it. For very common ship names, this makes it too hard to find information about a specific ship. How do you find information about the right America, when there are 2,378 different citations to work through?

The solution is unique identifiers – basically a specific identifier for each hull. This also allows us to bring together citations for the same ship as it changes names. To make this work, we’re using identifiers from Wikidata, plus local identifiers when Wikidata doesn’t yet have one. Wikidata makes it easy to use Linked Data, so that we can uniquely identify and share items and concepts: the identifier Q82925, represented at https://www.wikidata.org/wiki/Q82925, specifically refers to the author Joseph Conrad, while Q1278752, at https://www.wikidata.org/wiki/Q1278752, specifically refers to the ship named after Conrad, now at Mystic Seaport Museum. It also refers to the original name of that ship, Georg Stage, but differentiates it from the current ship with that name. Similarly, the identifier Q838125 refers to USS Hornet (CV-8), and Q1141355 refers to USS Hornet (CV-12).

When you look at the ShipIndex page for Hornet now, you’ll see eleven different ‘cards’, each one referring to a specific hull, or vessel. Click on any of those ship names, and you’ll see only the citations that have been associated with that unique identifier. And note the URL for each card – it includes the Q-identifier, so others can easily link to it as well, without needing to create some new URL. In addition, many citations have not been associated with any card, because we cannot determine to which vessel the citation refers – at least, not without going back to the original source.

In the future, I hope to offer ways for individuals to help grow this resource, and maybe create a way that people can share their own information – images, reminiscences, comments, online links – about specific vessels. Until then, I’ll be working away at associating as many citations as possible to specific vessels. I hope that this improves your experience with ShipIndex, and helps everyone do more and better maritime history research.

For more information about what we’ve done here, check out the much-longer blog post.

All the Gory Details about Adding Unique Vessel Identifiers to ShipIndex.org

TL;DR: The ShipIndex.org database can now differentiate between disparate ships of the same name, and combine citations using different names for the same ship. We use Wikidata Q-identifiers to do this, and hope that we can help apply Linked Data to maritime history.


This is a very long post about some significant enhancements that we’ve added to ShipIndex.org. For a shorter post about the same thing, that doesn’t go into quite as much detail, see here.


Greetings, from ShipIndex central. It’s been a while since our last blog post, and I referenced some upcoming changes back then. Of course, things have taken a bit more time than I’d expected, but I’m ready to start sharing some of these changes.

Right now, ShipIndex.org has over 3.5 million citations from over 900 resources. If you’re looking for a specific ship with a common name, you’re gonna have a hard time. Looking for a specific “Eagle”? As of today, there are 2,677 citations for ships named Eagle. “America”? There are 2,378 citations. The most common ship names are, in increasing order, “Hope”, “Anna”, “Maria”, “Elizabeth” (with 3,818 citations), and “Mary”, which leads by a lot, with 5,072 citations.

Researchers have a big problem in trying to work through those common ship names. And they have a problem when a ship changes its name – it’s still the ship they want to research, but ShipIndex.org doesn’t really connect a user with the ship’s previous or subsequent names. (Well, it did, a bit: if you look at the “America” entry, you can see “related ships” – but how do you know which of the 2,378 citations also refer to Italis, West Point, or Australis?)

What I always wanted was a “unique vessel identifier”. I could bring together citations that refer to the same ship, and differentiate between citations for different ships with the same name. I wasn’t sure how to make that work, until a colleague at my day job suggested using Wikidata identifiers. This was such a great idea, for many reasons.

You’ve all seen the xkcd comic about standards, right? No? OK, here:

[Image: xkcd comic about standards]

Same thing with identifiers. Many already exist – naval hull numbers, IMO numbers, national registration numbers, and others – but none refer to all ships, obviously. I didn’t want to create a new identifier, especially since most of the world wouldn’t use it. Wikidata, however, addresses all of these issues, and most importantly it can make maritime history research easier by using these identifiers across the web. For example, a vessel at a maritime museum, like Mystic Seaport’s Joseph Conrad, has the Wikidata identifier Q1278752. This is a “Q-number”, an arbitrary number with no meaning of its own. It is unique to this specific item. The Wikidata entry contains other pieces of factual data about the vessel, including its current location, its builder, and some of its dimensions. Anyone can add to the record about any entry.

Every item or entry (including non-physical concepts) in Wikipedia has a Q-number. Look on the left column for any Wikipedia entry (like the Conrad‘s), and you’ll see an entry under “Tools” that’s labelled “Wikidata item”. That links you to the Wikidata entry for the item in question. Information from Wikipedia is incorporated into Wikidata, and all of the information is available for sharing and using on the web. Look on the right of the Wikidata entry, and you’ll see a list of entries on that subject, in numerous different languages.

(As another aside, note the difference here between Wikipedia and Wikidata. Wikipedia contains textual information and discussion about a topic or item. One subject may have multiple entries in different languages. [The German entry about Joseph Conrad is not just a straight translation of the English one. Nor is the Farsi entry.] Wikidata, on the other hand, is a collection of data – just the facts, ma’am – about the topic. Wikidata does not have foreign-language versions of each data page.)

This Wikidata entry for “Joseph Conrad” is different from the entry for the Polish author, even though the ship is named after him; from the entry for a French army officer with the same name; from the one for a US army officer; and more. By using linked data in this way, online systems can better distinguish the person from the ship, making it easier for researchers to find what they’re looking for, more quickly.

Over time, I’ll be able to do lots more with the Wikidata that is available to us, as the Wikidata database grows. Hopefully, the ShipIndex.org data will be easier to find online, plus it will be easier to use, because ships with common names will be better sorted, and ship name changes will be better represented.

Last December, my son and I visited the Cradle of Aviation museum in Garden City, NY. While there, we saw this plaque describing the many different ships in the US Navy named “Hornet”:

This is a great example of what we’re trying to do here in ShipIndex.org – sorting and dividing the many very different ships with the same name. (Note here, though, that this plaque differentiates between the last three versions of USS Hornet, saying that CV-12, CVA-12, and CVS-12 were different ships. They really weren’t; they were the same hull, even if they were refitted for different uses over the last decades of service. I don’t know, but I’d be surprised if Navy veterans who served on CV-12 would feel that they served on a different ship than those who served on CVS-12, for instance.)

Anyway, when you go to the entry for Hornet in ShipIndex.org now, you’ll see multiple ‘cards’ at the top of the page – each one represents a different vessel, or hull. We pull publicly available data from Wikidata into the cards we create, and we’ll be able to do more there over time. Right now, the images come from Wikidata; when we don’t have one, we have to put in a placeholder. (Imagine a place where you could post your own information, be it pictures, remembrances, links to vessel-specific sites, etc., about a specific vessel. Interested? Let me know.) We organize citations that are specifically about a particular vessel under the appropriate card.

Of course, not every ship in ShipIndex.org has a Wikidata identifier. Right now, we’re using local identifiers when a Wikidata identifier doesn’t exist, or we haven’t found it yet. Since anyone can create a new entry in Wikidata, we can also create identifiers there, and share our knowledge with the rest of the world.
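As a rough illustration of the idea (not the actual ShipIndex.org data model), citations can be grouped under whichever identifier a vessel has, with unmatched citations kept apart. The field names, the `LOCAL-` prefix for local identifiers, and the sample records are all assumptions made up for this sketch; only the two Q-identifiers for Hornet come from this post.

```python
from collections import defaultdict

# Made-up citation records. "vessel_id" is a Wikidata Q-identifier, a
# hypothetical local identifier, or None when the vessel is undetermined.
citations = [
    {"name": "Hornet", "source": "Resource A", "vessel_id": "Q838125"},
    {"name": "Hornet", "source": "Resource B", "vessel_id": "Q1141355"},
    {"name": "Hornet", "source": "Resource C", "vessel_id": "LOCAL-17"},
    {"name": "Hornet", "source": "Resource D", "vessel_id": None},
    {"name": "Hornet", "source": "Resource E", "vessel_id": "Q838125"},
]


def group_by_vessel(records):
    """Split citations into per-vessel 'cards' and an unassigned list."""
    cards = defaultdict(list)
    unassigned = []
    for record in records:
        if record["vessel_id"] is None:
            unassigned.append(record)
        else:
            cards[record["vessel_id"]].append(record)
    return dict(cards), unassigned


if __name__ == "__main__":
    cards, unassigned = group_by_vessel(citations)
    for vessel_id, items in cards.items():
        print(vessel_id, len(items))
    print("unassigned", len(unassigned))
```

Keeping local and Wikidata identifiers in the same field means a local identifier can later be swapped for a Q-identifier without changing how citations are grouped.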

We’ll never get all, or most, or even many, citations associated with cards. “Hornet” has 820 citations from 221 resources. We have 11 cards, for specific vessels, and each card has between 2 and 66 citations associated with it. So, just 235 of 820 citations are associated with cards. But many entries have basically no descriptive information about the ship at all, and one would need to look at each resource to figure out if the Hornet in question is one of the ones for which we have a card. Even when there is information, it’s often not enough – there are nine citations with “aircraft carrier” in the description, but without looking at each resource, I don’t know if they’re referring to USS Hornet (CV-8) or USS Hornet (CV-12).

But for the time being, it’s very much a start toward doing better research in maritime history. Look at the two Hornet links above, for instance. The URL for Hornet CV-8 is https://www.shipindex.org/vessels/Q838125, and the URL for Hornet CV-12 is https://www.shipindex.org/vessels/Q1141355. There’s that Q-identifier again, right in our URL, so it’s easy to find, easy to use, and easy to link to. This is the basis of Linked Data, and of making online research easier to do, and easier to manage.
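The URL pattern is simple enough to sketch in a few lines. This is only an illustration based on the two ShipIndex.org URLs above and the usual form of Wikidata item addresses; the function names and the validation rule (a capital Q followed by digits) are assumptions, and local identifiers are not handled.

```python
import re

SHIPINDEX_BASE = "https://www.shipindex.org/vessels/"
WIKIDATA_BASE = "https://www.wikidata.org/wiki/"

# Assumption: a Q-identifier is a capital Q followed by a positive integer.
Q_ID = re.compile(r"Q[1-9][0-9]*")


def check_q_id(q_id):
    """Return q_id unchanged, or raise ValueError if it is malformed."""
    if not Q_ID.fullmatch(q_id):
        raise ValueError(f"not a Q-identifier: {q_id!r}")
    return q_id


def shipindex_url(q_id):
    """Build the ShipIndex.org vessel URL for a Q-identifier."""
    return SHIPINDEX_BASE + check_q_id(q_id)


def wikidata_url(q_id):
    """Build the Wikidata item URL for a Q-identifier."""
    return WIKIDATA_BASE + check_q_id(q_id)


if __name__ == "__main__":
    print(shipindex_url("Q838125"))   # USS Hornet (CV-8)
    print(wikidata_url("Q1141355"))   # USS Hornet (CV-12)
```

Because the same identifier appears in both addresses, anyone who knows a vessel's Q-number can get from one site to the other without running a search.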

As of this writing, we have 1446 citations associated with 70 vessels. That’s about 0.04% of all the citations in the database. Admittedly, we have a long way to go! But it is a start, and getting the underlying work done to make this happen was a big chunk of 2019 – it took a lot of time and work and money.
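As a quick check on that share, here is the arithmetic as a short sketch. The total of 3.5 million is an assumption taken from the “over 3.5 million citations” figure earlier in this post, so the result is approximate. Note that roughly 0.0004 is the fraction; expressed as a percentage, it is about 0.04%.

```python
def share(part, total):
    """Return part/total both as a fraction and as a percentage."""
    fraction = part / total
    return fraction, fraction * 100


if __name__ == "__main__":
    # 1446 associated citations; total assumed to be roughly 3.5 million.
    fraction, percent = share(1446, 3_500_000)
    print(f"fraction: {fraction:.6f}")
    print(f"percent:  {percent:.4f}%")
```

Put another way, that is about one citation in every 2,400.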

My next goal, beyond expanding the number of citations associated with vessels, is to create a way for users to help grow this resource. Perhaps you have been researching Hornet, and you know that Albion’s Five Centuries of Famous Ships refers to CV-12, rather than CV-8. If you could share that information, to expand the database a bit, that would be huge.

Then, as mentioned above, maybe you have images, or remembrances, about CV-12 specifically, or you want to link to resources about it online (remember, after the current Coronavirus pandemic passes, you can actually visit USS Hornet in Alameda, California; until then, you can visit https://www.uss-hornet.org/) – what if ShipIndex provided a place where you could post those and share them with others interested in researching a specific vessel? That’d be pretty cool, I think.

I’d love to hear what you think about this enhancement. For me, it’s been a long time coming. Of course, there’s much more to do, but I’m very excited about this significant change.

More new content, and other new stuff coming soon…

It’s time for yet another list of new content. It has been a while since I’ve added to the list here, and to be honest our speed of importing new data has slowed a bit. But we’re still working at it, and we still welcome suggestions of content to be added. Content work continues day in and day out.

On the other side of things, 2019 was actually a year of a lot of development. We are just about to see that come to fruition, in the database itself. I will explain more about that after it has been released, and implemented a bit. I hope that will be very, very soon.

Until then, here’s a list of content added since the last time I posted here — which was, admittedly, quite a while ago, back in November.

New content:

This list includes five additional Roebuck Society volumes, for those interested in Australian history. These volumes are really tough to work through, and take a lot more time than most volumes. For more about them, read my Roebuck Society blog post from September.

We always have more to add, and we’re working through it as quickly as time allows. If you have suggestions, please do let us know. And watch for more big news very soon!

More new content; more new Roebuck volumes

The addition of new content to ShipIndex has slowed, but it hasn’t stopped. Here’s the list of resources added since my last posting:

I’ve previously written about the Roebuck Society volumes, and all the content that appears in them. They vary a lot, but there are a number in this import as well. If your ship is mentioned in a Roebuck Society volume, look at the entire volume closely, as it can contain a lot of data in a small amount of space. See the blog post cited above for more information on these publications.

Though the addition of new content is slowing down a bit, we’re working on some cool new functionality on the website itself, which you might see if you poke around enough. I’ll write more about that soon.

Now with 3.5 million citations! (Almost.) And more content.

The ShipIndex database continues to grow: according to this screen grab from our home page, we’re just 126 citations away from 3.5 million citations!

This is certainly a new record for ShipIndex.org content, but it has taken a long time to get here. Several years ago, I had to remove some 380,000 citations from the database, because the online resource containing those citations disappeared. But we’ve been adding lots more content since then, and we’ve recovered and gotten beyond where we’d been.

Here’s content that has been added since the last list I posted. Lots more is in process, as always.

Several titles are worth particular attention. The four volumes published by the Roebuck Society are especially valuable for southern Pacific research, but they’re tricky to use. I wrote a blog post about just those volumes last week, and more titles from the Roebuck Society will be added over time.

Ward’s collection of notes from newspapers, about American activities in the central Pacific, is also interesting — the 7 volume set is remarkable in its own right, printed on heavy paper and with a volume of illustrations and maps, if I remember correctly. It’s organized geographically, which makes finding the entries a bit of a challenge. It’s also probably not a particularly common title, but if it mentions the ship you’re researching, those citations from contemporary newspapers are going to be pretty valuable!

I plan to write a brief blog post about the effects of low technology on this data, regarding the Naval Marine Archive, in the next week or two.

As always, let us know if you have titles to suggest we add.

Adding Roebuck Society volumes

Over the past year, we have been adding a TON of new content to ShipIndex. This should come as no surprise to anyone who’s looked at the blog – pretty much every entry has been just a listing of all the new content we’ve added since the last blog post about new content! Most of that has come from indexes to books, but some has been from online databases and websites.

Recently, we’ve started adding content from a special set of publications. If you’re interested in early Australian history, or Pacific exploration, these will be of particular interest. But they are challenging to search, and challenging to process and add to the ShipIndex database.

They’re published by the Roebuck Society, an Australia-based organization that has published many records about the arrival and departure of ships through Australia’s history. The books themselves are an amalgamation of entries from numerous sources. The content looks like this:

Title page from one of the Roebuck volumes.

A content page from the same Roebuck volume.

It’s tough reading! There’s a lot of information crammed on each of these pages. Luckily, there’s an index to all this madness, but it’s often not much easier to read. Consider this example of an index from the book above:

An index page from the above volume.

 

Processing these indexes has been very hard work, and has taken a lot of time and money to complete. Because of the complexity of the indexes and the associated text, understanding these indexes and how to use the information in them takes some work. If a ship of interest to you is mentioned in a Roebuck book, then your best bet is to track down the book itself. Unlike other titles, it just doesn’t make sense to ask for individual pages, based on the index citations.

Remember that you can almost always get almost any book through interlibrary loan from your local public library. It will take them some time, and it will cost them (and possibly you) some money – so be patient and don’t forget to thank them, and support them financially – but in most cases, other libraries will loan these books to your local library, and they’ll loan it to you.

Once you have the book in hand, find the ship on the index page shown, and then see where and how often the ship is mentioned within the body of the text. The entry in the text gives a summary of the ship’s movements, and provides information about the sources (usually newspapers) from which the data is drawn. Many ships are mentioned dozens and dozens of times. Many entries contain data from multiple sources, so – especially for tonnage – many data points may appear for each ship in the index. The printed index notes sources for some of this data, but we have not preserved those notations here.

The Roebuck Society has published over sixty volumes, but not all of them relate to vessel information. We have identified about a dozen relevant volumes to add. Some have already been added, and others will soon join the database.

Here’s a list of what’s been added to the database, and what’s in process. Live, in the database, as of publication of this blog post:

In processing, but headed for the database:

  • Broxam, Graeme, and Ian Nicholson. Shipping Arrivals and Departures: Sydney. Vol III: 1841-1844 and Gazetteer.
  • Broxam, Graeme. Shipping Arrivals and Departures: Tasmania. Vol III: 1843-1850.
  • Cumpston, J.S. Shipping Arrivals and Departures: Sydney. Vol I: 1788-1825.
  • Jones, A.G.E. Ships Employed in the South Seas Trade: Vol I.
  • Jones, A.G.E. Ships Employed in the South Seas Trade: Vol II.
  • Nicholson, Ian. Shipping Arrivals and Departures: Sydney. Vol II: 1826-1840.
  • Sexton, R. T. Shipping Arrivals and Departures: South Australia, 1627-1850.
  • Syme, Marten A. Shipping Arrivals and Departures: Victorian Ports. Vol. III: 1856-1860.

 

If you have questions, or suggestions on additional Roebuck volumes to add, or other titles to add, or thoughts on how best to use Roebuck volumes, please don’t hesitate to share them here, or send them to comments (at) shipindex dot org.

Happy searching!