All posts by Peter McCracken

Comments catastrophe!

OK, maybe not “catastrophe,” but it looks like we haven’t received any of the comments that folks have submitted via our ‘contact us’ form since the end of March. If you emailed us directly, which you can always do, at comments (at) shipindex (dot) org, then we did get it. But if you submitted it via the form, we didn’t. Many apologies.

We’ll get the form fixed asap, and I’ll update this post, and add a new post, when we know it’s working again. Our tech team has a bug running through it (the kind that affects humans, not the kind that affects computers), so we’re a bit understaffed today. As a result, it might take us a day or so to troubleshoot and fix the problem. Until then, and after if you prefer, feel free to email us at comments (at) shipindex (dot) org, or email me directly at peter (at) shipindex (dot) org.

Oh, we also added three Navy Records Society volumes yesterday, and got another one complete and ready to be loaded very soon. Also, we’ve been working on an absolutely enormous file for weeks and weeks and weeks, and I hope it’ll be done very soon. Man, I’ll be glad to be done with that file. Sheesh.

Peter

Is there a better way to present this data?

I’ve been working on a big file that’s going to be very useful to ShipIndex.org subscribers, especially those interested in World War II vessels. H.T. Lenton’s tome, British and Imperial Warships of the Second World War, is an incredible resource. Its 750+ pages are absolutely jam-packed with useful content, but it has presented me with a few challenging issues about how to manage this data. I thought I’d describe some of it here, explain what my plan is, and see if the greater good has any better suggestions. There’s still time to modify how this resource is managed. I’ve probably invested at least 30 full hours in preparing this file – and that doesn’t include a significant amount of work done by another person before me – and I still have a long way to go. But that’s what it takes, sometimes, to get a resource like this one ready to add to the database.

The first part of this remarkable volume looks at larger, named vessels, organized by vessel type and class. As one example, the “Corvettes and Frigates” section is divided into entries on the “Flower” class, the “River” class, the “Kil-” class, and four more classes. (The introduction has several fascinating paragraphs about the peregrinations of naming vessels, and shows how complicated the whole process was. A fair bit of background knowledge is required just to understand this section!) After some commentary on the design and development of the class, Lenton provides tables showing brief history information for every vessel in a class. Information may be quite extensive, or it might consist of as little as an indication of the intended builder and the approximate cancellation date (for example, for vessels ordered but not begun before the war ended).

This works fine for named vessels, but creates a conundrum for unnamed vessels. In the LCM (Landing Craft Mechanised) section, for example, the index notes that “LCM.21-118” appear on pg 490; “LCM.119-220” on pg 491, “LCM.221-334” on pg 492, etc. Of the 100+ ships on each page, though, just two to three dozen have any information at all about the vessel, and that information is slight, at best. For the LCMs, most have no Building or Completion information. Of the ones that have “Fate” information, it usually reads something like “Lost cause unknown Algiers ../11/42.” (Meaning it was lost in November 1942, but the exact date and cause are not known.)

To me, this information might be useful to someone, and I don’t want to leave out the entry for that vessel. But for each one like that, there are several where no information at all is included, and I believe that adding an entry to ShipIndex.org should imply that at least SOMETHING is available in the resource. So I’ve decided that what I’ll do is expand entries like “LCM.21-118” to be “LCM.21”, “LCM.22”, “LCM.23”, etc., up to “LCM.118”. Then I’ll compare my list with the book itself. If there’s any information at all about the vessel, I’ll keep the entry. If there is no information beyond its listing on the page – nothing about where it was built, or how it was lost, for instance – then I’ll delete it. My thought is that if the volume offers even one piece of information, I’ll include the vessel name in the index.
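For anyone curious how the expand-then-prune step could be scripted, here is a minimal sketch in Python. It is only an illustration of the idea, not the actual workflow: the function names are made up for this example, and the one note attached to “LCM.23” is a hypothetical placeholder rather than something transcribed from the book.

```python
import re

def expand_range(entry):
    """Expand an index entry such as 'LCM.21-118' into individual names."""
    match = re.fullmatch(r"(.+?\.)(\d+)-(\d+)", entry)
    if match is None:
        return [entry]  # not a numbered range, so leave it as it is
    prefix = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    return [f"{prefix}{number}" for number in range(first, last + 1)]

def keep_documented(names, details):
    """Keep only the vessels that have at least one piece of information."""
    return [name for name in names if details.get(name)]

# Hypothetical notes keyed by vessel; in practice these would come from
# checking each expanded name against the page in the book.
details = {"LCM.23": "Lost cause unknown Algiers ../11/42"}

entries = keep_documented(expand_range("LCM.21-118"), details)
```

The expansion is mechanical; the comparison against the book is still the slow, manual part, since someone has to read the page to fill in `details`.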

Still, it’s worth noting that for people who are working on an unlisted LCM, the volume may contain information about the LCM class that might be relevant. And if you’re looking for an image of a specific auxiliary vessel, it may be that an image of a different vessel in the same class will do. It appears that the most common vessel type in which this will apply will be the LCMs, of which several thousand were built, but it will be interesting to see how it actually turns out.

Am I doing the right thing? Should I be handling this in some other way? Is there some other way that I should note the amount of information presented? I’d welcome your comments – if there’s a better way of doing it, now’s the time for me to hear about it.

Two new institutional subscribers

I’m pleased to report that two institutions have signed up for institutional subscriptions in the past week. Everyone at the ShipIndex.org world headquarters is excited about this. The two institutions are pretty far away from each other: East Carolina University, which offers an excellent Master’s program in Maritime Studies (admittedly, as a graduate of that program, I might be a bit biased), and the Australian National Maritime Museum, in Sydney. So, while they may be some 9700 miles apart from each other, they share excellent company. (BTW, try using Google Maps to get driving directions from one to the other. In a nutshell, drive across the country to Gas Works Park in Seattle, kayak 2756 miles to Hawaii, drive down to Honolulu, then get back in the kayak and paddle 3879 miles to Japan. Why even stop in Hawaii? Really, Google? You then still need to paddle another 3300 miles down to the top of Australia, and then drive down to Sydney. I think it would be easier to just drive the 100+ miles to the Outer Banks, and start paddling from there, through the Panama Canal, and straight down to Sydney. But who am I to question Google Topeka?)

Any individual associated with ECU (that is, any student, faculty, or staff member) can access the complete ShipIndex.org database within Joyner Library, anywhere on campus, or from home. Anyone working from within the ANMM library, in Sydney, can similarly access the entire ShipIndex.org premium database.

Several other institutions are currently trialing ShipIndex.org. If you’re affiliated with an institution that you think might benefit from access to the database, please have them give us a call. We’re still offering significant plankowner discounts that can save them a lot of money.

Index to Nautical Research Journal added

This blog post is slightly delayed, but it’s definitely better late than never. I added some particularly valuable content last week and haven’t yet mentioned it here on the blog. In addition to several books, I’ve added entries from the first 40 years of Nautical Research Journal, 1948 to 1995. This is particularly valuable to researchers because NRJ is written for model shipbuilders, so it provides lots of technical and specific information about individual vessels. Just a day or two after adding it, I was able to put it to great use in assisting one subscriber in looking for an illustration of a specific ship. Among just these three resources, I added over 15,500 new citations, and over 1600 completely new vessels. The specific resources just added are:

As always, if you come across specific resources that you’d like to see added, please let us know, at comments (at) shipindex (dot) org.

ShipIndex.org Trial at Coastal Carolina Univ

Students, faculty, and staff at Coastal Carolina University can access a trial of the institutional version of ShipIndex.org for the next month or so. Please check it out. More information is posted at the News from Kimbel Library blog. If you use it in Kimbel Library and find it useful, please be sure to tell a librarian.

Several other institutions are also running trials of ShipIndex.org — is yours? If not, check with your friendly reference librarian or electronic resources librarian and ask them to contact us to see about getting one set up. (We provide institutional subscriptions to — and therefore run trials for — public libraries, academic libraries, historical societies, maritime museums, and more. It may be that an institution you know could provide access for you so you don’t need to subscribe yourself!)

New Content: Am Nep index, 1991-95

I just uploaded content from a five-year index to American Neptune, covering 1991 through 1995, to the premium database. Adding this sort of content is likely the most useful for the most users; it’s great to have one place where you can locate content online from numerous sources, but locating print-only content is a lot trickier.

I have a number of other journal indexes to add, and I want to get them done as quickly as I can. I’m working on it…

The ships mentioned in the 50-year index to American Neptune remain in the freely-available database; they’ll stay there permanently. Newly added content, however, will be going into the premium database.

As always, let me know of content you think should be added.

Lots of new content…

I just finished uploading 5000+ additional citations today, which reminds me that it’s about time for an update on what’s been added in the past few weeks. With the most recent import, I’ve added content from three of the five books in Paul Silverstone’s “U.S. Navy Warship Series”. The series covers the history of the US Navy from 1775 to 2007, in a series of five attractive and comprehensive books, published by the Naval Institute Press and Routledge.

I’ve added Civil War Navies, 1855-1883; The Navy of World War II, 1922-1947; and The Navy of the Nuclear Age, 1947-2007. I still need to work through and add index content from The Sailing Navy, 1775-1854 and The New Navy, 1883-1922. Of the ones I’ve added so far, the WWII volume (added today) and the Nuclear Age volume each have over 5,000 entries in their indexes. The Civil War volume has many fewer, and unfortunately doesn’t include any merchant vessels in the index, which is certainly a shame.

Anyway, here’s a list of most of what I’ve added since the last listing of newly-added content, nearly a month(!) ago:

That’s a pile of stuff! Multiple Navy Records Society volumes, which are particularly valuable for those studying British naval history; the Silverstone volumes and the PMARS database for those working on US naval history; Early South Carolina Newspapers Database for those interested in Southern US colonial history; several resources for steamship buffs (especially the steamship postcards available in Newman’s online collection); Mains’l Haul for Western and general history, and some random things, as well. In the past month, it looks like I’ve added content from two journal indexes, two online resources, and a pile of books.

You can always see new content added to the database on the resources page. Any content added in the past 45 days will have a “NEW!” icon next to it. As you can see from that page, that adds up to a lot of new stuff.

In addition, I’ve reimported most (but not all) of the freely-available files, so that they’ll show the illustration icon when they’ve got one. Those files were added to the database before we had the illustration and “main entry” icons, and you can still tell that an entry has an illustration — usually when the page number is in italics — but it didn’t show the icon. By processing and reimporting those files, the icons are now appearing. I’m still working on one big file, but I’ve covered a lot of the others. That’s some of what’s going on at ShipIndex world headquarters.

As always, let me know if there’s content you’d like to see added (more NRS volumes are on the way, as are a couple of important journal indexes), or if you have any other items to share.

Cool new enhancements!

Well, we’ve done a ton of stuff since coming back from Boston. While in Boston at the ALA Midwinter conference, Mike and I met with about fifteen different people to get feedback on how to improve the site. Each meeting was about 45 minutes long, and the whole experience was really fantastic. We met with academic reference librarians, public librarians, electronic resources librarians, genealogy librarians, authors, content providers, folks with library services businesses that we admire, and tons more. We came away with pages and pages and pages of modifications to make.

Some of these changes are/were easy, and some will be a lot tougher. On Saturday, Mike put new code up on the site, and many of the changes are now visible there. Since we do a lot of iterative releases, we don’t use ‘release numbers,’ but if we did, all the enhanced functionality that has just gone live would definitely deserve a ‘dot version’ – like, say, from 2.1 to 2.2. And, in fact, it probably would deserve an upgrade from version 2.x to 3.0, because of the new institutional access that I’ll get to later. (That doesn’t have much front-end visibility, but it has been a huge change on the back end.)

Here are a few of the changes you’ll see:

  • A “new” icon next to any item added in the last 45 days.
  • Better layout on the results pages
  • Better diacritics management
  • Links to resources open in new windows
  • More, and updated, information on the webpage, especially regarding individual subscriptions
  • A completely new “librarians” tab, with information for librarians, regarding our new institutional service

In addition, he created a number of tools that will help us better identify and proactively correct data issues.

With the new importing tools, I’ve imported several new files in the last few days, and have also started to go back to improve and reimport some of the older files. There are a number of files in the freely-accessible collection that have illustrations but don’t indicate that on the results pages. I’ve already corrected a few of those, and more will be corrected soon. Those don’t count as “new” resources, and they remain freely-accessible.

The biggest deal, though, is INSTITUTIONAL ACCESS! We can now offer subscriptions via IP-authentication, for institution-wide access. Check out our librarians page for more information about this. If you’re interested in setting up a trial for your institution, please drop us a line at sales (at) shipindex (dot) org. Or recommend us to your local librarian! We can provide access for academic, public, special, and other libraries. And, to top it off, we’re offering “plankowner” discounts for institutions that join us before June. Contact us soon for more information.

This release is a big deal all around for us, and it’ll lead to a lot more content being added (two completely new resources have already been added today, and four have been improved and updated over the past two days). Results will be easier to use, and of course institutions can now subscribe, as well.

We’ve got more improvements and enhancements in the works, so let us know about any changes you’d like to see.

Last night’s dream

So, I don’t usually remember my dreams. It’s just the way I am. When I do, though, I try to pay attention.

Last night, I dreamt that I was visiting a library, and meeting with librarians there. Not too unusual, except for a few things. First, there was a freeway running through the library. Well, not running through it — I think the library and freeway were built at the same time, so really, they were part of each other. You could say the freeway had a library built around it. It did mean, though, that there were some pretty weird twists and turns to the building.

Anyway, while meeting with the librarians, one showed me an index I’d always hoped existed, but had never actually seen. She thought I’d be interested in it, and I certainly was. It was a spiral-bound index to the New York Times, on various special subjects. It was an annual volume, so presumably there were many, many others — hopefully one for every year since 1851, or maybe a bit more recent.  There were tabs to different subjects covered by the index, and one of them, about two-thirds of the way through, was an index to — wait for it — wait for it — ships, mentioned in the NYT. Ah… love at first sight. Truly.

I had looked for such a thing in the past. Well, not really, actually — I’d looked for ships listed in the annual volumes of the NYT Index, but I’d never looked for a separate, supplemental index to the NYT. Could such a thing exist? Sure it could. It’s the NYT, after all. So I was absolutely thrilled to find this. I wrote down as much bibliographic information as I could, so I could find a library that owned such a thing once I got home, and then review every single volume of it, to collect citations for every vessel mentioned in the New York Times.

When I woke up, there was, of course, no such piece of paper next to my bed. So, alas, I still don’t have an index to ships mentioned in the NYT. But if it existed in my dreams, it seems there might be a very, very small chance that it exists in real life, right? If you know of such an index, please, please, please let me know. I’ll be forever in your debt…

The messiest metadata yet…

I’m used to messy metadata (that is, data about data – so in this case, data that describes the contents of the ship register), but today I’ve really hit a snag. I found a very nice resource online that has digitized many years of a useful ship register. But the data describing the data in that register is so bad that I wonder if it’s worth adding to the ShipIndex database at all. In some instances, every second entry is obviously wrong. At this point, I’m up to the “Ae”s (after working through the ship names that started with question marks), and I’ve got a long, long way to go. It didn’t take that long to collect this data, but it’ll take forever to correct it.

What I find so frustrating is that if the compilers of the data had spent, say, just three solid days of work going through this file, they could have corrected tens of thousands of errors before they ever sent it out into the world.

Here are some examples. When you see a series of ship names like

Aéro-Poatale IV
Aeropostale I.
Aéro-Postale I.
Aéro-Postale II.

it’s easy to see that the first one is not “Aéro-Poatale”.

Or when you see the following (the second field is the launch date; the third is the tonnage):

Affaric 1934 239
Affarie 1934 239

you know they’re the same ship, and it’s easy enough to determine that the ship name is Affaric, not Affarie.
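A check like this one lends itself to a script. The sketch below is just one possible approach, assuming the register has been loaded as (name, launch year, tonnage) rows: it groups rows that share a launch year and tonnage, then uses Python’s standard `difflib` module to flag names within a group that are nearly, but not exactly, the same. The function name and the 0.8 similarity threshold are arbitrary choices for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def likely_duplicates(rows, threshold=0.8):
    """Flag pairs of similar names that share a launch year and tonnage."""
    groups = defaultdict(set)
    for name, year, tonnage in rows:
        groups[(year, tonnage)].add(name)

    pairs = []
    for (year, tonnage), names in groups.items():
        # Compare every pair of distinct spellings within the group.
        for first, second in combinations(sorted(names), 2):
            if SequenceMatcher(None, first, second).ratio() >= threshold:
                pairs.append((first, second, year, tonnage))
    return pairs

rows = [
    ("Affaric", 1934, 239),
    ("Affarie", 1934, 239),
]
suspects = likely_duplicates(rows)
```

A script can only say that two entries look like the same ship. Deciding which spelling is right still means checking the scanned page.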

Or just below that, the following series of ship names:

Afghanistan 1940
Afghanistan 1917
Afghantstan 1905
Afghauistan 1917
Afghauistan 1917

(The second, fourth, and fifth all describe the same vessel.) If the vessel you’re searching is the 1905 one, and you use the term “Afghanistan”, you won’t find it via the native interface.

Here’s another good one a bit further down. Apparently they weren’t sure which way the accent should go.

Agnés 1896 120
Agnès 1896 120

(A quick look at the pdfs they link to shows that the second one is correct.)

Speaking of diacritics, who knows how many ship names are inaccurately represented here because the compilers decided to just ditch the diacritics? Here are three different versions of the same vessel:

Hillev?g 1885 877
Hillevåg 1885 877
Hillevg 1885 877

Many diacritics are replaced with question marks – probably as a result of some hinky encoding issues – but many others are just deleted. When I can find them, I put them back (so that someone who knows a ship’s name will be able to find it), but I’m afraid that’s not going to happen most of the time.
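The question-mark cases are at least partly recoverable by machine, provided a correctly spelled version of the name is available somewhere to compare against. As a rough sketch (the function names here are invented for this example), each “?” can be treated as a stand-in for exactly one lost character:

```python
import re
import unicodedata

def matches_damaged(damaged, candidate):
    """True if candidate fits damaged, where each '?' is one lost character."""
    # Normalize so an accented letter counts as a single character.
    damaged = unicodedata.normalize("NFC", damaged)
    candidate = unicodedata.normalize("NFC", candidate)
    pattern = "".join("." if ch == "?" else re.escape(ch) for ch in damaged)
    return re.fullmatch(pattern, candidate) is not None

def recover(damaged, known_names):
    """Return the known names that the damaged name could stand for."""
    return [name for name in known_names if matches_damaged(damaged, name)]

candidates = recover("Hillev?g", ["Hillevåg", "Hillevg"])
```

This only helps with names like “Hillev?g”. Where the character was deleted outright, as in “Hillevg”, nothing in the name itself marks where the missing letter belongs.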

Also, numerous blank spaces are missing from ship names. While this reflects how the data appeared in the original resource, it doesn’t consider how people use the database they’ve created. If I’m searching for the 1922 vessel Pacific Commerce, which appears in the database, how do I know that I should also search for “PacificCommerce”, which will also return a result for the vessel I’m seeking? If I don’t fix entries such as these, they’ll create “ships” in the ShipIndex database with names like “PacificCommerce” or “PacificFir” – and later, I’d have to go back and fix them all. And, of course, the fix is not that difficult – just put in the spaces in the appropriate locations. I may use regular expressions to simplify this work, though that does raise the possibility of adding unintentional errors. (But it’d be worth it; it’d fix far, far, far more errors than it’d introduce.)
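Here is a minimal sketch of the kind of regular expression that could do this, using Python’s `re` module. It inserts a space wherever a lowercase letter is immediately followed by an uppercase one. The function name is made up for this example, and the pattern only looks at unaccented letters a to z.

```python
import re

def split_run_together(name):
    """Insert a space wherever a lowercase letter is followed by an uppercase one."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)

fixed = [split_run_together(name) for name in ["PacificCommerce", "PacificFir"]]
```

This is also exactly where the unintentional errors would come from: a legitimately spelled name such as “MacGregor” would become “Mac Gregor”. It would be safer to treat the output as a list of suggested changes to review than to apply it blindly.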

I certainly wouldn’t expect total accuracy in a project like this. In some cases, the originals that were OCR’d were very poor quality microfilm. But what frustrates me is that a quick pass over the spreadsheet, as I’m doing, would identify tens of thousands of these errors.

Problems are not limited to ship names. There are more than four hundred entries whose build dates are well after the issues were published; all of those are clearly wrong. In other cases, when one build date is 1980 and another one, for a ship with the same name and size, is 1930, it’s easy to know that the latter is correct and the former is wrong. Here’s an example:

Alan Seeger 1943 7208
Alan Seeger 1913 7208

A quick look at the entry for the second one confirms that, while one can see why the OCR software thought it said “1913”, a proofer could easily identify the error (as I did, for instance), and correct it to read “1943”.
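Both of these date checks are easy to automate, at least as a first pass. The sketch below assumes the data is available as (name, build year, tonnage) rows and that the publication year of the register issue is known. The function names, the 1945 issue year, and the “Example” row are all hypothetical, included only to show the checks working.

```python
from collections import defaultdict

def impossible_build_years(rows, issue_year):
    """Return rows whose build year is later than the issue's publication year."""
    return [row for row in rows if row[1] > issue_year]

def conflicting_build_years(rows):
    """Find ships with the same name and tonnage but different build years."""
    years = defaultdict(set)
    for name, year, tonnage in rows:
        years[(name, tonnage)].add(year)
    return {key: sorted(found) for key, found in years.items() if len(found) > 1}

rows = [
    ("Alan Seeger", 1943, 7208),
    ("Alan Seeger", 1913, 7208),
    ("Example", 1980, 100),  # hypothetical row with an impossible date
]
too_late = impossible_build_years(rows, issue_year=1945)  # assumed issue year
conflicts = conflicting_build_years(rows)
```

The second check only reports that the years disagree. Working out which one is correct still takes a look at the original page.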

And what concerns me is that if I don’t clean up most of this data now, then it’ll get into the ShipIndex.org database, and make a mess that I’ll have to clean up eventually. But I think it will take me many, many hours to go through this and correct it all – and who knows what I’ll miss that will still get introduced to the database. If I import the data, warts and all, then try to go back and correct it later, there will be that much more to clean up.

I’m quite frustrated by this, because it’s so clear to me how much positive impact cleanup would have had on the original database itself. As it is, I’m making it more reliable to search this database through the ShipIndex interface than through its native interface (for example, the person searching for the 1905 Afghanistan would find it through ShipIndex.org, but not through the original site), but it’ll take a long time before I can get the file done and ready to load.

What a shame.
