Monthly Archives: December 2009

Data correction work at ShipIndex.org

We’ve completed the initial load of a large pile of content into the premium ShipIndex.org database, and now have 1,231,909 references in the database. That’s a lot of content. We have tons more to add, and it’ll keep coming in over time. Now, however, we’ll turn to cleaning up some of this data.

Having all this data in one place is, I believe, a huge benefit, and well worth the subscription price for the premium database. We make it possible to search through well over a million references, from about 125 resources, in less than a second. The quality of much of this data, however, leaves something to be desired. So now I’m turning to cleanup, which I believe will be an equally valuable benefit of our site.

Data problems come from lots of different sources. Some resources include prefixes, such as “USS” or “HMS”, in front of vessel names, so many American naval vessels are currently listed on the ‘U’ page, as in this screenshot:

Many 'USS' listings in ShipIndex.org

They’re also listed in the proper location, under the name of the vessel, so it means there are several places to look. That’s no good, and we’ll fix that.
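As a rough illustration of the kind of fix involved, here is a minimal Python sketch that strips a leading prefix before deciding which index page a vessel belongs on. The function names and the prefix list are hypothetical (only “USS” and “HMS” come from the examples above), and this is not the code ShipIndex.org actually runs.

```python
import re

# Hypothetical prefix list; only "USS" and "HMS" are named in the post.
SHIP_PREFIXES = ("USS", "HMS")

# Match a prefix only when it is a separate word at the start of the name,
# so a vessel called "Ussuri" is left alone.
_PREFIX_RE = re.compile(
    r"^(?:%s)\s+" % "|".join(re.escape(p) for p in SHIP_PREFIXES),
    re.IGNORECASE,
)


def index_name(raw_name: str) -> str:
    """Return the vessel name without a leading prefix such as 'USS'."""
    return _PREFIX_RE.sub("", raw_name.strip(), count=1)


def index_letter(raw_name: str) -> str:
    """Return the letter of the index page the vessel should be listed on."""
    return index_name(raw_name)[:1].upper()
```

With this, `index_letter("USS Constitution")` gives the ‘C’ page rather than the ‘U’ page. Dotted forms such as “U.S.S.” are not handled here and would need their own pattern.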

Another problem is attempts to save space in 19th century printed directories. One will find many entries with apostrophes in them, like the following:

Abbreviated entries in ShipIndex.org

As a subscriber to premium content, you can follow any of these particular links to find that most of these transcriptions accurately reflect what was written in the original publications. But that was done to save space in the printed directory; the stern of the ship certainly read “Duke of Newcastle”, not “D’keof N’wcastle” one year, “D’keofN’wcastle” another year, and “D’ke of N’wc’stle” a third year. (And there is a transcription error, as well: one reads “D’ke of N’woastle” rather than “D’ke of N’wcastle”.) Since no researcher would reasonably think to search for “D’ke”, we’ll work to change all of these to be searchable under “Duke of Newcastle”. (Don’t worry, if you do want to search for “D’ke”, you still can.)
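One way to picture this is an alias table that maps each abbreviated spelling to the full name while still keeping the original spellings on record. The sketch below is a simplified assumption of how that could work; the `AliasIndex` class and its methods are invented for illustration and do not describe the actual ShipIndex.org tools.

```python
def normalize(text: str) -> str:
    """Lowercase, use straight apostrophes, and collapse runs of whitespace."""
    return " ".join(text.replace("\u2019", "'").lower().split())


class AliasIndex:
    """Hypothetical lookup from abbreviated spellings to a canonical name."""

    def __init__(self):
        self._canonical = {}  # normalized variant -> canonical name
        self._variants = {}   # canonical name -> set of original spellings

    def add_alias(self, variant: str, canonical: str) -> None:
        """Record that `variant` should also be findable under `canonical`."""
        self._canonical[normalize(variant)] = canonical
        self._variants.setdefault(canonical, set()).add(variant)

    def canonical_name(self, query: str) -> str:
        """Return the canonical name for a known variant, else the query itself."""
        return self._canonical.get(normalize(query), query)

    def variants_of(self, canonical: str) -> list:
        """Return the original spellings recorded for a canonical name."""
        return sorted(self._variants.get(canonical, set()))
```

After registering “D’keof N’wcastle”, “D’keofN’wcastle”, and “D’ke of N’wc’stle” as aliases of “Duke of Newcastle”, a lookup of any of them returns the full name, and `variants_of("Duke of Newcastle")` still lists the spellings as printed, so a search for “D’ke” can keep working.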

A third problem is simple transcription errors, and there are many of those, from lots of different sources. In addition to the one noted above, several errors appear in the vessel named “D’le pf Suth’rl’nd”. The original source appears as:

[Screenshot: the entry as printed in the original source]

So, part of our value-add is correcting these errors. The data quality team at ShipIndex.org has lots of experience with this, since we’ve been doing something similar for the past ten years with magazine titles. (Trust me, they are far more complicated than ship names.)
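To give a sense of how likely transcription errors could be surfaced for a human to review, here is a small sketch using Python’s standard `difflib` module to suggest near-matches among names already in the index. This is an assumption about one possible approach, not a description of our tools; the `suggest_corrections` function and the 0.8 cutoff are made up for the example.

```python
import difflib


def suggest_corrections(name, known_names, cutoff=0.8):
    """Return known names that closely resemble `name`, best match first.

    The cutoff of 0.8 is an arbitrary starting point for illustration;
    a real workflow would tune it and have a person confirm each change.
    """
    return difflib.get_close_matches(name, known_names, n=3, cutoff=cutoff)


# Spellings taken from the Duke of Newcastle example above.
known = ["D'ke of N'wcastle", "D'ke of N'wc'stle", "D'keof N'wcastle"]
suggestions = suggest_corrections("D'ke of N'woastle", known)
```

Here the mistranscribed “D’ke of N’woastle” differs from “D’ke of N’wcastle” by a single character, so that spelling comes back as the top suggestion.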

Of course, with 1.23 million entries, it’ll take us a while to get through the entire database. It’s a fairly slow and meticulous process – though the technology team at ShipIndex has done a great job creating a panoply of tools to simplify the process and speed it up. (The technology team spent much of the past ten years building the tools that the data quality team used when working on magazine titles, so we’ve got it all pretty well covered.) It’ll take time to work through everything, and we’ll definitely be adding more data before we finish this process – meaning it’ll take that much longer – but it will happen. And if you see an error you especially want corrected, please don’t hesitate to let us know.

Thanks for your interest, and have a great holiday season.

A Page A Day – Moby-Dick

I stumbled across an interesting site today, called “One Drawing for Every Page of Moby-Dick”, in which an amateur artist is creating a drawing based on the text of each page of Melville’s Moby-Dick. The overview shows sets of the pages that have been done so far, and the blog provides info on the more recent pages. Each work is done on “found paper” (discarded books, actually) with whatever materials the artist chooses. He does about 20-25 pages per month.

Interesting.

Hot Snot! ShipIndex is back in business!

As Doc Hudson says when he takes over as Lightning McQueen’s crew chief in the Piston Cup tie-breaking race, “Hot Snot! We are back in business!”

Over the past ten days or so, the crew at ShipIndex.org had some technical issues that we had to address, but we worked on ‘em, and we solved ‘em. Over the course of today, you’ll see a dramatic increase in the number of references in the index; assuming nothing else goes haywire, there should be over ONE MILLION references in the index by the end of tomorrow. We’re adding content from one major resource, and will be adding content from many other resources, as well, through the course of the next two days.

Keep an eye on the number of entries in the premium database through the course of the day. At the moment, it’s at 713,476, but it’ll be growing rapidly.

Reimporting data over the next few days

We’re doing some more tweaking to the content in ShipIndex.org, and will need to do some reimporting of some data — OK, a lot of data. Initially, a pile of premium data will disappear, but worry not — we’ll add it all, and much, much more, in the next few days. It’ll go in just as quickly as the machine will allow, but there is a huge pile of data. No free data will disappear. Stick with us!

Thanks, Peter

ShipIndex is going to Boston!

The entire staff of ShipIndex.org, plus support staff (i.e., a spouse), will be headed to Boston in mid-January, for the American Library Association Midwinter conference. We’re going to be meeting with librarians, site users, content providers, and others, to talk about what works and what doesn’t work on the current site, and how to improve it. We know our users have great ideas about what else we could do; we want to collect and record as much of that as we can. We’ve done only a little bit of this in the past, and it has paid big dividends, so we expect that with up to 20 separate interviews, we’re going to get a lot out of the experience.

If you’d be interested in joining us, please send me a note and we’ll set up a time. It’ll take under an hour of your time – we’ll meet at a wifi-enabled coffee shop near the Boston Convention Center (note: not the Hynes Convention Center – and location suggestions are greatly welcomed), and ply you with coffee, pastries, and our undying gratitude. We’ll spend a few minutes showing you the site, then ask you to use it yourself, and give us your feedback. We’ll discuss your thoughts and ideas, which I think will be the best part. It’ll take about 45 minutes, total.

We’ll be doing these every hour, on the hour, Friday through Sunday (the 15th, 16th, and 17th), from 9 through 5, with a break for lunch. If you’d be interested in participating, please drop me a line at peter at shipindex dot org and we’ll see if we can make it work.

Peter