Category Archives: Website Improvements

New feature: Introducing stopwords

One of the neat things about having an online database is that one can study data to figure out how to make the system work better. This wouldn’t be the case if this were, say, a CD-ROM product.

I can look at all the searches that have been done on the site in the past year or so. In doing this, it’s clear that a lot of people include terms like “USS”, “HMS”, “USCGC” and other descriptive prefixes in front of the ship name. Others include vessel descriptors, such as “schooner” or “steamer”. For a long time, I’ve wanted a way of ignoring those terms, because doing so would get users to the content they really want more quickly. However, as with most things, it’s not as easy as it seems.

It’s easy to have a list of stopwords — words that are ignored in searches. Many search tools do this, so when you include “the” or “an” in a book title, Amazon doesn’t bother to search for these words. Of course, they still need to make exceptions to deal with searches for the band “The The”, and the like. And in the case of ShipIndex.org, one still needs to be able to search for “HMS” in a name, since some ships do have that as a legitimate part of their name – though none of them are part of the British Royal Navy.

So anyway, I reviewed the list of search terms, and came up with specific words or phrases that need to be ignored. Then we (and by “we” I don’t mean me – I mean the excellent development team that turns these ideas into reality) created the tools to ignore these words, and also show a results message that says, basically, “We ignored this term, but you can repeat the search without ignoring it if you like.”
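Here is a minimal Python sketch of that behavior. It assumes a simple word-level stopword list (the one below is an illustrative subset taken from the examples in this post, not the site’s actual list, which also covers phrases), and the function names are hypothetical. Stopwords are dropped and reported back so the results page can offer to rerun the search; a query made up entirely of stopwords, such as a bare “HMS”, is searched as typed:

```python
# Illustrative subset only (hypothetical); the real list also includes phrases.
STOPWORDS = {"uss", "hms", "uscgc", "schooner", "steamer"}


def apply_stopwords(query: str, ignore: bool = True) -> tuple[list[str], list[str]]:
    """Split a search into (terms to search, terms that were ignored)."""
    terms = query.split()
    if not ignore:
        # The user chose to repeat the search without ignoring anything.
        return terms, []
    kept = [t for t in terms if t.lower() not in STOPWORDS]
    ignored = [t for t in terms if t.lower() in STOPWORDS]
    if not kept:
        # A query made only of stopwords (e.g. just "HMS") is searched as typed.
        return terms, []
    return kept, ignored


def ignored_notice(ignored: list[str]) -> str:
    """Results-page message offering to rerun the search with nothing ignored."""
    if not ignored:
        return ""
    words = ", ".join(f'"{w}"' for w in ignored)
    return f"We ignored {words}. You can repeat the search without ignoring anything."
```

For instance, apply_stopwords("USS Constitution") returns (["Constitution"], ["USS"]), and passing ignore=False keeps every term.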

The result will be a significant improvement in the results that people see when doing their searches. Let me know if you like it or if you don’t.

Some more challenging website problems

I discovered yesterday that some people are having problems creating accounts and then subscribing to the database. This appears to be a result of a restructuring of our backend, which was incredibly valuable, but had some unintended badness. Testing before release didn’t uncover these problems, and in fact they continue to be difficult to nail down – though there is no question at all that they are happening.

The technical team is working on it, and I hope we’ll have a fix live as quickly as possible. If you can tell us what didn’t work for you, that could help us troubleshoot the problem, and I’ll be sure to let you know when everything is working again.

Having these problems right after returning from the genealogy show at Olympia, in London, is incredibly frustrating, but I suppose there’s never a good time for these problems to crop up. We’ll get them straightened out as soon as we can.

New subscription options; new backend; new currencies

Lots of big changes are now live at ShipIndex.org. The site has just been significantly upgraded, and it has much more power than before. Most of this isn’t visible; it’s primarily back-end work, but it will make importing data much quicker and will allow much more flexible access for users around the world. If (when!) we are mentioned on NPR or in the New York Times (God willing), we should be able to handle the rush.

There are some significant changes for users, though. We now offer fixed-length subscriptions: you can buy access for just two weeks, for three months, for six months, or for a year. You can still subscribe on a monthly basis, and that price has been slightly lowered.

Also, in a big development, you can now pay for access in multiple currencies! If you want to pay in Pounds Sterling, Euros, Australian Dollars, or Canadian Dollars, you can now do that. What this really means is that I absorb the cost of the foreign transaction fee rather than you, but it also means you can feel more comfortable about the cost of the database, particularly if you’re not too familiar with the value of the American dollar.

The new pricing is as follows:

Monthly recurring:     $8 per month

Time-limited subscriptions are as follows:
Two weeks:     $ 6
Three months:  $22
Six months:    $35
One year:      $65
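To put the table in perspective, this short Python calculation (an illustration, not part of the site) works out each fixed-length plan’s effective monthly cost and how much it saves over paying the $8 monthly rate for the same period. The two-week plan is left out because it doesn’t cover a whole number of months:

```python
MONTHLY = 8  # recurring price in USD per month

# Fixed-length plans from the table above: name -> (months covered, price in USD)
PLANS = {"Three months": (3, 22), "Six months": (6, 35), "One year": (12, 65)}


def compare(plans: dict[str, tuple[int, int]]) -> dict[str, tuple[float, int]]:
    """Return (effective price per month, savings vs. paying monthly) per plan."""
    result = {}
    for name, (months, price) in plans.items():
        result[name] = (round(price / months, 2), MONTHLY * months - price)
    return result


for name, (per_month, saved) in compare(PLANS).items():
    print(f"{name}: ${per_month:.2f}/month, ${saved} less than paying ${MONTHLY}/month")
```

For example, the one-year plan works out to about $5.42 a month, $31 less than twelve monthly payments.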

At the moment, I know there are some bits of web copy that need to be updated, particularly to give more information up front about the pricing changes. I’ll get to those as quickly as I can.

Please tell me what you think about these changes. What other changes do you think would be helpful?

Full text links from within ShipIndex

ShipIndex.org links to the full text for nearly 85% of its citations! Before Mike ran the numbers, my conservative guess was that about 70% would link to full text, so the 85% figure was quite a surprise, but it’s true.

How did we do this? First, we’re linking to lots and lots of content online. There are so many free online resources with information about ships out there, and I feel like I find another one every week. But other than ShipIndex, nothing brings all these resources together in one place, and there’s no other way to search all of them at once. Still, that alone doesn’t get one to 85%.

Recently, we started looking for resources in Google Books. The next time you’re searching in ShipIndex and you see a hotlinked page number, try clicking on that page number. It should take you right to the page of the book within Google’s Book Search project.

Here are two examples from freely-available resources:

  • The citation for Aroostook, from Paul Calore’s Naval Campaigns of the Civil War, has a link to page 128, and the vessel is mentioned near the start of the last paragraph.
  • The citation for City of Pekin, from Arthur Clark’s The Clipper Ship Era, has a link to page 86, and the ship is mentioned about 2/3 of the way down the page.
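Page links like these can be built from a book’s Google Books ID. Google Books page URLs commonly take the form books.google.com/books?id=(volume ID)&pg=PA(page), where “PA” marks an ordinary numbered page. The Python sketch below assumes that format and uses a placeholder ID; it illustrates the idea and is not ShipIndex’s actual link-building code:

```python
from urllib.parse import urlencode


def google_books_page_link(volume_id: str, page: int) -> str:
    """Build a link that opens a Google Books volume at a numbered page.

    Assumes the common "pg=PA<page>" convention for ordinary numbered pages.
    """
    query = urlencode({"id": volume_id, "pg": f"PA{page}"})
    return f"https://books.google.com/books?{query}"
```

With a hypothetical volume ID of "abc123", page 128 becomes https://books.google.com/books?id=abc123&pg=PA128.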

This was an interesting experience, and I learned a lot along the way. The goal was to link directly to the page that cited a specific ship. I discovered four different levels of Google Books linking:

  • No content: The book just can’t be found, or it’s listed but offers no view into it at all
  • Snippet view: With snippet view, you get only a small taste of the book, and it’s hard to know how much or what you’ll get. Most importantly, you can only search by terms; you can’t ask Google to show you all of a specific page.
  • Preview: With preview, Google offers most of the pages of a book. This is common for recently-published works, and Google works with the publisher to figure out what they’ll show. The idea, obviously, is to show enough that someone wants to go out and buy the full book.
  • Full view: For these books, Google shows the entire thing. These are primarily books that are out of copyright protection – so, published before 1923.

We only activate links for books that are available via Full View and Preview, and we only link to Preview books if it appears that most links will reach the page in question. We’ve found a few titles that are available in Preview, but so many links go to pages that aren’t visible (perhaps because the publisher allows only 10 to 20% of the book to be shown via Google Books) that it seems misleading to offer those links.
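That rule can be sketched in Python as follows. How “most links” is actually measured isn’t stated, so the 80% threshold and the idea of comparing a book’s cited pages against a known set of visible Preview pages are assumptions for illustration, as are the names:

```python
from enum import Enum


class View(Enum):
    """The four Google Books access levels described above."""
    NO_CONTENT = "no content"
    SNIPPET = "snippet"
    PREVIEW = "preview"
    FULL = "full"


def activate_links(view: View, cited_pages: set[int], visible_pages: set[int],
                   min_share: float = 0.8) -> bool:
    """Decide whether to turn on page links for one book.

    min_share is a hypothetical threshold for "most links reach the page".
    """
    if view is View.FULL:
        return True
    if view is View.PREVIEW and cited_pages:
        reachable = len(cited_pages & visible_pages)
        return reachable / len(cited_pages) >= min_share
    # Snippet view and no-content books can't be opened at a specific page.
    return False
```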

Links to Snippet views don’t work because there’s no way to get to a specific page. You could try to search for the ship name, but if the ship name is something like “Elizabeth”, then you’ll get every mention of “Elizabeth” in the book, including names of people, not just ships. Also, the searches just don’t work as well. This may partly reflect poor OCR: if the OCR isn’t very good, Google won’t find specific phrases. Page linking avoids that problem, because it goes straight to a specific page rather than searching the book’s text for a ship name.

So, as a result, you’ll most likely find Google links for very old books (via Full View) and very new books (via Preview).

The horror stories about metadata in Google Books are very true. It’s a mess for any slightly complicated title, such as multi-volume sets. So, finding Navy Records Society volumes — especially multi-volume works that weren’t published consecutively — was sometimes quite a challenge. And, in some cases, volumes that should be available just aren’t. I found one book that was completely upside down. Others have lousy scan quality. But the fact is that an enormous amount of content is available from anyone’s computer now, and it will only improve.

Try it out; see what you think.

New feature: tracking ship updates

Here’s the third blog post for the morning. It’s definitely the most exciting. We’ve just released the mostest coolestest feature of ShipIndex since starting the site. (OK, so that’s admittedly my personal opinion, but I think it’s also a fact.)

Effective immediately, anyone with an account (that is, anyone who has created a username; you don’t need to be a subscriber) can be notified whenever a ship page is updated with new information. So, if you’re particularly interested in a vessel named Unanimity, you can go to that page, click on the button near the top that reads “NOTIFY ME when this page is updated”, and then whenever new content is added, you’ll get an email telling you so!

If you’re a subscriber, you’ll see what resource the content is from. You can go to the page directly, and check out the new citation.

If you’re not a subscriber, you’ll be notified that new citations have been added. You may decide it’s finally time to gain access to everything that’s available on the site. Or perhaps you use ShipIndex.org through a subscription provided by your local public or academic library; if so, go to your ship’s page and locate the new citations, which are always marked by a “new” icon for 45 days after the resource is added.
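The “new” marker amounts to a simple date check. In this Python sketch the function name is hypothetical, and the exact cutoff is an assumption: an item shows the icon on the day it’s added and for the following 44 days, a 45-day window.

```python
from datetime import date, timedelta

NEW_WINDOW = timedelta(days=45)  # how long the "new" icon is shown


def shows_new_icon(added: date, today: date) -> bool:
    """True from the day a citation is added until 45 days have passed."""
    return timedelta(0) <= today - added < NEW_WINDOW
```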

You’ll get just one email containing updates for all the ships you’re tracking, not a separate email for each ship, or each citation. Emails are sent in batches, several times per week, reflecting all the data added since the last update.
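Here is a minimal Python sketch of that batching, assuming a hypothetical in-memory record of who tracks which ships and a list of citations added since the last run (the schedule itself isn’t modeled, and all names are illustrative). Each user gets at most one message covering all of their tracked ships; subscribers see which resources the new citations come from, and everyone else sees only a count:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class NewCitation:
    ship: str       # ship page that gained the citation
    resource: str   # resource the citation comes from


def build_digests(tracking: dict[str, set[str]],
                  added: list[NewCitation],
                  subscribers: set[str]) -> dict[str, str]:
    """Return one email body per user with updates; users with none are skipped."""
    by_ship: dict[str, list[str]] = defaultdict(list)
    for c in added:
        by_ship[c.ship].append(c.resource)

    digests = {}
    for user, ships in tracking.items():
        lines = []
        for ship in sorted(ships):
            resources = by_ship.get(ship)
            if not resources:
                continue
            if user in subscribers:
                # Subscribers learn which resources the new citations come from.
                names = ", ".join(dict.fromkeys(resources))  # de-duplicate, keep order
                lines.append(f"{ship}: new citations from {names}")
            else:
                # Non-subscribers are told only that new citations exist.
                lines.append(f"{ship}: {len(resources)} new citation(s)")
        if lines:
            digests[user] = "\n".join(lines)
    return digests
```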

When you’re done following a vessel, you can just go to the ship page, click on the button that reads “CANCEL NOTIFICATIONS for this ship”, and the emails will stop.

You need to be logged in, so that we can keep track of how to notify you when a page is updated. But, as mentioned above, you DON’T need to be a subscriber. Also, from your profile page, you can see all the vessels you’re tracking, and clear all your notifications, or go to each page and modify them individually.

I truly believe this is an enormous step forward in what we’re offering via ShipIndex.org. You no longer need to come to the site to check on updates regarding the ships that interest you; we’ll take care of that for you. Now, when new citations are added for the ships that matter to you, you’ll be the first to know.

Please try it out, and let us know what you think. Remember: you do need to have an account, but you don’t need to be a subscriber.

I hope you’re as excited about this as I am.

New feature: Passenger and Crew Lists icons

When we exhibited at the National Genealogical Society conference recently, we quickly learned that lots of genealogists are looking for passenger and crew lists. We knew we had some of them in the ShipIndex.org database, but they weren’t identified. I’m pleased to report that we now have an icon to indicate which citations describe passenger or crew lists.

Mike built the functionality a little while ago, but I hadn’t activated it until today. If you currently have access to the premium database, check out the following searches to see it in action:

  • Admiral Lyons
  • Loreto – results are way down at the bottom; they come from Mystic Seaport’s New London Crew Lists database
  • Lady Amherst – this result is from a database that I just discovered and loaded today, of immigration lists for vessels headed to Australia in the 19th century. It offers links to digitized copies of the microfilmed, handwritten passenger lists.
  • Acropolis

You do need to be logged in to see these icons, at least at the moment.

Let me know what you think!

Content, Conferences, and Enhancements

Oh, man. I’m so far behind in updating the world on what ShipIndex.org is up to. A few important points:

New content. I uploaded several files today. So far, they’ve included:

The first fills a brief gap; I had already imported volumes 2 and 3, but had had a problem with volume 1, which I’ve since fixed. The last resource, H. T. Lenton’s volume, is a really big, important one. It’s got just over 23,000 citations in it. Many of these are for unnamed vessels, such as landing craft, with names like “LCM.21” or “LCM.234”. I think this is important content for those researching these little-known vessels. I wrote a lot about the processing I did on this one on the resource’s information page here.

I’m very pleased to get this one imported; it adds immeasurably to the World War II content for those doing in-depth research into naval movements during the war.

With these additions, we’re now just 24 citations short of 1,325,000 citations. Perhaps I’ll find a small set to add some time today.

Past Conference. Two weeks ago (man, time flies!), we went to Salt Lake City for the National Genealogical Society conference. That was a great event, and we had a super time talking with genealogists and learning how we can improve the product we provide for them. We also had a fine time talking with folks from other companies who we can partner with, to the benefit of all involved.

There’s so much to do as a followup on that, and we’re working away on it. That’s a good problem to have, but wow, what a pile of work on our plates. On top of all of that, I’m still working on adding content, and Mike is plugging away at enhancements and new features. Both of us are also working on some neat possible partnerships, plus adding institutional subscribers here and there.

Ship Normalization. Speaking of new enhancements, Mike has built a really valuable new tool that will have a huge impact on a lot of the data that we have from a few major resources. One drawback of projects where a print resource (especially a 19th-century one) is digitized and put online is that print-specific, space-saving conventions are carried over into the online environment. For example, the schooner Abbot Lawrence is represented in different volumes as “Abbot L’wr’nce”, “Abbott Law’nce”, “Abbott Lawr’nce”, and (obviously) “Abbot Lawrence”. All of them describe the same vessel, and in a print volume, that’s easily discerned. But online, the computer doesn’t know that when you search for “Abbott Lawrence,” you’d also like to see the other variations above. That is, unless you have Mike on your side: he has created a tool that lets us bring them all together (that is, ‘normalize’ them). And that’s what we’re doing. The process is quick and accurate, though there are so many entries that it’ll take quite a while.
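Mike’s tool isn’t described in detail, so here is only a rough Python sketch of one way to spot candidate matches: treat each apostrophe in an abbreviated name as standing for one or more omitted letters, and collapse doubled letters so that “Abbott” and “Abbot” compare equal. The function names and the doubled-letter rule are assumptions for illustration; a loose rule like this would flag candidates for human review rather than merge names automatically:

```python
import re

APOSTROPHES = "'\u2019"  # straight and curly apostrophes


def _squash(name: str) -> str:
    """Lowercase and collapse runs of a repeated character ("Abbott" -> "abot")."""
    return re.sub(r"(.)\1+", r"\1", name.lower())


def matches_abbreviation(variant: str, full_name: str) -> bool:
    """True if full_name could be what the abbreviated variant stands for.

    Each apostrophe in the variant may stand for one or more omitted letters.
    """
    parts = re.split(f"[{APOSTROPHES}]", _squash(variant))
    pattern = "[a-z]+".join(re.escape(p) for p in parts)
    return re.fullmatch(pattern, _squash(full_name)) is not None


def candidates(variant: str, known_names: list[str]) -> list[str]:
    """Known full names that an abbreviated entry might be normalized to."""
    return [n for n in known_names if matches_abbreviation(variant, n)]
```

All four spellings from the Abbot Lawrence example match “Abbot Lawrence” under this rule.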

But, we’re doing it, and we’re making all those other entries available, despite the proliferation of apostrophes.

Next Conference. Finally, I’m headed to a conference at Mystic Seaport tomorrow – it’s a joint conference for a number of organizations, including the Council of American Maritime Museums, the North American Society for Oceanic History, the Steamship Historical Society of America, the National Maritime Historical Society, and the Society for Nautical Research. What a group!

I’m looking forward to telling folks there about ShipIndex.org, and I hope I won’t run out of brochures. If you’re going, and would like to get together at some point, please drop me a line.

Cool new enhancements!

Well, we’ve done a ton of stuff since coming back from Boston. While in Boston at the ALA Midwinter conference, Mike and I met with about fifteen different people to get feedback on how to improve the site. Each meeting was about 45 minutes long, and the whole experience was really fantastic. We met with academic reference librarians, public librarians, electronic resources librarians, genealogy librarians, authors, content providers, folks with library services businesses that we admire, and tons more. We came away with pages and pages and pages of modifications to make.

Some of these changes are/were easy, and some will be a lot tougher. On Saturday, Mike put new code up on the site, and many of the changes are now visible there. Since we do a lot of iterative releases, we don’t use ‘release numbers,’ but if we did, all the enhanced functionality that has just gone live would definitely deserve a ‘dot version’ – like, say, from 2.1 to 2.2. And, in fact, it probably would deserve an upgrade from version 2.x to 3.0, because of the new institutional access that I’ll get to later. (That doesn’t have much front-end visibility, but it has been a huge change on the back end.)

Here are a few of the changes you’ll see:

  • A “new” icon next to any item added in the last 45 days
  • Better layout on the results pages
  • Better diacritics management
  • Links to resources open in new windows
  • More, and updated, information on the website, especially regarding individual subscriptions
  • A completely new “librarians” tab, with information for librarians, regarding our new institutional service
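The post doesn’t say how diacritics are now handled. One common approach, sketched here in Python with the standard unicodedata module, is to fold accented letters to their base letters before comparing, so a search typed without accents still finds an accented name. The function names and example names are made up for illustration:

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining accents and case, e.g. "Élise" -> "elise"."""
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def same_name(a: str, b: str) -> bool:
    """Accent- and case-insensitive comparison of two ship names."""
    return fold_diacritics(a) == fold_diacritics(b)
```

One limitation of this approach: letters such as “ø”, which Unicode doesn’t decompose into a base letter plus an accent, pass through unchanged and would need an explicit mapping.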

In addition, he created a number of tools that will help us better identify and proactively correct data issues.

With the new importing tools, I’ve imported several new files in the last few days, and have also started to go back to improve and reimport some of the older files. There are a number of files in the freely-accessible collection that have illustrations but don’t indicate that on the results pages. I’ve already corrected a few of those, and more will be corrected soon. Those don’t count as “new” resources, and they remain freely-accessible.

The biggest deal, though, is INSTITUTIONAL ACCESS! We can now offer subscriptions via IP-authentication, for institution-wide access. Check out our librarians page for more information about this. If you’re interested in setting up a trial for your institution, please drop us a line at sales (at) shipindex (dot) org. Or recommend us to your local librarian! We can provide access for academic, public, special, and other libraries. And, to top it off, we’re offering “plankowner” discounts for institutions that join us before June. Contact us soon for more information.
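IP authentication generally works by checking a visitor’s address against the network ranges an institution has registered. Here is a minimal sketch using Python’s standard ipaddress module; the registry, institution names, and ranges are hypothetical (the ranges are reserved documentation addresses), and this isn’t a description of ShipIndex’s own implementation:

```python
import ipaddress
from typing import Optional

# Hypothetical registry: institution -> IP ranges it has registered.
INSTITUTION_RANGES = {
    "Example University": ["192.0.2.0/24", "198.51.100.0/25"],
    "Example Public Library": ["203.0.113.16/28"],
}


def institution_for(ip: str,
                    registry: dict[str, list[str]] = INSTITUTION_RANGES) -> Optional[str]:
    """Return the institution whose registered ranges contain ip, or None."""
    address = ipaddress.ip_address(ip)
    for name, ranges in registry.items():
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        if any(address in ipaddress.ip_network(r) for r in ranges):
            return name
    return None
```

A visitor whose address falls outside every registered range gets None and would see the site as an individual user.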

This release is a big deal all around for us, and it’ll lead to a lot more content being added (two completely new resources have already been added today, and four have been improved and updated over the past two days). Results will be easier to use, and of course institutions can now subscribe, as well.

We’ve got more improvements and enhancements in the works, so let us know about any changes you’d like to see.