New feature: Introducing stopwords

One of the neat things about having an online database is that one can study data to figure out how to make the system work better. This wouldn’t be the case if this were, say, a CD-ROM product.

I can look at all the searches that have been done on the site in the past year or so. In doing this, it’s clear that a lot of people include prefixes like “USS”, “HMS”, and “USCGC” in front of the ship name. Others include vessel descriptors, such as “schooner” or “steamer”. For a long time, I’ve wanted to have a way of ignoring those terms, because doing so would get users to the content they really want more quickly. However, as with most things, it’s not as easy as it seems.

It’s easy to have a list of stopwords, meaning words that are ignored in searches. Many search tools do this, so when you include “the” or “an” in a book title, Amazon doesn’t bother to search for these words. Of course, they still need to make exceptions to deal with searches for the band “The The”, and the like. And in the case of ShipIndex.org, one still needs to be able to search for “HMS” in a name, since some ships do have that as a legitimate part of their name, though none of those ships are part of the British Royal Navy.
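For the curious, here is a minimal sketch in Python of what a basic stopword filter with a “The The”-style exception might look like. The word list and the function name strip_stopwords are made up for illustration; this is not how Amazon or ShipIndex.org actually does it.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an"}


def strip_stopwords(query: str) -> str:
    """Drop stopwords from a query, unless nothing would be left to search for."""
    words = query.split()
    kept = [w for w in words if w.lower() not in STOPWORDS]
    # Exception: a query made up entirely of stopwords (such as "The The")
    # is searched as typed rather than reduced to an empty string.
    if not kept:
        return " ".join(words)
    return " ".join(kept)


if __name__ == "__main__":
    print(strip_stopwords("The Old Man and the Sea"))
    print(strip_stopwords("The The"))
```

The one exception shown here, keeping the query when every word is a stopword, is only the simplest case. It does nothing for a name where the ignored word is a real part of a longer title.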

So anyway, I reviewed the list of search terms, and came up with specific words or phrases that need to be ignored. Then we (and by “we” I don’t mean me – I mean the excellent development team that turns these ideas into reality) created the tools to ignore these words, and also show a results message that says, basically, “We ignored this term, but you can repeat the search without ignoring it if you like.”
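To give a rough idea of the moving parts, here is a small Python sketch of a search step that ignores certain terms, reports what it ignored, and lets the user repeat the search with nothing ignored. Everything in it is an assumption for illustration: the term list comes only from the examples in this post, and the names IGNORED_TERMS, PreparedSearch, and prepare_search are invented. The site’s real code may look nothing like this.

```python
from typing import List, NamedTuple

# Hypothetical list, built only from the examples mentioned in this post.
IGNORED_TERMS = {"uss", "hms", "uscgc", "schooner", "steamer"}


class PreparedSearch(NamedTuple):
    query: str          # the text that actually gets searched
    ignored: List[str]  # terms dropped from the user's input
    notice: str         # message to show with the results ("" if none)


def prepare_search(raw: str, ignore_terms: bool = True) -> PreparedSearch:
    """Strip ignored terms from a search and build a notice about them."""
    words = raw.split()
    if not ignore_terms:
        # The user asked to repeat the search with nothing ignored.
        return PreparedSearch(" ".join(words), [], "")

    kept = [w for w in words if w.lower() not in IGNORED_TERMS]
    ignored = [w for w in words if w.lower() in IGNORED_TERMS]

    # If nothing was ignored, or ignoring would leave an empty search,
    # search for exactly what was typed.
    if not ignored or not kept:
        return PreparedSearch(" ".join(words), [], "")

    quoted = ", ".join('"' + w + '"' for w in ignored)
    notice = "We ignored " + quoted + ". You can repeat the search without ignoring it."
    return PreparedSearch(" ".join(kept), ignored, notice)


if __name__ == "__main__":
    print(prepare_search("USS Constitution"))
    print(prepare_search("USS Constitution", ignore_terms=False))
```

The ignore_terms flag is what would sit behind the “repeat the search” link: it is the escape hatch for ships that really do have “HMS” or “Steamer” in their name.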

The result will be a significant improvement in the results that people see when doing their searches. Let me know if you like it or if you don’t.
