New content added; mostly Navy Records Society volumes

Indexes for the following volumes have been added:

One of my goals is to have the entire set of Navy Records Society volumes included in the database. These volumes are fantastic resources in British naval history. I’m working through them, one at a time. Right now, I think I have about a third or so of them in the premium database, though I’ve been focusing on the ones with the largest number of vessels mentioned in their indexes. (Actually, it’s more than a third, because some indexes cover multiple volumes. I count about 53 actual indexes in the database, out of over 150 volumes published, so I’d guess the number of volumes covered is closer to 60 or so.) Either way, I’ll keep at it…

Great new review of ShipIndex from Charleston Advisor

I got back from the Charleston Conference last night. I couldn’t stay for the conference, unfortunately, but I did get to attend, and present at, a pre-conference. I didn’t present on ShipIndex (though I did meander aimlessly about it while we were working through some technical difficulties and they needed me to say something – anything! – into the microphone…), but I did get some great ShipIndex news while I was there.

ShipIndex was just reviewed in The Charleston Advisor, a well-known and well-respected source for “Critical Reviews of Web Products for Information Professionals”. The review appears in the October issue, and a copy was distributed to all attendees at the Charleston Conference. ShipIndex got 4-1/2 stars, out of a possible 5, and a very positive review. The summary of the review includes this bit regarding content: “This unique, comprehensive and authoritative database provides a wealth of information about ships. Links to external content pull all of the information about each vessel together in one place. It is a perfect database for vessel research.” Regarding pricing, the reviewer wrote, “The database is so reasonably priced it is ridiculous. You get a lot of information for very little money.”

The full review is available online, but costs a whopping $38. (Of course, the journal itself costs $295 for libraries; $495 for others…) Just trust me – it’s very positive.

To top it all off, Charleston Advisor editors gave ShipIndex the 2011 award for “Best Content”! The citation reads “Everything you ever wanted to know about ships has been aggregated in this one Web site aimed at both researchers and hobbyists. The system is packed with information, has a strong user interface and a visually appealing look. This unique service was created by Peter McCracken, one of the cofounders of Serials Solutions.”

ShipIndex also received a “Recommended” review from Choice this summer (June 2011), which described the site as “a needed research tool for maritime history, [and] useful for academic and special libraries with interested clientele.”

Good feelings all around.

Why indexing matters

I’m a huge fan of indexes, especially to magazines (aka serials, or journals), and it frustrates me quite a bit when I find useful journals that don’t have indexes to them. Here’s why.

The most important reason, most definitely, is that an index makes old issues of a magazine useful and accessible. Generally, a person receives and (hopefully) reads a particular issue. After that, the issue is stored, and eventually recycled.

(Or, perhaps, left at the local public library, if it’s not too old. I’m writing this in my local public library, and I have several recent issues of magazines to drop off in the ‘magazine exchange’ area. But the library has an understandable rule that no magazine left here be more than six months old. If that rule weren’t in place, the magazine area would be overrun with decade-old copies of magazines that no one wants, and the library would be left with the work of sorting through and recycling them all.)

When a library receives a magazine, it gets stored on shelves for a while. In niche areas like maritime history, it will likely eventually be sent to an off-site storage facility, as well. If there’s no guide to finding what’s in a given issue, then there’s basically no chance of finding anything in any particular issue. Consider a library catalog’s entry for, say, American Heritage magazine. Published for over 60 years, its subject coverage is represented in bibliographic data by basically a dozen words – and a third are in French, and two thirds of the remaining ones are duplicated. The only unique English words are “United States History Civilization Periodicals”. But with hundreds of thousands of pages in those 60 years, there’s an enormous wealth of information. Which is why they publish their own index to their magazine. Now, all those hundreds of thousands of pages are accessible to anyone with access to the index.

Maritime history publications would do well to make note of this, and to consider how their data is accessed when it’s more than a few issues old. Organizations that publish quality indexes to their resources, and then make that information as available as possible, are to be commended. As one specific example, consider the San Diego Maritime Museum’s publication, Mains’l Haul. Not only do they publish a current index to their journal, they make that publication freely available online. This is so vitally important, and should be aggressively emulated by every maritime history organization, regardless of their size.

People will be seeking articles from the entire run of Mains’l Haul for decades to come, because they take the time to make an index available to all. While it may cost money to do this (though some institutions are able to take advantage of volunteer indexers), I think it’s easy to see ways that that money will be returned in spades, and for decades to come, as people discover that past articles mention something of interest to them, and publishers of such works can then offer reprint services for those articles at reasonable fees, essentially indefinitely.

If a researcher doesn’t know that a person or a vessel is mentioned in a past article, they will not put that publication to use, and that’s a loss to the publisher, to the article’s author – whose work would be useful but won’t be found – and to history in general.

I’d like to make two additional comments:

First, don’t rely on a commercial abstract and indexing service to do this for you; while it’s great to get one’s content indexed in large databases, they will provide, at best, only a cursory summary of each article. They will not be sufficient for someone seeking a mention of a person, ship, or location that’s mentioned in, but not central to, a given article.

Second, a listing of the articles in an issue is NOT an index. (I’m looking at you.) It’s a list of article titles, and nothing more. While I suppose it’s better than nothing, it misses infinite opportunities to guide researchers to the incredible wealth of information that’s contained in a quality scholarly publication.

Please, magazine publishers: index, Index, INDEX! And if you’re really forward-thinking, make the index available for free, to anyone. Put it online as a pdf, as a searchable database, and as a text file that anyone can download and use elsewhere. What you lose in the cost of creating and distributing the index, you’ll more than make up in revenue from providing reprints and back issues, and (perhaps more importantly) in promoting and displaying the importance, value, and reputation, of the journal in question.

The death of the semantic web

I came across some interesting notes while going through old emails the other day. A message from NISO, the National Information Standards Organization, reported that the semantic web is dead, citing a post on semantico. The semantic web is a concept of presenting data in a structured format, usually as ‘triples’ (I am, absolutely, not an expert – or even that knowledgeable – on this stuff, so don’t quote me too far), so a computer can better understand what each term means.

For example, when a computer sees the word “Magellan”, it just sees a word. It doesn’t know if the word refers to an explorer, to a spacecraft, to a mutual fund, a “progressive metal/rock” band, or something else. By defining, through triples, what one means, the computer can realize that one page is talking about the explorer while another is talking about a mutual fund company.

Such semantic definitions have been used extensively in some subject areas, but not at all in most. And one of the great challenges with it is/was solving problems among the “upper ontology” – that is, the layer that connects concepts in zoology with concepts in art history with concepts in electrical engineering with concepts in maritime history, etc. One field may work hard to define its ontology, but if that schema doesn’t mesh with other ontologies, then the systems aren’t really connected.

So I was interested to read of the effective death of the semantic web, and its replacement by schema.org. Schema.org is a nascent project being put together by representatives from the search teams at Google, Yahoo, and Microsoft’s Bing. It uses HTML microdata attributes (itemscope, itemtype, and itemprop), added to a page’s markup text, to define what something is. This is done for the benefit of search engines – so a “Magellan” that is marked with the tags

  <div itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Ferdinand Magellan</span>
  </div>

is clearly a person, while the Magellan that’s tagged

  <div itemscope itemtype="http://schema.org/Product">
    <span itemprop="name">Fidelity Magellan Fund</span>
  </div>

is something you can buy. (Note the differences in the end of each first line; the first is “/Person”, and the second is “/Product”.)

(Also: I defined the Magellan Fund as a ‘product’, because one can buy a share of it, but it might more appropriately be an ‘organization’, since there is a ticker symbol associated with it, and schema.org currently has a “tickerSymbol” attribute for Organizations.)
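To make the benefit concrete, here is a minimal sketch of how a program could read that markup and tell the two Magellans apart. It uses only Python’s standard html.parser module; the class and function names (ItemTypeReader, read_items) are my own inventions for illustration, and it assumes simple, non-nested items like the two above. A real search engine’s parser would certainly be more thorough.

```python
from html.parser import HTMLParser


class ItemTypeReader(HTMLParser):
    """Collect (itemtype, name) pairs from simple, non-nested microdata.

    Hypothetical helper for illustration; not part of schema.org or any library.
    """

    def __init__(self):
        super().__init__()
        self.items = []            # finished (itemtype, name) pairs
        self._current_type = None  # itemtype of the item being read
        self._in_name = False      # True while inside an itemprop="name" tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # A new item starts here; remember what kind of thing it is.
            self._current_type = attrs.get("itemtype")
        elif attrs.get("itemprop") == "name":
            self._in_name = True

    def handle_data(self, data):
        if self._in_name and self._current_type:
            self.items.append((self._current_type, data.strip()))

    def handle_endtag(self, tag):
        self._in_name = False


def read_items(html):
    """Return a list of (itemtype, name) pairs found in the markup."""
    reader = ItemTypeReader()
    reader.feed(html)
    reader.close()
    return reader.items


page = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Ferdinand Magellan</span>
</div>
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Fidelity Magellan Fund</span>
</div>
"""

for itemtype, name in read_items(page):
    # Show only the last part of the type URL, e.g. "Person" or "Product".
    print(itemtype.rsplit("/", 1)[-1], "->", name)
```

The point is only that the word “Magellan” now arrives with a label attached, so the program never has to guess which Magellan it is looking at.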

The current schema.org structure is quite limited, and focuses primarily on people, organizations (especially local businesses), creative works, events, and locations. But it’s certainly extensible, and – if it’s generally adopted, as triples were not – it will clearly expand to other fields.

I’d love to take on extending it to vessels. It’d be pretty easy for us to modify our HTML to include these microtags, and if that helps people find the information they’re seeking, then all the better for all involved. But I’m not sure what the proper levels should be. One doesn’t want to have too many levels in a structure like this, but I think that going straight from “Thing” to “Vessel” might be a bit of a jump. I imagine an intermediate step of, perhaps, “Vehicle”, would be appropriate. Then those with interest in cars, trains, airplanes, bicycles, scooters and lots more, would build out their schemas, while we could start a layout of sailing vessels.

It seems simple, but immediately becomes fairly complex. You could, for instance, split up “Vessel” entries into “HumanPowered”, “WindPowered”, and “MechanicallyPowered”, perhaps, then divide by vessel type – canoe, kayak, paddleboat; sloop, ketch, yawl, schooner, brig, brigantine, barkentine, ship, bark, hermaphrodite brig; paddlewheel steamer, ferryboat, fishing boat, battleship, ocean liner; etc., etc. Is that too much differentiation? How do you define a vessel that’s been re-rigged, from a ship to a bark, for example? How, even, do you make it clear that when you’re talking about a ‘ship,’ you’re talking about a three-masted vessel with square sails on the furthest-aft mast, rather than something that floats and is bigger than a boat?
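Here is a rough sketch of what such a layering might look like, written as a simple parent lookup in Python. Every type name in it is hypothetical, made up here to mirror the levels I described above and not taken from schema.org, and the re-rigged vessel is an invented example. Recording a list of rigs in order is just one possible answer to the re-rigging question, not a settled design.

```python
# Hypothetical type names, invented for illustration only.
# Each type points to its parent; "Thing" is the root and has no parent.
PARENT = {
    "Vehicle": "Thing",
    "Vessel": "Vehicle",
    "HumanPowered": "Vessel",
    "WindPowered": "Vessel",
    "MechanicallyPowered": "Vessel",
    "Kayak": "HumanPowered",
    "Ship": "WindPowered",
    "Bark": "WindPowered",
    "PaddlewheelSteamer": "MechanicallyPowered",
}


def lineage(type_name):
    """Return the chain of types from type_name up to the root."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(type_name, ancestor):
    """True if type_name is the ancestor itself or sits somewhere beneath it."""
    return ancestor in lineage(type_name)


# One possible way to describe a re-rigged vessel: keep every rig, in order.
example_vessel = {"name": "Example", "rigs": ["Ship", "Bark"]}

print(" > ".join(reversed(lineage("Bark"))))
print(all(is_a(rig, "WindPowered") for rig in example_vessel["rigs"]))
```

Even this toy version shows the trade-off: four levels sit between “Thing” and “Bark”, and the narrow meaning of ‘ship’ only works because “Ship” is defined as a rig under “WindPowered” instead of as a general word for a large vessel.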

Lots of other terms could be added or defined over time. When the computer can understand what the term means, rather than just presenting the term to the world, it will make it much easier for individuals to draw understanding and make connections from within large bodies of marked-up data.

It would appear that this system, because it’s fairly easily applied, has a much better chance of success than did the original ‘triples’ approach. I look forward to watching it with interest.

“Trust” and identifiers: two great concepts that concept great together

My two primary areas of interest – library science and maritime history – bumped into each other this week in an interesting way. On the MARHST-L discussion list, there’s been much talk about various vessel identifiers. They haven’t really been called that; participants have been discussing “Official Numbers,” the Mercantile Navy List, and Lloyd’s Numbers, and it has made me think of IMO numbers, Hull Identification numbers, USCG Documentation numbers, naval identifiers (e.g., PT-109, CV-42, etc.), and various other vessel identifiers.

On the library side, the latest issue of NISO’s publication, Information Standards Quarterly, is now available, and the entire issue is about identifiers. There’s an article about ISNI, the International Standard Name Identifier; ORCID, the Open Researcher & Contributor ID; the Names Project; the use of SAN, the Standard Address Number, in supply chains; I² and ISNI, and more. I² is the Institutional Identifier; I was very briefly on this working group before I left my previous job.

In that previous job, I used the ISSN, the International Standard Serial Number, a great deal. But for various reasons, it didn’t fill our bill, and we had to create our own unique identifier. We loved and used the ISSN, but it wasn’t quite the complete thing. The identifier we developed was (and, to my knowledge, still is) only used in-house, though it could have had great application elsewhere.

At ShipIndex.org, I believe my future includes developing a new vessel identifier. (Yes, I know.) I’ve presented on this before, at both library conferences (with slides) and at maritime history conferences, but it hasn’t started to be developed yet. As I see it, when that starts to happen, it’ll be through our website, and it’ll be by individuals who want or need an identifier. People who know more about a specific ship than I do will be able to collate various citations that refer to a single ship, even if its name has changed, and improve the quality of the community’s knowledge about those ships. They’ll also be able to do lots, lots, more, but that’s more for another time.

This isn’t a great description of what I have in mind, and I don’t know if I’ll ever be able to get it built, but I have high hopes for what ShipIndex, with the help of its constituents, can create. I aim to do whatever I can to make it happen. Alas, it’ll take a lot of time and money, and those aren’t yet in abundant supply.

In the NISO publication, Geoff Bilder has a great Op-Ed piece about Trust and Identifiers. I once had a beer with him (I honestly can’t remember if it was in Asheville, NC, or Edinburgh, Scotland. If it was in Edinburgh, I trust it was a scotch, not a beer. I feel certain it was at a UKSG conference [which, possibly, could have placed it in Torquay or Coventry; not just Edinburgh], but at the same time I also think there was discussion of skipping out on the NASIG conference [with someone else, not Geoff] to try and catch UNC and ECU play baseball in the NCAA playoffs. But I digress…) and I probably made a fool of myself. He’s clearly an incredibly smart guy, and I know that what he writes is worth reading, and worth reading closely.

I would like to see ShipIndex.org become the trustworthy source, as described by Geoff, for vessel identifiers. I think it can happen, if only because I’m not sure anyone else is ready to do it. If you’re willing to help, let me know.

New Linking Relationships

Yes, I know it’s been far too long since I posted something here. As ALA Annual rapidly approaches, however, lots of news is coming up. I added a big file a month or so ago, and I’ll add a note about that soon.

Right now, I want to mention a great linking arrangement that we recently settled on, with the good folks at Accessible Archives, who digitize 18th and 19th century publications. We’re actively collecting links to ships mentioned in the newspapers in their Civil War Collection, so you can find mentions of ships in those newspapers.

Read more about this in the recent press release, either via PR Newswire, or at the Accessible Archives website. I’ll write more about this soon.

Don’t forget that we’ll be in New Orleans in about ten days, at the American Library Association Annual Conference! We’ll be at Table 3818. See you there.

Upcoming Conferences – SCELC, ACRL, NGS, ALA

At the American Library Association Midwinter conference in San Diego last month (where it was wonderfully warm and sunny, compared to the 8-12” of snow dumping outside my window at the moment), we ran a promotion for librarians, which we called “We Sing Sea Shanties on the Show Floor”. When librarians signed up for a free trial of ShipIndex.org, I’d sing them a sea shanty, right there on the convention floor.

Folks from Perkins Library, at Hastings College, filmed the first shanty I sang, then posted it to their Facebook page. They also promoted their ShipIndex.org trial on the campus radio station! Very cool.

Anyway, it was a rousing success, and we’ll do it again at the ACRL conference in Philadelphia, at the end of March. If you’re attending, please make a point of visiting us at Table 155. Bring your IP ranges, and I’ll sing you a shanty!

We’ll also be at the following conferences and gatherings:

  • SCELC Vendor Day, March 3, Los Angeles. I’ll also be the keynote speaker at the SCELC Colloquium the day before, but I won’t be talking about ShipIndex. Instead, I’ll talk about an idea I have for improving the way libraries manage electronic resources – especially the niche ones, like ShipIndex. So, it’s relevant to ShipIndex but it’s more of a proposal of something I’d like to see someone else build than a pitch for ShipIndex. Those occur on Thursday, the 3rd, at 10:50 and 1:40.
  • National Genealogical Society Conference, May 11-14, Charleston, SC. Here, we’ll be talking more about our individual subscription offers. Charleston is a great city; this should be a fun conference. We had a great time at NGS last year.

If you attend any of these conferences, please come by and say hello! If you know of other conferences we should attend, please let us know; we’d be interested to hear about them.

More new content, question about shipwreck info

The following files have been added to the premium ShipIndex.org database in the past few days:

The last one listed describes shipwrecks around the world. A correspondent suggested that we add more content surrounding shipwrecks, which I thought was a great idea. This is a start. I understand that there are a number of diving guides regarding shipwrecks, specifically intended to help divers locate particular sites. I’d love to know more about those, and get some examples from folks. If you have any ideas about such items — either books or websites or other sources — please let me know by email or in the blog comments section below.

Thanks.

New content added recently

Content from the following resources has been added to the premium database in the past few weeks:

In addition, a number of resources were updated. Several hundred new vessels were added to the entry for IrishShipwrecks.com, and corrected URLs were added to several databases where the URL structures had changed.

The premium database now contains over 1.53 million citations.

On Naming Ships and Representing them in ShipIndex

At present, ShipIndex.org has one point of access: the vessel name. You’d think that would be fairly easy, at least in the case of extant vessels: just look at the stern or the bow, and see what’s written there. Alas, it’s not that simple. There are many reasons for this, and a lot of them are completely understandable. Others can lead to surprisingly interesting stories.

While working through the index to the first 50 years of Steamboat Bill, and its successor, PowerShips, I came across many, many mentions of the Queen Elizabeth 2. Most of these are listed under the very common, abbreviated name, “QE2”. In the ShipIndex database, however, one also finds many entries for a different version of the name, “Queen Elizabeth II”. I read a bit about the ship on its Wikipedia page, and learned some interesting stories about how the name came about. According to the contributors, the name of the ship was not announced before the launching. Cunard intended to name the ship “Queen Elizabeth”, but the Queen, when she launched the ship, stated “I name this ship Queen Elizabeth the Second.”

The next day, newspapers announced the name as “Queen Elizabeth II”, though when the ship was delivered its name read “Queen Elizabeth 2”. According to Wikipedia, “From at least 2002 the official Cunard website stated that ‘The new ship is not named after the Queen but is simply the second ship to bear the name – hence the use of the Arabic 2 in her name, rather than the Roman II used by the Queen’, however, in a change in 2007 this information had been removed.”

In addition, there’s confusion about who the ship is named after. Multiple sources provide multiple suggestions. Some feel the ship is named after the current Queen, and that, in fact, she made that change when she announced its name. Others state that it is named after her mother, the wife of King George VI. Others state it’s named after the previous Cunard ship named Queen Elizabeth.

We need to make it possible for people to find ship names however they might be represented, and so we’ve created functionality that allows one to link between variant names for specific ships. So, for example, when you search for “QE2”, you find entries that cite “QE2”, but you also find a link at the top taking you to entries for other variant names for this ship, specifically “Queen Elizabeth 2” and “Queen Elizabeth II”.

We also have the ability to ‘normalize’ ship names, and in that case, one goes directly from a misspelling of a ship name to the correctly spelled entry. So, by rights, we should ‘normalize’ “QE2” and “Queen Elizabeth II” to “Queen Elizabeth 2”. But I think that, in this case, for this very famous ship, it’s worth maintaining the separate entries and linking them together via the “alternate spelling” links. Maybe I’m wrong; should I just normalize them all together? What do you think?
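To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not how ShipIndex.org is actually built; the names (VARIANT_GROUPS, NORMALIZED, resolve) and the misspelling “Queen Elisabeth 2” are all invented for illustration. Linked variants keep their own entries and point at each other, while a normalized name simply redirects to the correct entry.

```python
# Variant names that keep separate entries but link to one another.
VARIANT_GROUPS = [
    {"QE2", "Queen Elizabeth 2", "Queen Elizabeth II"},
]

# Misspellings that are redirected straight to the correctly spelled entry.
# The misspelling below is a made-up example.
NORMALIZED = {
    "Queen Elisabeth 2": "Queen Elizabeth 2",
}


def resolve(query):
    """Return (entry name, sorted list of linked variant names)."""
    # Normalization: a misspelling silently becomes the correct entry.
    entry = NORMALIZED.get(query, query)
    # Variant linking: the entry stays as it is, with links to its siblings.
    variants = []
    for group in VARIANT_GROUPS:
        if entry in group:
            variants = sorted(group - {entry})
    return entry, variants


print(resolve("QE2"))
print(resolve("Queen Elisabeth 2"))
```

Normalizing “QE2” would mean moving it out of VARIANT_GROUPS and into NORMALIZED, at which point a search for “QE2” would land directly on “Queen Elizabeth 2” and the separate “QE2” entry would disappear.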

We also show links for previous and subsequent names of ships. So, if you search for “Euterpe”, you’ll see a “subsequent name” link to “Star of India.” It is important to remember that if there are multiple ships with the name “Euterpe,” the link appears, but doesn’t apply to all of them. Creating a system that separates out all these ships is a big project, but one that we will tackle.

One great thing about the Steamboat Bill files is that they include many previous and subsequent vessel names. Unfortunately, they don’t exactly indicate the order in which vessel names appeared; you’ll see both “Liberte; a) Brasil; b) Volendam; c) Monarch Sun; d) Volendam; e) Island Sun; g) Canada Star h) Queen of Bermuda” and “Queen of Bermuda; a) Brasil; b) Volendam; c) Monarch Sun; d) Volendam; e) Island Sun; f) Liberte; g) Canada Star”, as well as “Island Sun; a) Volendam”. So, some research is needed to figure out the order in which the ship names appeared. Then, I still have a question about whether I should include all of the previous and subsequent names in each entry. In the above example, if I determine that the actual path of ship name changes was Queen of Bermuda, then Brasil, then Volendam, then Monarch Sun, then Volendam (again), then Island Sun, then Liberte and finally Canada Star, do I include ‘subsequent name’ links from Brasil to Volendam, Monarch Sun, Island Sun, Liberte, and Canada Star? That creates a lot of links. Or do I just have a link from Queen of Bermuda to Brasil, and on Brasil a link to Volendam?

And if I list all previous or subsequent names for a ship that had the same name twice, then in this case the entry for Brasil (and Queen of Bermuda, and others) will have multiple ‘subsequent name’ links to Volendam. The page for Volendam could conceivably have a link back to itself!
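A quick Python sketch shows how the two options compare. It assumes the order of names supposed in the example above, which I haven’t verified, and the function names are invented for illustration.

```python
# The order of names supposed in the example above (not verified).
names = [
    "Queen of Bermuda", "Brasil", "Volendam", "Monarch Sun",
    "Volendam", "Island Sun", "Liberte", "Canada Star",
]


def adjacent_links(names):
    """Option 1: link each name only to the name that directly followed it."""
    return list(zip(names, names[1:]))


def all_subsequent_links(names):
    """Option 2: link each name to every name that came after it."""
    return [
        (names[i], later)
        for i in range(len(names))
        for later in names[i + 1:]
    ]


step_by_step = adjacent_links(names)
everything = all_subsequent_links(names)

print(len(step_by_step), "links when each name points only to the next one")
print(len(everything), "links when each name points to every later name")
print(everything.count(("Brasil", "Volendam")), "links from Brasil to Volendam")
print(("Volendam", "Volendam") in everything)  # Volendam linking to itself
```

With eight names in the chain, the first option needs seven links, while the second needs twenty-eight, including the repeated Brasil-to-Volendam link and the Volendam self-link. Removing duplicates would trim the list a little, but the self-link would still be there.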

What do you think? What’s the best way to represent this important data?