Updated OCLC WorldCat data – 20% more, and more accurate

I’ve updated an important resource, adding 20% to its contents, and improving the accuracy of all of the data in it. When we converted ShipIndex.org from a hobby to a business, we worked with OCLC to get a file of books by or about ships. For more about how these records are used, see the first of two posts about WorldCat records, here.

In any case, we agreed with OCLC that these records would remain in the free database, rather than the newly-created subscription database. There were about 40,000 records in that file. Last month, I had the opportunity to visit OCLC’s headquarters, in Dublin, Ohio. While there, I received an updated version of this file, which now contains over 50,000 authority records for ships.

I worked through the file, doing cleanup and corrections, and it took a few tries to load it into the ShipIndex.org database. It wasn’t as easy as other files, because the OCLC records use the full range of Unicode characters. The database copes with ordinary UTF-8 text, but some of the characters in these records are a bit beyond its abilities. (Actually, not its ability to display vessel names, but its ability to store them.) I replaced vessel names in Cyrillic, Japanese, Chinese, etc., with their transliterated names, and also removed a lot of the Unicode characters that were causing problems.
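For anyone facing a similar cleanup, here is a rough sketch of the kind of check involved. It is not the script I actually ran; the function names are made up for illustration, and folding accented letters down to plain ASCII is an assumption about what "removing problem characters" could mean in practice.

```python
import unicodedata


def needs_transliteration(name: str) -> bool:
    """Return True if any letter in the name is outside the Latin script.

    Hypothetical helper: it only flags names for manual review, since
    transliteration itself was done by hand.
    """
    return any(
        ch.isalpha() and "LATIN" not in unicodedata.name(ch, "")
        for ch in name
    )


def fold_to_ascii(name: str) -> str:
    """Decompose accented letters, then drop anything that is not ASCII.

    Assumption: the database stores plain ASCII safely, so this is the
    most conservative fallback.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Names flagged by `needs_transliteration` would still need a human to supply the transliterated form. Note that `fold_to_ascii` silently drops letters with no decomposition, such as "Æ" or "ø", so its output should be checked rather than trusted.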

I also fixed a lot of names that I hadn’t fixed the first time around. Most of these were ship names with prefixes attached, like “USS Daffodil” or “HMS Daffodil” or “S/S Daffodil”. It’s always best to search without those prefixes. I still have cleanup to do on the leftover ship names, but the new records are live and I can finish that later.
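As an illustration of the prefix cleanup, here is a minimal sketch. The prefix list is an assumption that covers only the three examples above, and `strip_prefix` is a name invented for this example; a real list would be much longer.

```python
import re

# Assumed prefix list: only the three examples mentioned in this post.
PREFIXES = ("USS", "HMS", "S/S")

# Match one prefix at the start of the name, followed by whitespace.
_PREFIX_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\s+",
    re.IGNORECASE,
)


def strip_prefix(name: str) -> str:
    """Remove a leading ship prefix, leaving the bare vessel name."""
    return _PREFIX_RE.sub("", name).strip()
```

With this, “USS Daffodil”, “HMS Daffodil” and “S/S Daffodil” all come out as “Daffodil”, while a name that merely starts with the same letters is left alone because the prefix must be followed by a space.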

As a result, the OCLC WorldCat resource has grown from about 40,000 to about 50,000 citations, and the metadata is much improved. All of these citations are in the free database. This is a big improvement all around. Thanks again to OCLC for creating this file for me!
