ShipIndex as a Vessel Name Authority File

[This entry was written long ago, but not posted, because I was having problems with uploading images. As you'll see, images are a critical part of this post! Now that I've gotten that problem resolved, I will add a few more posts soon. PMc]

Last May, I finally completed one very large file for import. This file was incredibly tough to process, but I learned a lot about how one can use the database, and I thought I’d share that information here.

The database is Mariners and Ships in Australian Waters, a collection of transcribed passenger lists for thousands of voyages to Australia, primarily in the second half of the 19th century. Because most records were handwritten, and then transcribed by volunteers, many, many errors crept into the database.

The database has 58,311 records in it. (I believe more are always being added to the website itself, as transcribers complete their work.) One major difference between this and every other resource is that each voyage has a separate entry. In the Ellis Island Database, a user searches by ship name, then goes in deeper by voyage date. In this case, the collection is organized by arrival year, then arrival month, then ship name – so I had to create a separate entry for each voyage, to be able to link to each transcription.

I quickly realized that there were many, many, many errors in the transcription of vessel names. Just looking over the ship names as they appeared in the spreadsheet, it was easy to spot typos – especially with the additional information I had about masters and tonnage, which helped connect a misspelling to a correct spelling.

After correcting numerous such misspellings, I did a test import of the file and found that 1,707 new ship names would be added to the database. I started to investigate each of those, and found that many were not actually new ship names – they were simply additional mistranscriptions of the passenger lists. As the database grows, it’s important to try to minimize the introduction of incorrect ship names.

For example, I saw this entry, which the transcriber recorded as “Maealsar”. The master’s name had been transcribed as “C M de Boer”, and the vessel size as 305 tons.

I thought it looked a bit like “Macassar”, but there were no other “Macassar”s in that file. I searched the database for Macassar, and found an entry from the American Lloyd’s Register of American and Foreign Shipping for the same year. It listed a Macassar with a captain C. M. De Boor and a tonnage of 306. Obviously, these are the same ship.

I corrected the vessel name, but kept the mis-transcription, too, just in case I was wrong. So the entry now looks like this: “Macassar (corrected; listed as “Maealsar”) (of Amsterdam, C M de Boer, Master, 305 tons, from the port of Balaves to Sydney, New South Wales, 23 Mar 1861)”.
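For readers who like to see this kind of comparison in code, here is a minimal Python sketch of the idea: compare the transcribed name and master against known vessels, and only consider vessels of nearly the same tonnage. This is an illustration only, not a description of the tools used for the import. The function names, the field names, the five-ton tolerance, and the two non-Macassar candidates are all invented for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case, spaces and punctuation."""
    def norm(s):
        return "".join(ch for ch in s.lower() if ch.isalnum())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def rank_candidates(transcribed, candidates, max_ton_diff=5):
    """Rank known vessels against one transcribed record.

    Candidates whose tonnage differs by more than max_ton_diff are dropped
    (the tolerance is an assumption). The rest are scored by the average of
    name similarity and master similarity, best match first.
    """
    ranked = []
    for cand in candidates:
        if abs(cand["tons"] - transcribed["tons"]) > max_ton_diff:
            continue
        score = (similarity(cand["name"], transcribed["name"])
                 + similarity(cand["master"], transcribed["master"])) / 2
        ranked.append((score, cand["name"]))
    return sorted(ranked, reverse=True)


# The transcribed record, as described in the post.
transcribed = {"name": "Maealsar", "master": "C M de Boer", "tons": 305}

candidates = [
    # From the register entry described in the post.
    {"name": "Macassar", "master": "C. M. De Boor", "tons": 306},
    # The next two are made up, purely to give the ranking something to reject.
    {"name": "Malabar", "master": "J. Smith", "tons": 1200},
    {"name": "Mazeppa", "master": "T. Jones", "tons": 304},
]

for score, name in rank_candidates(transcribed, candidates):
    print(f"{name}: {score:.2f}")
```

A score like this can only suggest candidates. Checking the suggestion against the original handwriting and the published register is still a job for a person.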

Another example was this name, which had been transcribed as “Magport”:


I thought it looked like it started with an “N”, but found no “Nagport” already in the database. However, a search for “nagp*” turned up “Nagpore”, among others, and a link to the entry of Record of American and Foreign Shipping for the same year returned these two ships:


One has the same master and tonnage as the one in the transcription. It then becomes clear that there’s an “e” hiding behind the bar on the page, rather than a “t”.


I felt like it became a combination of genealogy and authority record work. I tried to find sufficient documentation to prove that my analysis was more accurate than the original. And because I had both the entire set of metadata from the source, and the 2.3 million citations already in the database, I could more easily determine that various transcriptions were incorrect.

I recognized that ShipIndex is beginning to serve as an authority file for vessels. It is certainly my goal to improve the database along those lines, and I will use another blog post to discuss this further.


I ended up doing this sort of research many times, and while it took a very long time, it was actually quite fun to nail down a correction. Some were surprising – I guess I can see why one might read this as “Princess of Water”:


 But why in the world would you not recognize that “Princess of Wales” makes infinitely more sense for a ship name?


I’ll provide two last examples here. This first one shows how I used the existing metadata for the resource itself to determine the correct ship name.

The beautiful handwriting on this one made it easy to read, and it’s not surprising that it was transcribed as “Oasby”. But there was only one entry in the entire file for “Oasby”, and none in the existing database, so it made me wonder.

A search through the metadata for the captain’s name, however, found 17 entries with Kennedy as captain (as had been noted in the transcription for this entry), for ship “Easby”, and the full resource has at least 70 other entries for “Easby”. Tonnage data is the same, and after learning of the existence of “Easby”, it’s easy to see that that’s what the ship name was; and the top of the dramatic ‘E’ was lost in the digitizing process.
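The captain lookup described above amounts to counting which ship names appear alongside a given master. A small Python sketch of that idea follows. The record layout and the function name are assumptions, and the tonnage value is a placeholder, since the actual figure isn’t given here.

```python
from collections import Counter


def names_for_master(records, master, tons=None):
    """Count the ship names recorded with a given master.

    If tons is given, only records with exactly that tonnage are counted.
    Returns (name, count) pairs, most frequent first.
    """
    wanted = master.strip().lower()
    counts = Counter()
    for rec in records:
        if rec["master"].strip().lower() != wanted:
            continue
        if tons is not None and rec["tons"] != tons:
            continue
        counts[rec["name"]] += 1
    return counts.most_common()


PLACEHOLDER_TONS = 1000  # made-up value; the real tonnage is not stated

# 17 "Easby" entries and one "Oasby" entry, all with Kennedy as master.
records = (
    [{"name": "Easby", "master": "Kennedy", "tons": PLACEHOLDER_TONS}] * 17
    + [{"name": "Oasby", "master": "Kennedy", "tons": PLACEHOLDER_TONS}]
)

print(names_for_master(records, "Kennedy", tons=PLACEHOLDER_TONS))
```

A name that appears once, next to a near-identical name that appears many times with the same master and tonnage, is a good candidate for a closer look.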

This made the next new ship name, “Oaton Hall”, easy to resolve to “Eaton Hall”.

Finally, I dealt with this challenging entry by using the existing database:

I tried searching for “waurego”, but that returned no ships. By searching for “*rego”, I found all the citations that had a word in the ship name that ends in “rego”. I could easily locate “Warrego”, and confirm that’s the right ship.

There’s other searching that could be done here, too. If I change the search to “*rego$” it returns only the ship names that actually end in “rego”, deleting several, like “Trego Renneger” or “Effrego Ventus”, from the result list.
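To make the difference between these searches concrete, here is a Python sketch that imitates the behavior described above using regular expressions. It is not the site’s actual search code. It assumes that “*” stands for any run of letters or digits within a single word, that a query matches whole words, and that a trailing “$” ties the match to the end of the full ship name. The function names are invented.

```python
import re


def wildcard_to_regex(query):
    """Translate a wildcard query such as '*rego' or 'nagp*' into a regex.

    Assumed rules: '*' matches any run of word characters, the query must
    match a whole word, and a trailing '$' means that word must be the last
    thing in the ship name.
    """
    anchored = query.endswith("$")
    if anchored:
        query = query[:-1]
    # Escape the literal pieces and join them with "any word characters".
    body = r"\w*".join(re.escape(part) for part in query.split("*"))
    pattern = r"\b" + body + r"\b"
    if anchored:
        pattern += r"\Z"
    return re.compile(pattern, re.IGNORECASE)


def search(query, names):
    """Return the ship names that match the wildcard query."""
    rx = wildcard_to_regex(query)
    return [name for name in names if rx.search(name)]


names = ["Warrego", "Trego Renneger", "Effrego Ventus", "Nagpore", "Macassar"]

print(search("waurego", names))
print(search("*rego", names))
print(search("*rego$", names))
print(search("nagp*", names))
```

With these five names, “*rego” keeps the first three, while “*rego$” keeps only “Warrego”, because it is the only name that ends in “rego”.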

I’ll put together another post in the next few weeks with more examples of changes and corrections I was able to make, along with a discussion of the importance of authority data for ship names.


Upcoming Genealogy Conferences in NY State

ShipIndex will be at several conferences in the next few months.

Come see us in upstate New York, at the first New York State Family History Conference, in Syracuse, September 20-21. It’s co-sponsored by the Central New York Genealogical Society and the New York Genealogical and Biographical Society. The sessions look like they’ll be pretty interesting, and it looks like they might also be pretty packed, as a number of meals and lectures are already sold out.

The exhibit hall is free and open to the public. Come visit us at the Holiday Inn & Conference Center, in Liverpool (just outside Syracuse). Hours on Friday, Sept 20, are at least 9:00 am to 4:30 pm, and possibly as long as 8:00 am to 6:00 pm (different sources have different times; I’m seeking clarification). On Saturday, the hours are 8:00 am to 3:30 pm. Come pick up a bottle opener, try the database, ask questions, offer suggestions, or just say hello!


If you live ‘downstate’, come see us at the second Genealogy Event, in Manhattan, Saturday, November 2, at the Metropolitan Pavilion, located at 125 W 18th St, New York City. This is the second year of this neat event, and I very much enjoyed exhibiting at it last year. The format this year is a bit different, however. Last year was a two-day event; this year, it’s one long day — in this case, a Saturday. The hours are 10am to 8pm, with all sorts of sessions and workshops through the day.

The exhibit hall will be open the whole time, as well; unlike in Syracuse, you do need to pay to access the exhibit hall. In any case, genealogy gatherings are rare in New York City, and a lot of people attended the inaugural event last year. I look forward to attending, but I think I will need to leave a bit early (like, maybe a half-hour early) to catch a bus home — given the cost of hotel rooms in Manhattan, it makes sense to catch a late bus home, if at all possible. I may miss the last half hour of the show, but the first 9-1/2 hours should be great! Please stop by and say hello if you’ll be there.


Beyond the state borders, I anticipate attending Who Do You Think You Are? LIVE, in London, in February 2014, and the National Genealogical Society conference, in Richmond, Virginia, in May 2014. If you know of other events you think I should consider, please let me know.

“The Genealogy Event” in NYC, and Glazier ‘Immigrants to America’ series

I went to a great genealogy conference in New York City a few weeks ago. It was just before Hurricane Sandy came through; we knew the storm was coming, but we didn’t know how bad it was going to be.

The conference itself, though, was great. Called simply “The Genealogy Event”, it took place at The Metropolitan Pavilion, which was a beautiful space. I particularly liked the wood floor, rather than the standard poured concrete. A range of exhibitors attended, and it was fun for me to see folks I’d met at previous conferences, as well as meet some new ones. It was great to see a genealogy conference in a super-major city. NGS and FGS conferences are almost always in smaller cities, so putting this unaffiliated, independent event in a big city was a great move.

Interactions with attendees were also great; one highlight definitely was seeing a friend from library school who I hadn’t seen in maybe 10 years. I guess I knew he was in Manhattan, but it is a big place – it’s not likely that you’ll run into someone you know there, to say nothing of inside the conference hall!

I had two separate versions of another very interesting interaction. A woman came to me with a copy of her ancestor’s naturalization papers. On it, her ancestor had recorded his arrival at Ellis Island on board the ship Le Havre in about 1906, if I remember correctly. She told me that the folks at Ellis Island had said no ship existed with that name, and she wanted to see if I could help. I quickly looked up “le havre” in the database, expecting to find a lot of ships with that name. In fact, there were just a handful of entries, and their timing didn’t match with passenger vessels of that era at all.

Now, granted, there are many, many ships that are not (yet) included in the database. But for immigration ships, I’d say it’s pretty comprehensive. Records for that time period are quite complete, and lots of databases and books cover the period (to say nothing of entries in, say, the magazine Steamboat Bill / PowerShips). Given the total lack of entries for that period, I felt that the folks at Ellis Island were correct. I pointed out that the date on the naturalization papers was 20-some years after the ancestor said he’d arrived, so it’s reasonable to assume that he just remembered incorrectly. Or, perhaps, his English was still not very good when he completed the form, and when an officer asked him what ship he arrived on, he instead answered with where he sailed from.

The solution to tracking this down, I think, is to look at the appropriate volumes edited by Ira Glazier & others, such as Italians to America, Germans to America, etc. These books, which transcribe thousands of passenger lists from the National Archives, are organized by date, then by vessel name. So if the ancestor’s date of arrival was correct (certainly not a given, since the ship name was wrong), then the researcher could locate the appropriate volume – first by nationality (for the proper series), then by date (for the proper volume), then by day, and then look at vessel entries.

One huge disappointment about these volumes (for me) is that they have no vessel index. Since they were clearly machine-processed, it would seem a vessel index would have been easy to generate, but as far as I can tell, it wasn’t done. A year or so ago I had a bee in my bonnet about creating such an index, but I tried one path to doing it and found that it didn’t work. After these interactions in NYC, I went back to trying it. My results were actually better than I’d expected, but I am still afraid it will take far too long to do this work. I’ll keep thinking about it, though. I would love to make it work; I think a vessel index to those volumes would be incredibly valuable.

In any case, and even without the important vessel index, these Glazier volumes are a valuable tool. While I’m not certain of it, I believe that these volumes are not included in any of the large aggregated genealogy databases. There are so many resources that are not in these mega-databases; they’re fantastic places to start, but it’s important to not stop there!

In the spirit of encouraging further research in this area, here is a list of all the Glazier “Immigrants to America” volumes of which I am aware. Two series, Italians to America and Emigration from the United Kingdom to America, are still being published.

  • Germans to America: Lists of Passengers Arriving at US Ports.
    • 67 volumes, covering January 1850 to June 1897.
  • Germans to America, Series II: Lists of Passengers Arriving at US Ports in the 1840s.
    • Seven volumes, covering January 1840 to December 1849.
  • Italians to America: Lists of Passengers Arriving at US Ports.
    • 28 volumes so far, covering January 1880 to April 1905.
      (Vols. 27 & 28 were published in June 2012, by Scarecrow Press.)
  • Emigration from the United Kingdom to America: Lists of Passengers Arriving at US Ports.
    • 18 volumes so far, currently covering January 1870 to December 1881.
      (Vols. 17 & 18 were published two weeks ago [Nov 2012], by Scarecrow Press.)
  • Migration from the Russian Empire: List of Passengers Arriving at the Port of New York.
    • Six volumes, covering January 1875 to June 1891.
  • The Famine Immigrants: Lists of Irish Immigrants Arriving at the Port of New York, 1846-1851.
    • Seven volumes, covering January 1846 to December 1851.

(Thanks to Jared Hughes at Rowman & Littlefield for helping me confirm the publishing information above.)
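As a rough aid for the “series first, then date” lookup described earlier, here is a Python sketch that reports which series cover a given arrival date. The coverage ranges are taken from the list above (as of November 2012). The function name is invented, and month-level ranges are assumed to run from the first day of the starting month to the last day of the ending month. Note that two of the series cover the port of New York only; the sketch does not check the port or the passenger’s nationality.

```python
from datetime import date

# (series title, first date covered, last date covered), from the list above.
SERIES = [
    ("Germans to America", date(1850, 1, 1), date(1897, 6, 30)),
    ("Germans to America, Series II", date(1840, 1, 1), date(1849, 12, 31)),
    ("Italians to America", date(1880, 1, 1), date(1905, 4, 30)),
    ("Emigration from the United Kingdom to America",
     date(1870, 1, 1), date(1881, 12, 31)),
    ("Migration from the Russian Empire", date(1875, 1, 1), date(1891, 6, 30)),
    ("The Famine Immigrants", date(1846, 1, 1), date(1851, 12, 31)),
]


def series_covering(arrival):
    """Return the titles of the series whose coverage includes the date."""
    return [title for title, start, end in SERIES if start <= arrival <= end]


print(series_covering(date(1885, 6, 15)))
print(series_covering(date(1848, 3, 1)))
```

An empty result means that none of the series, as listed here, reaches that date, so the researcher would need to look elsewhere.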


The Genealogy Event was a great event. I hope it’ll become an annual event; I plan to attend as often as I can.