Tuesday, December 28, 2010

Tech Tuesday - Google Translate

Machine translation is an extremely useful tool for genealogists working with other languages, whether you are trying to navigate a website written in another language or read documents pertaining to your ancestors. The two most familiar tools are probably Babel Fish and Google Translate. Of the two, Google Translate offers the wider selection, currently 63 languages.

Google Translate works by detecting patterns in documents that have already been translated by humans. The quality of a translation depends on how many such documents are available for that language. The best translations occur for languages where Google has had access to a large number of translated documents, such as French, German, Spanish and English. Machine translation is constantly improving, so if you haven't tried it recently you may be surprised by how good the results are.
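For the curious, here is a toy illustration of what "detecting patterns in already-translated documents" can mean. It is only a minimal sketch, not how Google Translate is actually built: the four sentence pairs and the function name `learn_word_pairs` are made up for this example, and the scoring method (a simple overlap measure called the Dice coefficient) is my own stand-in for whatever a real system uses.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus: German sentences paired with human English translations.
PAIRS = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein haus", "a house"),
    ("ein buch", "a book"),
]

def learn_word_pairs(pairs):
    """Guess a translation for each source word from co-occurrence counts."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentences containing each source word
        tgt_count.update(tgt_words)   # sentences containing each target word
        joint.update(product(src_words, tgt_words))  # pairs seen together

    best = {}
    for s in src_count:
        # Dice coefficient: high when two words almost always appear together.
        best[s] = max(
            tgt_count,
            key=lambda t: 2 * joint[(s, t)] / (src_count[s] + tgt_count[t]),
        )
    return best

if __name__ == "__main__":
    for german, english in sorted(learn_word_pairs(PAIRS).items()):
        print(german, "->", english)
```

With only four sentence pairs the guesses happen to come out right, but it is easy to see why more translated text gives better results: each extra sentence pair helps separate words that merely appear near each other from words that actually mean the same thing.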

Still, Google Translate is not perfect and you will occasionally get strange or nonsensical translations (such as an example given by James Fallows here). Genealogists need to pay special attention to place names and surnames, which should normally be left untranslated but are sometimes translated by the machine as if they were ordinary words. Google Translate is a great tool for getting the basic gist of a document or for helping search for information. One useful feature of Google Translate is that it can be added to your Google Toolbar, allowing quick and easy translation of web pages. However, when it comes to documents, you should always double-check the translation the old-fashioned way, especially because the document might contain structures or words on which the machine translator has been insufficiently trained.

3 comments:

  1. "Machine translation" is a term which is used by some to refer solely to rule-based translation using a computer, and by others to refer to all types of translation involving computers, both "pure machine translation" and "computer-assisted translation," which often means translation memory that uses dual corpora. Google Translate is a hybrid of these systems - rule-based is the fallback when dual corpora fitting the algorithm are lacking. Rule-based machine translation definitely has limits, but translation memory-based computer-assisted translation has the potential for continuing improvement, especially when high-quality dual corpora are available and are not outnumbered by poor-quality human translations.

  2. @Greta Koehl

    Such terminological controversies are unfortunately common in every field. Still, I hope the post relates the usefulness of online translation tools for genealogical research.

  3. I would be lost without them, though they do have limits. For instance, for Christmas I looked up "Merry Christmas" and was given the formal spelling. When I googled it, I saw that there is also an intimate greeting used among friends and family. The second word also had variations across the internet.
    People in Germany tell me it is a laugh, maybe because of Low German heritage or for another reason. They say, "Use English and we will translate."
    But still I get the gist of what they say. Sometimes I am uncertain, but there are various versions to fall back on. One can translate backward to see how it turns out. That is your best bet. Try new words and keep it relatively basic and simple.
