Monday, November 29, 2010

Google linguam latīnam addit

Google’s machine translation services have taken another step forward, adding an “alpha” version of Latin. I tested it out with a variety of English-to-Latin challenges (and vice versa), and the results look promising. As an example, here’s something that might look familiar to the Classicists among you.

This brings them up to 58 languages — quite impressive! They are also experimenting with audio now. The results for the most common languages — English, French, Spanish, et al. — are pretty decent. The results for their newer offerings are, well, rather comically bad.

7 comments:

  1. Google Translate works off of bilingual corpora, so feeding it well-known passages is not a fair test. You should instead use obscure works, or better yet, compose your own Latin prose and see what you get.

  2. I’ll leave that to you. Feel free to report back on your experiences.

  3. David Doughan, 11/29/2010 3:11 PM

    Just tried this very rapidly:

    Latin (hurriedly invented): Attamen, patres conscripti, censeo Carthaginem delendam esse.

    Google: However, the Members of the Senate, in my opinion to Carthage to be destroyed

    Not very good, IMO ...

  4. Agreed. But I feel it’s a promising start for an “alpha” release. Presumably, the beta version will be better, and the final version better still. Of course, there will always be lacunae in the lexis. It’s interesting Google would take aim at Latin, a language if not dead then at least in deep repose.

  5. The current limitations of this specific bilingual corpus can be tested thus:

    I entered another famous line from the Aeneid (VI.268): ibant obscuri sola sub nocte per umbram, and got "On they went dimly, beneath the lonely night through the gloom", which can be called acceptable, being almost verbatim the Loeb translation (with "through" for "amid"; and notice that Google even added the capital).

    The grammar of the line is very simple and straightforward. There is a much-celebrated double hypallage, but it only involves putting obscuri with the subject and sola with nocte; the only difficulty lies on the semantic level, since the epithets appear to be "exchanged".

    But grammar had nothing to do with the translation. Just "correct" the text (incidentally destroying the poetry) to mean that "on they went alone beneath the dark night through the gloom", and enter this: ibant soli obscura sub nocte per umbram, as short-sighted commentators said the line should be understood. The result is awesome:

    "they were going on his Dark throne under the shadow of the night, by"

    It's no longer Aeneas and the Sibyl going underground, but Frodo and Sam going to Mordor. If you read online Latin translations of Tolkien's Ring verse you see that "on his dark throne" appears regularly as in (/on!) solio obscuro. So much for the "corpus" :)

    But seriously, using Latin poetry may not be the best option for "statistical machine translation", because of the amount of hyperbaton and other devices you find. I fed the machine some Cicero, Seneca and Boethius but it didn't seem to recognize them, though it might have made more sense; after all, in their blog they say: "Hoc instrumentum convertendi Latinam rare usurum ut convertat nuntios electronicos vel epigrammata effigierum YouTubis intellegamus. Multi autem vetusti libri de philosophia, de physicis, et de mathematica lingua Latina scripti sunt" (emphasis added).

  6. But seriously, using Latin poetry may not be the best option for “statistical machine translation”, because of the amount of hyperbaton and other devices you find.

    Oh, definitely! You’d get in similar trouble attempting to offer Old English based on the poetic corpora, rather than the prose.
