Friday, May 9, 2008

New languages at Google — not quite ready for prime time

For some time now, Google has offered a suite of online language tools, including automatic translation for text and web pages, dictionaries, and more. The idea was to compete with the tried-and-true Babelfish, about the only useful surviving piece of the once-great Internet search giant AltaVista. For most of its life so far, Google Translation hasn't offered much more than Babelfish: rather clunky, but certainly serviceable, translation between a dozen or so languages. Recently, however, Google broadened its offerings.

Like most of the new widgets coming out of Google on what seems like a daily basis, this update arrived with little in the way of fanfare. In fact, I discovered it by accident when I happened to drop by earlier this week. The front-page interface (linked above) looked the same, but I noticed eleven new languages: Arabic, Bulgarian, Croatian, Czech, Danish, Finnish, Hindi, Norwegian, Polish, Romanian, and Swedish. This brings Google's total to a whopping 22 languages, double what Babelfish offers (though Babelfish still offers two forms of Chinese script, traditional and simplified). It used to be that I had to resort to obscure sites for languages like Finnish and Polish. No longer!

But how accurate is it? For its more well-heeled languages, Google is pretty good. As I said above, clunky but serviceable. But for the newer ones, there are some significant problems. Testing out Hindi, I made quite an alarming discovery, in fact.

I don't speak Hindi myself (though I am learning some from my friend Arun), but Arun volunteered to help me check a few simple phrases. You can probably guess what I tried: the names of some of Tolkien's books. Perhaps I thought to test these in Hindi because there is, so far, no Hindi translation of The Lord of the Rings or even The Hobbit. For "The Lord of the Rings", Google was right on: अंगूठियों का मालिक. But add a full stop to the phrase, and Google tacks a है onto the end that simply shouldn't be there. "The Two Towers" also had problems. In Google's suggested translation, इस दो टॉवर, the first word doesn't belong; to get the correct translation, दो टॉवर, you have to strip the definite article from the phrase you feed to Google. Finally, here is the alarming discovery I mentioned. Just try "The Fellowship of the Ring", and without hesitation Google gives back: राजा की वापसी. What's the problem? This actually means "The Return of the King" in perfect Hindi!

How could this have happened? My guess is that somebody used the feature that lets people suggest their own "better translations" to Google, but whoever supplied this one probably made a bad copy-and-paste. Tsk, tsk. The service is bound to improve, and it is free, but at the moment it appears rather unreliable. My friend Arun helped me identify many problems with the Hindi, and I suspect the other new languages have their share of beta problems as well. Perhaps other readers will test out the Scandinavian and Slavic additions. I know I've got Polish readers: how is it?


  1. I myself noticed the feature today and was surprised to see Bulgarian on the list of new languages. I was quite impressed, actually, as certain phrases and even whole (short) sentences were translated correctly. Of course, this type of automatic translation is bound to be inanely inaccurate. However, it is still a very useful tool if you want to get a general idea of what a webpage is about. I use this type of service mostly for Asian languages, but it's nice to have the option, right?

  2. Yes, absolutely. It’s not too useful if your aim is to produce quality text in another language, but if you just need to get a sense of the content of a text, or to look up a few words you may not know, then it’s very helpful indeed. Usually. ;)

  3. So quiet around here...

  4. Yes, sorry about that! I’ve been busy at work, and I’ve got a cold. *sniffle* And of course, there’s the Memorial Day holiday here in the U.S. I’ve been meaning to write up a couple of new posts, but haven’t managed to find the time yet. But don’t despair, and keep checking back. I promise to resume your regularly scheduled Lingwë broadcast in the next few days.