Monday, September 28, 2009

Google Language Tools grow again

Over the past year and a half or so (most recently, here), I’ve been providing regular updates on the addition of new languages at Google’s Language Tools site. For those not yet familiar with it, Google provides automated translation between a wide variety of languages, easily surpassing Babel Fish, the translation site I used before Google (originally run by the now-defunct search site AltaVista and now owned by Yahoo).

Well, they’re at it again. Google has had Persian in beta testing for some time now, but I didn’t think that alone was worth blogging about. Now, though, they’ve evidently rolled out a major release, and those industrious little devils at Google are up to fifty-one languages! The release fills a few of the important gaps that my fellow commenters and I pointed out the last time I wrote about this subject. I’m not saying we had any influence on the choices (haha), but at least we’ve got Afrikaans, Swahili, and Albanian now. And I was really pleased to see Irish, Welsh, and Icelandic.

Here is the current list of supported languages:

Afrikaans, Albanian, Arabic, Belarusian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Maltese, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Thai, Turkish, Ukrainian, Vietnamese, Welsh, and Yiddish.

And that’s not all. You may remember that I complained about Google’s neglect of most of the Indian subcontinent, with only Hindi represented so far. As you can see from the list above, that is still true. However, Google is now offering something it calls the Google Translator Toolkit, to help you “create content in other languages in an easy-to-use translation editor.” Tucked away inside this new tool (which you’ll find here) is an option for uploading and translating documents, and the languages offered there total sixty!

Some of these are variants and dialects (e.g., Brazilian versus European Portuguese, and simplified versus traditional Chinese), but there are also entire languages from all over India: Bengali, Gujarati, Kannada, Malayalam, Marathi, Nepali, Punjabi, Tamil, and Telugu. I experimented with them a little and couldn’t quite figure out what Google intends here (though I admit I did not RTFM ;). Certain common phrases in my test document were translated, but most weren’t. It seems that full support for these languages isn’t available yet, but in any case, this probably provides a clue as to which languages Google will roll out next for machine translation.

2 comments:

  1. Do they have anything on Latin yet? That would really help me with my research on medieval texts.

    I did try their French translator, and while it was OK for occasional words and phrases, I found that looking up each word individually on Wiktionary actually worked better.

    Do you think the service is better now?

  2. Hi, Mark. I’m sorry to say Google doesn’t offer Latin. I know of an English <> Latin machine translation website, but I can’t vouch for its quality (in fact, I’d venture to say it’s probably dreadful, but I haven’t tested more than a phrase or two). You’ll find it here. For simple dictionary lookups, try the Perseus Project at Tufts University, here. And of course, there is a wealth of Latin resources available through Google Books, Archive.org, and similar sites.

    Machine translation is still not quite ready for prime time, though it’s getting better. It can definitely help you read websites in foreign languages, but it’s not very good for producing original compositions in the target language (and it’s even worse for actual research or genuinely accurate translation).
