Wednesday, June 25, 2008

Chinese — radically different from English!

Okay, perhaps that’s a rather obvious statement. Ah, but then again, even radical means something else when applied to the Chinese language — or more specifically, to its orthography. And even though I have mentioned Chinese from time to time, the language is still largely a mystery to me. Allow me to explain.

For those who have been reading about my exploration of Tolkien’s Flammifer of Westernesse in translation (Part One and Part Two), I had something in front of me which I had no idea how to approach — the Chinese translation from earlier this decade. Footnote: the translator’s name is Lucifer Chu. How perfect is that? :) But what do I mean that I had no idea? Well, when it comes to brute force translation — i.e., looking up one word at a time in a dictionary — it helps to know how to use the dictionary. We tend to take for granted the fact that being able to look up a word efficiently implies some knowledge of the sequence of letters in the alphabet. Occasionally, some confusion can arise (e.g., do you put ð after d or after t?), but for the most part, this is the least of one’s worries when attempting translations of western languages. Even foreign scripts like the Greek, Cyrillic, and Hebrew alphabets can be memorized, and in any event, the Latin alphabet is clearly related to these scripts. It also helps that, in addition to a set order, alphabets contain a manageable number of distinct symbols.

But Chinese? How exactly does one look up Chinese ideograms in a dictionary?! I mean, there are thousands of them! The Kangxi Dictionary, in fact, records almost 50,000! Nowadays, one can use any number of machine translation services (e.g., Google Translate and Altavista Babelfish), but these require you to be able to type the Chinese characters*, or at least cut and paste them. What do you do when what you have in front of you is a printed book?

This is the sort of thing to which I was referring when I wrote recently that I had no ‘foothold’ for approaching Chinese, so I decided to learn more about Chinese dictionaries. Apparently, you’re supposed to identify the “radical” — that is, the key graphical element of the character in question, then locate the radical among the headwords in the dictionary. But how exactly do you recognize the radical?

It’s supposedly easy, but I’m not so sure. Take 升, for instance. The radical, it turns out, is that little stroke at the top left. But why isn’t it the vertical stroke on the right, which also happens to be an actual radical? Why isn’t it the horizontal stroke, which is also an actual radical? And even learning the radicals is a challenge: there are more than 200 of them (214 in the traditional Kangxi system)!
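To get a feel for how many radicals a dictionary user has to scan, here is a minimal sketch that lists the traditional Kangxi radicals as they are encoded in Unicode (the block U+2F00 to U+2FD5). It uses only Python's standard `unicodedata` module. The function name `kangxi_radicals` is my own, and a given printed dictionary may well use a different (often shorter) radical list than this one.

```python
import unicodedata

# Unicode's "Kangxi Radicals" block: one code point per traditional radical.
KANGXI_START, KANGXI_END = 0x2F00, 0x2FD5


def kangxi_radicals():
    """Return (radical number, glyph, Unicode name) for each Kangxi radical."""
    return [
        (number, chr(cp), unicodedata.name(chr(cp)))
        for number, cp in enumerate(range(KANGXI_START, KANGXI_END + 1), start=1)
    ]


radicals = kangxi_radicals()
print(len(radicals), "radicals")

# The first few entries include the single strokes discussed above:
# the horizontal stroke, the vertical stroke, and the left-falling stroke.
for number, glyph, name in radicals[:4]:
    print(number, glyph, name)
```

The radicals are traditionally ordered by stroke count, which is why the single-stroke ones come first. That ordering is what stands in for alphabetical order when you look up a headword.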

To a native speaker (writer) of Chinese, I’m sure all of this is pretty self-evident, but to me? Not so much. What about you? Here, try this. Follow this link for a demonstration of looking up ideograms in a Chinese dictionary, courtesy of Yale University, and tell me you don’t find it confusing as hell. And now imagine following this procedure for every single character in the text you want to translate. Insanity seems inevitable, illiteracy preferable.

So, in the end, despite a fascinating — if necessarily much abbreviated — crash course in Chinese orthography, I tackled the problem at hand from a different angle. I cheated. Yes, that’s right. I cheated. No one dropped me a line to help (I get more traffic from Egypt than from China), so I had to take matters into my own hands.

What I wanted was to get the Chinese characters comprising the last few lines of the Song of Eärendil onto my computer’s clipboard so that I could simply paste them into a machine-translator. Lazy and clever — who knew it was possible! :) How to do it? OCR, of course. With a little Googling, I found a freeware Chinese OCR utility which I have to say is really quite excellent — especially for the price, hahae. The utility, with the original scan on the left and the selectable post-OCR text on the right, is pictured above. The tool is slightly tricky to use, because you have to highlight each character manually, then select from the several suggestions the program makes — and if a computer program can’t recognize the characters automatically, what hope do I have? But with a little human assistance, the utility recognizes more than 10,000 characters!

So what happened with the Chinese rendition of Flammifer? I ran the text through Google Translate and Babelfish — and with the caveat that I may have made one or two mistakes in the semi-manual OCR process — here’s a more or less combined version of the two outputs (and I’ll include the selectable Chinese so that you can play around with it if you like):

上古岁月化尘土,
游孑生命永无极.
月华昏昏时光逝,
天上人间永别离.
明灯出白韦斯特内西,
光明使者无止息.

Ancient years of dust,
Yu Jie [?] lives forever limitless.
The lunar crown obscure time passes,
The heaven on earth parting the sky.
West of White Light to the West,
The bright messenger does not stop.

It’s somewhat inscrutable and sounds a bit like haiku, but between the two machine translations, it looks like we’re on the right track, or at least getting close. I’d still like to know how a native speaker would translate this. Machine translation is clearly showing its limits here, but “White Light to the West” and “bright messenger” certainly seem to approach the ideas of the Flammifer and Eärendil. But Lunar-crown-forever-limitless help me if I ever try to learn Chinese! And for native Chinese speakers, I’m sure learning English is just as great a challenge. :)


* Typing Chinese ideograms is not so difficult as it sounds, apparently, judging by the millions of text messages sent using Chinese characters every day. I heard just a little bit about this on NPR recently — in the aftermath of the earthquake in Szechuan Province.

23 comments:

  1. How do the Chinese type their characters? I have a friend who lives in Japan and speaks Japanese, and they use the Latin characters (which they call "romaji") to type, and then the symbol that equals the word they type appears. They apparently use 3 different writing systems! That way, for the many loanwords, they have a way to write it. Andy says that in a Japanese newspaper, he sees 3 different writing systems used in the same sentence! Crazy to think about.

  2. That’s basically how it works in Chinese too. SMS devices work the same way computer keyboards do, as summarized here in this article from the New York Times (subscription required) — ironically, the article is also about how computer use is eroding people’s native facility with writing Chinese by hand:

    “Most computer users in China rely on a transliteration system called pinyin to render Chinese characters from a standard Roman keyboard. [...] For example, to write “Beijing,” which means “north capital,” in Chinese:

    1. Type ‘bei’ in pinyin. The software presents a list of characters fitting the pronunciation ‘bei’ in a toolbar.

    2. The correct answer is No. 4, for ‘north.’ When selected, the character appears on the screen.

    3. Type ‘jing.’ The software presents a list of characters fitting the pronunciation ‘jing.’

    4. The appropriate ‘jing’ is not among the first 10 choices, so the user must scroll to the next 10 choices.

    5. The appropriate character, meaning ‘capital,’ is No. 1. Choose it in the toolbar. The full word is rendered on the screen.”

    There are some variations and optimizations, including sometimes being able to type an entire word, rather than one syllable at a time, but that’s basically it. Sounds tedious, I know, but it didn’t stop the Chinese from sending millions of text messages in the wake of the recent earthquake.
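The five steps quoted above can be sketched in a few lines of Python. This is only an illustration, not how any real input method is implemented: the candidate lists, their ordering, and the names `CANDIDATES`, `candidate_page`, and `pick` are all made up here so that the example reproduces the article's "No. 4" and "next 10 choices" walk-through.

```python
# Hypothetical candidate lists, keyed by toneless pinyin. The ordering is
# invented so that it matches the steps quoted from the article.
CANDIDATES = {
    "bei": list("被背备北杯倍贝悲辈碑"),
    "jing": list("经精静境竟惊警景敬镜京井净晶"),
}
PAGE_SIZE = 10  # the toolbar shows ten choices at a time


def candidate_page(syllable, page):
    """Return the characters shown on the given page (0-based) of the toolbar."""
    chars = CANDIDATES.get(syllable, [])
    start = page * PAGE_SIZE
    return chars[start:start + PAGE_SIZE]


def pick(syllable, page, number):
    """Choose candidate No. `number` (1-based) from the given page."""
    return candidate_page(syllable, page)[number - 1]


# Steps 1-2: type 'bei', choose No. 4 on the first page.
# Steps 3-5: type 'jing', scroll to the next page, choose No. 1.
word = pick("bei", 0, 4) + pick("jing", 1, 1)
print(word)
```

Real input methods rank candidates by frequency and context, which is why typing a whole word at once usually needs fewer selections than going syllable by syllable.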

  3. There are other ways of entering Chinese in SMS (I know because I am implementing a few...).

    For example, many Chinese phone keypads have different strokes on the keys (top to bottom, diagonal, left to right, etc.). You press the key for the direction of each stroke in the order you would draw the character (the length of the drawn stroke doesn't matter), and the phone either figures out the character or offers a few choices that could be drawn with that combination of pen strokes.

    T9 and Q9 are example technologies that support this. For examples, see

    http://www.qcode.com/eng/demos/demos.php3

    An interesting topic for me! (Work related, family related, and Tolkien related all at once.)
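The stroke-key idea in the comment above can be sketched as a prefix search over stroke sequences. This is a toy under stated assumptions, not the actual T9 or Q9 algorithm: the one-letter stroke codes, the six-character table, and the function name `stroke_candidates` are all invented for illustration.

```python
# Hypothetical one-letter codes for stroke keys:
#   h = horizontal, s = vertical, p = left-falling, n = right-falling
# Each character is stored as its strokes in writing order.
STROKES = {
    "十": "hs",
    "千": "phs",
    "升": "phps",
    "人": "pn",
    "大": "hpn",
    "木": "hspn",
}


def stroke_candidates(keys_pressed):
    """Return every character whose stroke sequence starts with the keys pressed."""
    return [char for char, seq in STROKES.items() if seq.startswith(keys_pressed)]


print(stroke_candidates("p"))     # many choices after one key
print(stroke_candidates("ph"))    # fewer after two
print(stroke_candidates("phps"))  # narrowed to a single character
```

Each extra keypress narrows the list, so the phone can either offer the remaining choices or commit to the character once only one is left.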

  4. Thanks for sharing your work, Jeremy! Fascinating stuff. The need to offer multiple input methodologies is probably another clue to just how different Chinese is from western languages.

  5. It is interesting how the computer is "eroding" writing Chinese by hand. I had asked my friend in Japan about that very same thing, knowing that calligraphy was at one time a very high art. While he said that there are people who practice it, it is not as important anymore. Kinda sad, really.

  6. It is definitely a loss. However, that being said, losing a collection of ideograms in favor of the Latin alphabet needn’t be the end of calligraphy. The Vietnamese, for example, create the most amazing calligraphy out of their Latin-based alphabet. Looking at it, you’d think it was ideographic or at least written in syllable blocks (like Korean). Here’s the first example I could find.

  7. Yeah, that is beautiful!

  8. I'm impressed with the effort you've put into this project! The idea of translating into Chinese makes my brain hurt.

    I recently received an e-mail offering to publish my Master's Thesis (on the works of American Mystery Writer Ellery Queen) in Chinese. The person who made the offer (in very poor English, I might add) wants to translate it, post it on Chinese web sites, and will send me copies of any books or magazines it may be published in. (Doesn't that just make your copyright alarm go off?)

    I have contacted the families of the Ellery Queen authors for advice before I make any decisions. If I do decide to have it translated into Chinese, I think I will hire a translator whose skills I trust rather than muddle my way through 172 pages of translation. I sure don't want it to sound like the horrible examples on Engrish.com!

    If you have any suggestions or advice, please let me know!

  9. Hi Cat Bastet! Catching up on some of my recent posts, I see. Welcome back ...

    I’m impressed with the effort you’ve put into this project! The idea of translating into Chinese makes my brain hurt.

    Thanks! Often it’s more a question of ingenious laziness than bona fide effort. Although I guess that takes effort as well. :)

    I recently received an e-mail offering to publish my Master’s Thesis (on the works of American Mystery Writer Ellery Queen) in Chinese. The person who made the offer (in very poor English, I might add) wants to translate it, post it on Chinese web sites, and will send me copies of any books or magazines it may be published in. (Doesn't that just make your copyright alarm go off?)

    Yes, the inquiry seems questionable on a couple of different levels. First, as you say, the copyright question: it sounds as if you could easily lose control of your intellectual property here. Of course, if somebody wants to translate and distribute a pirate copy of your work in another language and country, there’s probably no stopping them; so it may be a good sign that the interested parties contacted you at all. But second, the fact that the self-nominated translator can’t write well in English would make me worry s/he isn’t very likely to understand your thesis well enough to translate it. Caveat lector, I would say, in that case. I’m reminded of the episode of News Radio where Jimmy James had his management book translated into Japanese and then back into English from Japanese, from which he gave a public reading. Talk about Engrish! ;)

    I have contacted the families of the Ellery Queen authors for advice before I make any decisions. If I do decide to have it translated into Chinese, I think I will hire a translator whose skills I trust rather than muddle my way through 172 pages of translation. I sure don't want it to sound like the horrible examples on Engrish.com!

    Yes, a prudent course. This is exactly what I’d do in your place. Whatever happens, it’s flattering that somebody’s this interested, isn’t it? :)

  10. Hi Cat Bastet! Catching up on some of my recent posts, I see. Welcome back ...

    Thanks! I'm having fun reading all your posts. :)

    I’m reminded of the episode of News Radio where Jimmy James had his management book translated into Japanese and then back into English from Japanese, from which he gave a public reading. Talk about Engrish! ;)

    OMG RTFL! I forgot about that one! Yup, that's my greatest fear of this possible project.

    Whatever happens, it’s flattering that somebody’s this interested, isn’t it? :)

    No kidding! I'm still surprised and flattered. Of course, I'm still surprised there are EQ fans in China. It boggles my mind.

  11. No kidding! I’m still surprised and flattered. Of course, I’m still surprised there are EQ fans in China. It boggles my mind.

    If there’s one thing I’ve learned from writing Lingwë, it’s that there are fans of just about anyone just about anywhere. I get visitors from the most surprising places (e.g., Uzbekistan, Algeria, Pakistan, Malaysia, Nigeria, et al.). Not a lot of visitors, but some. :)

  12. I get visitors from the most surprising places (e.g., Uzbekistan, Algeria, Pakistan, Malaysia, Nigeria, et al.). Not a lot of visitors, but some. :)

    Same here! I'm so curious that I use my Site Meter to see how they found me.

  13. Very cool -- thanks! I just signed up. It will be fun to compare the Google Analytics data to the Site Meter data. (God, does that make me sound geeky or what?! :)

  14. God, does that make me sound geeky or what?! :)

    Join the club, hahae. After you’ve had a chance to compare the two, drop me an email and let me know what you think. I’m not going to go ahead and sign up for Site Meter now, but if you give it the upper hand in a fair fight, maybe I will. It has its work cut out for it, though: I’m a big fan of Google Analytics.

  15. Hello, I am a Malaysian second-language speaker of Chinese, and I am quite amused at what has been discussed so far.
    The strokes of Chinese characters are written in a certain order, and that is the basis for identifying the radical when none are 'obvious'. The order of the strokes is typically from top-left to bottom-right. In 升, for example, the stroke order is the top-left stroke, the horizontal stroke, the curved stroke on the left, and then the vertical stroke on the right. In the case of a word like 吗, the radical is obviously (at least to me) the 口 shape on the left.
    You made two mistakes with your 'semi-manual OCR'. '孑' should be '子' (thus accounting for the mysterious 'Yu Jie'; '游子' actually means 'person far from home') and '白' should be '自' (白 means white, so that was an unfortunate mistake that accounts for 'White Light to the West'; '自' actually means 'from').
    Correcting those two mistakes and running the text through the Google Translate again gives this:

    Ancient days of dust,
    Polar wandering life forever.
    Yuet Wah intoxication time passed away,
    Heaven and earth never leave.
    Light from 韦斯特内西,
    No stop light messenger.

    'Yuet Wah' is obviously Google's translation of '月华' (which actually means 'moonlight'). '韦斯特内西' is a transliteration of Westernesse; its Pinyin is Wéisītènèixī. But the Google translation is too imperfect to be used, so here I give my own. (I am not an expert, so I do not guarantee it is without errors.)

    From ancient times the years turn to dust,
    But the life of the person far from home is forever without extremes.
    The moonlight fades and the time passes,
    Heaven and the world forever part.
    A bright light appears from Westernesse,
    The herald of light never rests.

    Referring to the original song in The Lord of the Rings, I can tell that 'ancient times' means the Elder Days, 'Heaven and the world forever part' is the translation of 'and tarry never more on Hither Shores where mortals are', and 'herald' had mysteriously moved from the fourth from last line to the last line. I think the Chinese translator did a good job. The person managed to make the translation almost as cryptic as the English original.☺
    As for what you said about Chinese speakers learning English, I think I must comment that as I see it, it is harder for a native Chinese speaker to learn English than it is for a native English speaker to learn Chinese. I am therefore glad I am the latter.
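The two OCR corrections described in this comment (孑 to 子 and 白 to 自) are easy to apply and inspect in code. The sketch below is only an illustration: the names `OCR_TEXT`, `CORRECTIONS`, `apply_corrections`, and `changed_characters` are my own, and the fixes are written as two-character contexts so that a legitimate 白 elsewhere in a text would not be touched.

```python
# The six lines as they came out of the semi-manual OCR process.
OCR_TEXT = "\n".join([
    "上古岁月化尘土,",
    "游孑生命永无极.",
    "月华昏昏时光逝,",
    "天上人间永别离.",
    "明灯出白韦斯特内西,",
    "光明使者无止息.",
])

# Context-specific fixes: the misread character together with its neighbour.
CORRECTIONS = {"游孑": "游子", "出白": "出自"}


def apply_corrections(text, corrections):
    """Replace each misread sequence with its corrected form."""
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return text


def changed_characters(before, after):
    """Pair up characters that differ (both strings have the same length here)."""
    return [(old, new) for old, new in zip(before, after) if old != new]


fixed = apply_corrections(OCR_TEXT, CORRECTIONS)
for old, new in changed_characters(OCR_TEXT, fixed):
    # Printing the code points makes the near-identical glyphs easier to tell apart.
    print(f"{old} (U+{ord(old):04X}) -> {new} (U+{ord(new):04X})")
print(fixed)
```

Each pair differs by only a stroke or two on the page, which is exactly the kind of distinction that is hard to catch by eye when choosing among OCR suggestions.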

  16. Thanks so much for the explanations and the translation. Very helpful indeed! Chinese has always seemed rather alien to me, in part, I suppose, because I’ve never put a lot of time into trying to learn much. So your point of view here is really valuable.

  17. There is one thing I would like to ask. (Yes, I am the same person as before.) If you do not know Chinese, how did you know which part of the book is Bilbo's song of Eärendil? By chapter number perhaps…
    I must add that I do not read much Chinese poetry so I may be unfamiliar with the style. Chinese grammar relies on a strict word order, which is not possible in poetry. Here is the word-by-word translation (ignoring English grammar) — see what you make of it.

    Ancient times years change dust,
    Person far from home life forever without extreme.
    Moonlight faint faint time pass,
    Heaven the world forever leave.
    Bright light appear (or come) from 韦斯特内西 (Westernesse),
    Light messenger (or herald) without stop rest.

    It also interests me that the transliteration of Westernesse ends in 西, the Chinese character for west, even though other characters would do. I think that was done purposely.

  18. Yes, I found the poem merely by knowing where in the book to look, and then recognizing the long sequence of shorter lines of verse. Thanks for these additional details, and I too find it very illuminating to see that 西 was tacked onto the end of the word. So, that doesn’t affect the reading of the word for a Chinese speaker?

  19. As I wrote before, the Pinyin of 韦斯特内西 is Wéisītènèixī, and this would be approximately pronounced as way-se-te-nay-see. I would prefer the pronunciation way-se-te-nay-se, and would use a character like 丝 (sī) to represent the last syllable. But transliteration of names et cetera into Chinese is usually very rough. Your name would probably be transliterated as 杰胜•非舍 (Jiéshèng•Fēishě: approximately dzye (ending in 'yeah')-sheng fay (like 'fade')-sher (without fully pronouncing the R)). The second syllable ends in NG even though there are characters pronounced 'shen'. 杰胜 is the usual transliteration of Jason. (I got creative with Fisher.) But I agree with Tolkien that names should not be 'translated'. I have a Chinese name myself. Even in French Michelangelo is 'Michel-Ange', Christopher Columbus is 'Christophe Colomb', et cetera. 孔子's (Kǒngzǐ) so called Latinized name is Confucius!

    If you still have the book I was wondering if you could tell me what Lucifer Chu did with the Elvish languages, did he leave them alone, at least?

  20. LO, I’m so glad you came across my post; this is such great information you’re providing! As for how the Chinese translator handled the Elvish, I’m certainly in no position to comment. I would hope he merely transliterated them; however, even that (it appears from your comments) is a very inexact process. Send me an email (my email address is in my profile, above), and I can share what I’ve got. Then maybe you can share some of your findings with us here.

  21. I do not think the translator transliterated the Elvish. Chinese does not use an alphabet so transliteration instead uses the characters with (approximately) the closest sounds, which, of course, have their own meanings, but the meanings have to be ignored in transliteration. (Imagine trying to use English words with the closest pronunciation to represent the pronunciation of an Irish sentence.) Therefore, transliteration is only used for proper names ('奥林匹克运动会' (Àolínpǐkè Yùndònghuì) for 'Olympic Games' ('运动会' means 'games'), '爱迪生' (Àidíshēng) for 'Edison', et cetera) and loanwords ('吉他' (jítā) for 'guitar', '沙发' (shāfā) for 'sofa', et cetera). But many recent loanwords are not transliterated, such as 'email' (although there is '电子邮件'), 'Internet' (although there is '因特网'), 'DVD' (although there is '直观装置'), et cetera.
    I think it is most likely that the translator somehow left out the Elvish quotations (maybe he translated them).

  22. Hmm! Well, if you want to see for yourself, and then share your findings with us, just send me an email. :)
