View from a Rhino House: worlds & words apart

If you’re a language teacher, now is the time to consider a move to a new career, maybe pig farming or becoming a Trappist monk; anything where talking to people is not necessary – avoiding all those unpleasant memories & broken dreams. Almost-real-time speech conversion from one language to another has arrived. Microsoft Research demonstrated not only how to convert spoken English into Mandarin with just a few seconds’ delay, but also how to output that Mandarin speech with the rhythms & intonations of the original speaker. The technology was demonstrated by Microsoft’s research chief Rick Rashid in Tianjin, China on 25th October (as part of the ill-starred Windows 8 & “Surface tablet” launches), but the news initially got lost in the bear-fight over responsibility for the general “Asian Launches Cock-up”.

Rashid spoke a few English sentences into MSR’s new speech-recognition, translation & generation system, & reports suggest that the Mandarin output stunned a crowd of 2,000 academics.

The system’s “whizz-bang” capability stems from a series of improvements throughout the speech-to-speech process. Software like Dragon has, after many years of effort, at last begun to make inroads, & create opportunities, for speech recognition in offices, & the next generation of tools based on it, like Apple’s Siri, recognizes spoken questions & searches for answers on the web. Microsoft’s Kinect has also recently had a speech interface added.

While such systems misrecognize words at an average rate of around 20%, MSR’s trick is to use a neural-network-based system that reduces word-recognition errors to around 12%. That means the translation engine (Bing Translate) has a far better chance of creating intelligible Mandarin input to feed into the speaking engine.
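For the curious, those 20% & 12% figures refer to the word error rate (WER), the standard yardstick for speech recognizers: the number of substituted, inserted & deleted words divided by the length of the reference transcript. Here’s a minimal sketch of how that metric is computed – the function & the toy sentences are my own illustration, not MSR’s code or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance, counted over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of five: a 20% error rate.
print(word_error_rate("please call stella right away",
                      "please call stellar right away"))  # → 0.2
```

So a drop from 20% to 12% means the recognizer garbles roughly one word in eight instead of one in five – enough, apparently, to tip machine translation from gibberish into something usable.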

But the “goddam!!” factor is the generation of Mandarin speech in a voice recognizably like the speaker’s own: if you can preserve the speaker’s vocal rhythms & intonations in the translation, their meaning (it is claimed) will be more apparent & the conversation will be more effective. This was achieved for the Tianjin presentation by having Rashid work with a machine-learning algorithm for an hour, rather than the more usual recitation of a standard text that software like Dragon asks for.

Just think of how many wars will start once we can all understand exactly what each politician really said!

What did he say?