Welcome to Microsoft Pri0: That's Microspeak for top priority, and that's the news and observations you'll find here from Seattle Times reporter Sharon Chan.
July 30, 2008 10:58 AM
Posted by Benjamin J. Romano
I didn't have space in today's print story to share an update on Microsoft's translation efforts, which have been on the market -- quietly -- for nearly a year. The Microsoft Translator group is unusual in that it's still part of the Microsoft Research team, even though it's a live product. That's also why it was on display at the Microsoft Research Faculty Summit yesterday.
Microsoft is going up against the established online translator -- Babel Fish, which Yahoo owns -- and Google Translate.
Lane Rau, marketing manager for Microsoft Translator, demonstrated a side-by-side Web page translator that could be particularly useful to people who are familiar with, but not totally comfortable in, another language. The "bilingual viewer" shows an original Web page on one half of the screen and a translated version on the other half. Holding the mouse cursor over a block of text highlights it on both pages so you can compare the translated text with the original.
I've played around with Babel Fish and Google Translate a bit to see if they can do the same. If they can, it isn't obvious to me.
Another view of Microsoft Translator shows the translated page in full and provides the original text in a small box when you hover the cursor over a text block.
It looks like a great tool for people learning a new language, and Rau said the service does get used that way.
But the purpose of the bilingual viewer is to make machine translation usable now, even as researchers continue to improve it.
"You can compare sentence by sentence," Rau said. "So, what that means is if there is an error, for example my name is Lane and that could be translated as street in another language ... you can go, oh, that was a name, so that's why that mistake happened. And you can actually understand it a lot better."
Microsoft is refining what's called statistical machine translation. It starts with a database of millions of parallel sentences, each paired with its translation in another language. When a new sentence is entered for translation, the system searches that database to select the statistically most likely meaning or grammatical structure.
"It takes a ton of data -- millions of sentences in parallel," Rau said.
The company is relying on its huge archive of professionally translated software documentation and also exploring other sources such as World Health Organization translations, she said.
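To make the statistical idea concrete, here is a minimal sketch in Python. It is not Microsoft's system: it uses a textbook word-alignment method (IBM Model 1, trained with expectation-maximization) on a made-up three-sentence Spanish-English corpus, and the names PARALLEL, train and translate are invented for illustration. Real systems work with millions of sentence pairs and also model phrases and word order.

```python
from collections import defaultdict

# Toy parallel corpus (illustrative data only): Spanish -> English.
PARALLEL = [
    ("la casa", "the house"),
    ("la mesa", "the table"),
    ("una mesa", "a table"),
]

def train(pairs, iterations=20):
    """Estimate t[(e, f)] = P(English word e | Spanish word f) with EM."""
    pairs = [(f.split(), e.split()) for f, e in pairs]
    t = defaultdict(lambda: 1.0)  # uniform start; normalized per sentence below
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for e in tgt:
                # Split one unit of "credit" for e among the source words.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # Re-estimate the probabilities from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

def translate(t, sentence):
    """Word-by-word translation: pick the most probable English word."""
    out = []
    for f in sentence.split():
        candidates = {e: p for (e, f2), p in t.items() if f2 == f}
        # Unknown words are passed through unchanged.
        out.append(max(candidates, key=candidates.get) if candidates else f)
    return " ".join(out)

model = train(PARALLEL)
print(translate(model, "una casa"))
```

The sentence "una casa" never appears in the toy corpus, yet the script prints "a house": the word pairings were inferred purely from which words tend to occur together across the parallel sentences, with no grammar rules typed in by hand.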
Statistical machine translation differs from rule-based translation, which relies on exhaustive sets of rules for each language, manually entered by humans. Babel Fish uses rule-based translation, powered by Systran, a French company.
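For contrast, a rule-based translator can be sketched as a hand-written dictionary plus hand-written grammar rules. The Python below is a toy under stated assumptions, not how Systran works: the LEXICON entries, the part-of-speech tags and the single reordering rule (Spanish noun-adjective becomes English adjective-noun) are all invented for illustration.

```python
# Hand-entered lexicon: Spanish word -> (English word, part of speech).
LEXICON = {
    "la": ("the", "DET"),
    "una": ("a", "DET"),
    "casa": ("house", "NOUN"),
    "mesa": ("table", "NOUN"),
    "roja": ("red", "ADJ"),
}

def translate_rule_based(sentence):
    """Look each word up, then apply one hand-written reordering rule."""
    # A word missing from the lexicon raises KeyError: a human must add it.
    tagged = [LEXICON[word] for word in sentence.split()]
    out = []
    i = 0
    while i < len(tagged):
        # Rule: Spanish NOUN + ADJ becomes English ADJ + NOUN.
        if (i + 1 < len(tagged)
                and tagged[i][1] == "NOUN"
                and tagged[i + 1][1] == "ADJ"):
            out.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)

print(translate_rule_based("una casa roja"))
```

This prints "a red house". Every dictionary entry and every rule here had to be typed in by a person, which is the scaling cost that comes with the rule-based approach.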
Microsoft Translator also uses Systran for standard, non-technical translations in some languages, Rau said. "We're actually working on replacing that right now," she added.
Systran was previously used by Google Translate, but according to unofficial reports that surfaced last fall, Google had switched off Systran and begun using its own statistical machine translation.
Most users still find plenty to fault with machine translation, be it statistical or rule-based.
Both Microsoft and Google see major benefits in the statistical route.
"It takes a long time to develop a good enough database of rules," Rau said. "The nice thing about statistical translation is, once we get to that quality level -- we're working on that right now -- it actually can continue to improve and you can scale it across many more languages because you don't have to have the humans typing in manually different rules."
Right now, Microsoft Translator offers about a dozen language pairs.
The company hasn't done much marketing around its offering yet, focusing instead on improving the quality and integrating it into other products, such as Live Search. "We're working on integrating it into Office," Rau said.