Thursday, August 12, 2010

How Google Translate Works

Google uploaded a video that explains how Google's machine translation service works. It's fascinating to see how much Google Translate has improved in the past four years and how many other Google services it relies on.



Here's the full text of the video:

"Google Translate is a free tool that enables you to translate sentences, documents and even whole websites instantly. But how exactly does it work? While it may seem like we have a room full of bilingual elves working for us, in fact all of our translations come from computers. These computers use a process called 'statistical machine translation', which is just a fancy way of saying that our computers generate translations based on patterns found in large amounts of text.

But let's take a step back. If you want to teach someone a new language, you might start by teaching them vocabulary words and the grammatical rules that explain how to construct sentences. A computer can learn a foreign language the same way, by referring to vocabulary and a set of rules. But languages are complicated and, as any language learner can tell you, there are exceptions to almost any rule. When you try to capture all of these exceptions, and exceptions to the exceptions, in a computer program, the translation quality begins to break down. Google Translate takes a different approach.

Instead of trying to teach our computers all the rules of a language, we let our computers discover the rules for themselves. They do this by analyzing millions and millions of documents that have already been translated by human translators. These translated texts come from books, organizations like the UN and websites from all around the world. Our computers scan these texts looking for statistically significant patterns, that is, patterns between the translation and the original text that are unlikely to occur by chance. Once the computer finds a pattern, it can use this pattern to translate similar texts in the future. When you repeat this process billions of times, you end up with billions of patterns and one very smart computer program. For some languages, however, we have fewer translated documents available and therefore fewer patterns that our software has detected. This is why our translation quality will vary by language and language pair. We know our translations aren't always perfect, but by constantly providing new translated texts we can make our computers smarter and our translations better. So next time you translate a sentence or webpage with Google Translate, think about the millions of documents and billions of patterns that ultimately led to your translation, all of it happening in the blink of an eye."
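To make "discovering patterns in already-translated text" a little more concrete, here is a minimal sketch of IBM Model 1, a classic textbook word-alignment model from statistical machine translation. It is not Google's actual system (the video doesn't describe the algorithm), and it leaves out the NULL word, phrases and everything else a real system needs. The three-sentence English/German corpus and the function name `train_ibm_model1` are hypothetical, chosen only for illustration. Starting from uniform guesses, expectation-maximization repeatedly counts which word pairings explain the sentence pairs best, so "das" gradually becomes the likely translation of "the" because they keep showing up together.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(f, e)] = P(target word f | source word e) with EM (IBM Model 1, no NULL word)."""
    # All target-language words, used for the uniform starting guess.
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: split each target word's "credit" across the source words in its sentence.
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / z
                    count[(f, e)] += share
                    total[e] += share
        # M-step: turn expected counts back into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Hypothetical toy parallel corpus: (English words, German words).
pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]

t = train_ibm_model1(pairs)
german = ["das", "haus", "buch", "ein"]
for e in ["the", "house", "book", "a"]:
    best = max(german, key=lambda f: t[(f, e)])
    print(f"{e} -> {best} ({t[(best, e)]:.2f})")
```

No sentence pair ever says "the means das"; the pairing emerges only because it explains the data better than the alternatives, which is the spirit of the "statistically significant patterns" described above. It also shows why language pairs with little translated text do worse: with fewer sentence pairs, there is less evidence to separate real patterns from coincidences.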
