Google Translate, the go-to reference when it comes to translating words online, relies on artificial intelligence to suggest translations and lets users suggest improved translations for words and sentences. According to Wikipedia, Google Translate helps over 500 million people translate more than 100 billion words – every day. In this post, I explain what’s behind Google Translate and highlight some gender biases I found while looking for translations of professions from English to Spanish.
Why is translation so important?
Translation helps people from different cultures understand one another, expands our access to knowledge, and helps us form bonds and collaborate. In the Bible, God creates languages to confuse the humans who were working together to build a tower so high that it would reach the sky. In science fiction, the Universal Translator of Star Trek and the Babel Fish of “The Hitchhiker’s Guide to the Galaxy” make contact with the inhabitants of distant planets possible.
How does Google Translate work?
Google Translate started in 2006 using a method called Statistical Machine Translation, a technique that relies on large corpora of translated text and applies statistics to predict which translation is best for the input phrase. As a fun fact, the first corpora used to extract translations were official translations from the United Nations and the European Parliament!
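To make the idea concrete, here is a minimal sketch in Python of the statistical principle described above: count how often each source phrase is paired with each translation in a parallel corpus, then pick the most frequent one. The phrase pairs and the function names (build_phrase_table, best_translation) are invented for illustration; this is a toy under those assumptions, not Google's actual system, which was far more elaborate.

```python
from collections import Counter, defaultdict

# Toy, hand-written phrase pairs standing in for an aligned parallel corpus.
# Hypothetical data for illustration only, not real training data.
PARALLEL_PAIRS = [
    ("the house", "la casa"),
    ("the house", "la casa"),
    ("the house", "el hogar"),
    ("the doctor", "el doctor"),
    ("the doctor", "el doctor"),
    ("the doctor", "la doctora"),
]


def build_phrase_table(pairs):
    """Count how often each source phrase is paired with each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def best_translation(table, phrase):
    """Return the most frequent translation and its relative frequency."""
    candidates = table.get(phrase)
    if not candidates:
        return None, 0.0
    target, count = candidates.most_common(1)[0]
    return target, count / sum(candidates.values())


table = build_phrase_table(PARALLEL_PAIRS)
print(best_translation(table, "the doctor"))
```

Note that in this toy corpus "el doctor" wins simply because it appears more often than "la doctora". A skew in the source texts becomes the default answer, which is worth keeping in mind for the rest of this post.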
Since November 2016, translations have been produced by the Google Neural Machine Translation system. It translates complete sentences (rather than looking for individual word translations) and is more context-aware.
Google Translate also draws on online text to recommend ways of expressing concepts that appear on the Internet, and uses the validations of translations collected via the Google Translate Community Initiative.
There is also a smaller Google Crowdsource community that contributes translations through an Android app.
As with many systems that rely on artificial intelligence to automate tasks, Google Translate works really well but is not perfect. For example, because it relied on EU texts in its early days, translation from EU languages into English works best, while languages with little written material, like some of the African languages supported in Google Translate, produce the poorest results.
Gender in language
Languages differ in how gender affects their words.
In English, most nouns do not change with gender. In Spanish, however, the ending of most nouns changes depending on whether we are talking about a man or a woman. In practice, this means that Spanish has two words for a word like doctor: doctor for the masculine form and doctora for the feminine form.
Google Translate gets this right a lot of the time, but it shows a gender bias for some professions that are predominantly held by people of one gender. Below are some examples and an analysis of each.
I found a beautiful example in a longer post at The Atlantic about machine-powered translation that is worth sharing because it illustrates the problem:
“I began my explorations very humbly, using the following short remark, which, in a human mind, evokes a clear scenario:
“In their house, everything comes in pairs. There’s his car and her car, his towels and her towels, and his library and hers.”
The translation challenge seems straightforward, but in French (and other Romance languages), the words for “his” and “her” don’t agree in gender with the possessor, but with the item possessed. So here’s what Google Translate gave me:
Dans leur maison, tout vient en paires. Il y a sa voiture et sa voiture, ses serviettes et ses serviettes, sa bibliothèque et les siennes.
The program fell into my trap, not realizing, as any human reader would, that I was describing a couple, stressing that for each item he had, she had a similar one. For example, the deep-learning engine used the word “sa” for both “his car” and “her car,” so you can’t tell anything about either car-owner’s gender. Likewise, it used the genderless plural “ses” both for “his towels” and “her towels,” and in the last case of the two libraries, his and hers, it got thrown by the final “s” in “hers” and somehow decided that that “s” represented a plural (“les siennes”). Google Translate’s French sentence missed the whole point.”
This is a great example of how limited Google Translate’s understanding is, and of the difficulty that gender poses in getting translations right.
Two genders suggested
For professions like hairdresser, doctor or designer, Google Translate correctly suggests two possible translations, depending on the gender of the subject.
Some professions are translated as feminine by default, including nurse, kindergarten teacher and ballet dancer.
In other cases the only suggested translation is the masculine form: school teacher, mechanic and software developer.
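The three categories above can be checked mechanically. The following is a rough sketch in Python, assuming you have already collected the Spanish suggestions for each profession by hand. The sample lists are typed in for illustration and mirror the categories described above; they are not captured Google Translate output. The helper names (guess_gender, classify_suggestions) are hypothetical, and the ending-based rule is a crude assumption with many exceptions in real Spanish (for example, dentista or problema).

```python
def guess_gender(word):
    """Guess the grammatical gender of a Spanish noun from its ending.

    Crude heuristic (an assumption for this sketch): looks only at the
    first word, treats '-a' as feminine and '-o' / '-or' as masculine.
    """
    head = word.split()[0].lower()
    if head.endswith("a"):
        return "feminine"
    if head.endswith(("o", "or")):
        return "masculine"
    return "unknown"


def classify_suggestions(suggestions):
    """Say whether a list of suggested translations covers both genders."""
    genders = {guess_gender(s) for s in suggestions}
    if {"masculine", "feminine"} <= genders:
        return "both genders suggested"
    if "feminine" in genders:
        return "feminine only"
    if "masculine" in genders:
        return "masculine only"
    return "unknown"


# Hand-typed illustrative inputs, one per category described in the post.
SAMPLE = {
    "doctor": ["doctor", "doctora"],
    "nurse": ["enfermera"],
    "mechanic": ["mecánico"],
}

for profession, suggestions in SAMPLE.items():
    print(profession, "->", classify_suggestions(suggestions))
```

A check like this will not catch every case, but it is enough to flag a translated job title that comes back in only one gender before it is published.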
The impact of bias
Imagine the following situation: A company based in the USA is looking to outsource some of its software development. It is important for them to be able to work with people in the same time zone, so a handful of countries in Latin America are the most obvious choices. Nobody in the HR department speaks Spanish, so the team uses Google Translate to translate the job description into Spanish.
Three weeks later, the CTO walks into HR and asks why no CVs from female candidates have been received. On reviewing the job offer (he spent a year travelling in Central America, so he speaks Spanish fluently), he realises that the translated offer uses only the masculine form of the job title and so explicitly leaves women out of the call for candidates.
This topic is close to my heart, and I hope you found it interesting and want to continue learning. I can recommend this long read from The Economist, which covers translation and machine understanding. You can also check out my post about Machine Learning for Media Professionals.