What to do about languages that machines can’t translate
A lot of us consider machine translation an easy, if imperfect, go-to. But a fascinating BBC article explores what will probably come as a surprise to most of you reading this (I know it was to me): Of the 7,000 languages currently spoken in the world, only about 100 can be machine-translated.
The issue comes down to resources. Machine translators like Google Translate work by learning patterns from a wide range of human-generated translations and interactions. This makes translating easy (albeit sometimes of dubious accuracy) for commonly used languages like English, French, and Mandarin. But “low-resource” languages don’t have a large enough body of translated documents for these systems to learn from.
In an increasingly global world facing international challenges like climate change and an ongoing pandemic, this impediment to communication can be a major problem, as US intelligence agent Carl Rubino points out.
Fortunately, the Intelligence Advanced Research Projects Activity (IARPA), the intelligence branch that Rubino works for, may have found a solution. The group is funding the development of a massive search engine that would allow users to search for a term and find any related documents, translated from the language in question.
Competing research teams are looking into different ways to make this happen. One team, led by Kathleen McKeown of Columbia University, is trying to use neural network technology to make the machine translation process more streamlined. McKeown believes that current AI is given far more data than necessary to create its algorithms; she points out that the average human being learning a foreign language would never use or see the amount of text currently fed to machines.
Even with all of that text, machine translation still isn’t flawless. But if the goal is to have at least simple, hopefully accurate translations of more languages, this idea of training translation systems faster and on less data is an interesting one to watch.
Another interesting development is the increase in usable sources of speech and text for low-resource languages. This is thanks to a growing number of people posting articles, videos, text messages, and more online in their native languages. Scott Miller of the University of Southern California remarks that most of this content isn’t translated into any other language, but it can still be used to teach machines.
This works by “pre-training” AI to become familiar with basic human language structures. That knowledge can then be applied to an individual language. With only a few thousand words of text in that language, machines can use what they’ve learned to translate it into others.
It’s amazing how smart AI can be, but as with intelligent people, sometimes that gets in the way. Take the phenomenon of “hallucinations”, in which AI inserts incorrect days, dates, or numbers because they fit the patterns it has learned. Scientists have learned to fix this problem by adapting the way machines gather information.
For fans of antiquity, these techniques could also be applied to ancient languages, many of which are low-resource as well.
All of these developments are exciting and promising. It will be interesting to see how they play out in the future.
The general prediction, which we’ve talked about often on this blog, is that machine translation will continue to be a helpful resource and tool, but that the way to ensure accuracy, especially in areas like pharma and healthcare where it’s crucial, is to have these translations reviewed or monitored by human beings. Still, any development in machine translation can contribute to making it easier to understand each other – wherever (or whenever) we’re from.
Contact Our Writer – Alysa Salzberg