Artificial intelligence now allows us to recover lost languages
Posted on April 1, 2021
An artificial intelligence system developed by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) aims to decipher lost languages and learn more about the people who spoke them.
The algorithms they have designed can automatically decipher a lost language using only a few thousand words (far less data than is commonly used to train algorithms).
The surviving records of languages lost throughout history are so sparse (and so often lack conventional separators such as spaces and punctuation) that earlier technologies cannot decipher them. Conventional machine translators, like Google's, only work well between two languages that are still in use: their approach is useless if we want to infer information from languages that have already disappeared.
And when a language is lost, the entire body of knowledge about the people who spoke it is also lost!
Aware of this situation, Professor Regina Barzilay and her team of MIT researchers have spent years working on an artificial intelligence system capable of deciphering these languages without prior knowledge of how they relate to other languages.
This system can also determine the relationships between languages on its own. The team applied their algorithm to Iberian, comparing it with Basque and with other candidates from the Romance, Germanic, Turkic and Uralic families. They found that although Basque and Latin were closer to Iberian than the other languages, they were still too different to be considered related. This conclusion corroborates recent studies suggesting that Iberian is not related to Basque: the AI system reached the same conclusion as human scholars.
How exactly does it work?
The researchers built basic notions of historical linguistics into the algorithms, such as the fact that languages evolve in predictable ways. When pronunciation shifts, for instance, certain sound substitutions are more likely than others: a word with a p in the parent language can evolve to have a b in the descendant language, but a change to a k is less likely because the two sounds are so different. With this and other basic constraints as a starting point, they developed a system capable of analysing such transformations from little training data, which was key because very little data survives for the languages to be recovered.
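To make the idea of "likely" and "unlikely" sound changes concrete, here is a minimal sketch in Python. It is not the MIT team's model: it is a plain weighted edit distance, and the feature table, the cost values and the toy strings are all assumptions chosen purely for illustration. A substitution between sounds that differ only in voicing (p to b) is made cheap, while one that changes the place of articulation (p to k) is made expensive.

```python
# Illustrative sketch only: NOT the algorithm from the MIT study.
# The feature table and the cost values below are hypothetical.

# (place of articulation, voiced?) for a handful of stop consonants.
FEATURES = {
    "p": ("labial", False),
    "b": ("labial", True),
    "t": ("alveolar", False),
    "d": ("alveolar", True),
    "k": ("velar", False),
    "g": ("velar", True),
}

VOICING_COST = 0.3  # assumed: a voicing change alone is a "likely" shift
PLACE_COST = 0.7    # assumed: a change of place is a less likely shift


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing sound a with sound b (0 means identical)."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0  # sound not in the table: treat as fully different
    cost = 0.0
    if fa[0] != fb[0]:
        cost += PLACE_COST
    if fa[1] != fb[1]:
        cost += VOICING_COST
    return cost


def weighted_edit_distance(source: str, target: str, indel: float = 1.0) -> float:
    """Edit distance where substitutions are weighted by sound similarity."""
    rows, cols = len(source) + 1, len(target) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * indel  # delete every symbol of source
    for j in range(1, cols):
        dp[0][j] = j * indel  # insert every symbol of target
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = min(
                dp[i - 1][j] + indel,  # deletion
                dp[i][j - 1] + indel,  # insertion
                dp[i - 1][j - 1] + substitution_cost(source[i - 1], target[j - 1]),
            )
    return dp[-1][-1]


if __name__ == "__main__":
    # Toy strings, not real words from any language.
    print(weighted_edit_distance("apa", "aba"))  # p -> b: cheap
    print(weighted_edit_distance("apa", "aka"))  # p -> k: more expensive
```

With these assumed costs, the toy pair that differs by a p/b swap scores as closer than the pair that differs by a p/k swap, which is the kind of preference the article describes. A real system would learn such preferences from data rather than rely on a hand-written table.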
This project builds on a study published last year in which the same researchers managed to decipher Ugaritic and Linear B. The latter came into use around 1400 BC, and it took humans decades to decipher it. However, a key difference is that in that project the team already knew that these languages were related to early forms of Hebrew and Greek, respectively. With the new system, the algorithm also infers the relationship between the languages by itself.
Source: El País