
Facebook’s new polyglot AI can translate between 100 languages

October 19, 2020 at 3:00 pm

The news: Facebook is open-sourcing a new AI language model called M2M-100 that can translate between any pair among 100 languages. Of the 4,950 possible language pairs, it translates 1,100 directly. This is in contrast to previous multilingual models, which rely heavily on English as an intermediate. A Chinese-to-French translation, for example, typically passes from Chinese to English and then from English to French, which increases the chance of introducing errors.
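The pair count is simple combinatorics, and a quick check makes the scale concrete. The sketch below assumes each pair is counted once regardless of direction (Chinese and French is one pair); the 1,100 figure is the one quoted above, and the variable names are illustrative only.

```python
from math import comb

NUM_LANGUAGES = 100
DIRECT_PAIRS = 1_100  # figure quoted in the article

# Unordered pairs: Chinese<->French is counted once.
total_pairs = comb(NUM_LANGUAGES, 2)

# Ordered directions: Chinese->French and French->Chinese are counted separately.
total_directions = total_pairs * 2

# Share of all pairs that the model handles without a pivot language.
direct_share = DIRECT_PAIRS / total_pairs

print(total_pairs)            # 4950
print(total_directions)       # 9900
print(f"{direct_share:.1%}")  # 22.2%
```

By this count, roughly one pair in five is translated directly.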

Data curation: The model was trained on 7.5 billion sentence pairs. To compile a data set that large, the researchers relied heavily on automated curation. They used web crawlers to scrape billions of sentences from the web and had another language model called FastText identify the language. (They didn’t use any Facebook data.) Then they used a program called LASER 2.0, developed previously by Facebook’s AI research lab, which uses unsupervised learning (machine learning that doesn’t require manually labeled data) to match sentences across languages by their meaning.

LASER 2.0 creates what are known as “embeddings” from large, unstructured data sets of sentences. It trains on the available sentence examples within each language and maps out their relationships to one another based on how often and how close together they’re used. These embeddings help the machine-learning model approximate the meaning of each sentence, which then allows LASER 2.0 to automatically pair up sentences that share the same meaning in different languages.
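The idea of pairing sentences by embedding similarity can be illustrated with a toy example. This is a minimal sketch, not LASER 2.0's actual algorithm: the three-dimensional vectors are hand-written stand-ins for learned embeddings, and the `mine_pairs` function and its 0.95 threshold are hypothetical choices made for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Toy, hand-written "embeddings" (assumption: a real system learns these
# and uses far more dimensions).
english = {
    "The cat sleeps.": [0.90, 0.10, 0.00],
    "I drink milk.": [0.10, 0.90, 0.20],
}
french = {
    "Le chat dort.": [0.88, 0.15, 0.05],
    "Je bois du lait.": [0.12, 0.85, 0.25],
    "Il pleut.": [0.00, 0.20, 0.95],
}

def mine_pairs(source, target, threshold=0.95):
    """For each source sentence, keep its nearest target sentence
    if the similarity clears the threshold."""
    pairs = []
    for src_text, src_vec in source.items():
        best_text, best_score = max(
            ((tgt_text, cosine(src_vec, tgt_vec)) for tgt_text, tgt_vec in target.items()),
            key=lambda item: item[1],
        )
        if best_score >= threshold:
            pairs.append((src_text, best_text))
    return pairs

mined = mine_pairs(english, french)
for src, tgt in mined:
    print(f"{src}  <->  {tgt}")
```

Here "Il pleut." finds no partner because no English sentence sits close to it. A brute-force comparison like this would not scale to billions of sentences, so a production system would need a much faster search.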

Pairing languages: The researchers focused on the language combinations that they believed would be most commonly requested. They grouped languages according to linguistic, geographic, and cultural similarities, on the assumption that people who live in the same region would communicate more often. One language group, for example, included the most common languages spoken in India, including Bengali, Hindi, Tamil, and Urdu. LASER 2.0 then targeted its search for sentence pairs on all the possible language pairs within each group.
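Restricting the search to pairs within a group keeps the number of combinations manageable. The sketch below enumerates the pairs for the one group named above, assuming only the four languages the article lists (the real group was likely larger); the group key `"india"` and the function name are hypothetical.

```python
from itertools import combinations

# Only the languages named in the article; the actual grouping is an assumption.
language_groups = {
    "india": ["Bengali", "Hindi", "Tamil", "Urdu"],
}

def pairs_within_groups(groups):
    """Return every unordered language pair inside each group."""
    return {name: list(combinations(langs, 2)) for name, langs in groups.items()}

targets = pairs_within_groups(language_groups)
for a, b in targets["india"]:
    print(f"{a} <-> {b}")
```

Four languages yield six pairs to mine, compared with the thousands of pairs that exist across all 100 languages.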

Ongoing challenges: Languages spoken in places like Africa and Southeast Asia still suffer from translation quality issues because too little language data is available to be scraped from the web, says Angela Fan, the lead researcher on the project. Given the reliance on web data, the researchers also need to figure out techniques for identifying and eradicating any embedded sexism, racism, and other discriminatory biases. Right now, the researchers have used a profanity filter to clean up some particularly egregious language, but it is mostly limited to English.

Research only: Facebook has no current plans to use the model in its products. M2M-100 is meant for research purposes only, says Fan. Ultimately, however, the goal is for the model to improve on and expand Facebook’s existing translation capabilities. Applications could include user communication (for example, the feature that allows people to translate posts into their native language) and perhaps content moderation.
