Meta launches SeamlessM4T AI to translate, transcribe 100+ languages

In an effort to make AI better at understanding different languages, the tech giant Meta has developed a new AI model called SeamlessM4T. According to Meta, the new AI can translate and transcribe more than 100 languages in both text and speech forms.

SeamlessM4T is a big step forward when it comes to AI-powered speech-to-speech and speech-to-text features. Meta explains in a blog post that their single model is an instant translation that allows people who speak different languages to communicate effectively without the need for a separate language identity model.

SeamlessM4T uses Meta’s Perceptron AI text-to-text machine translation model and the Universal Speech Translator, which is known for being one of the few straight speech-to-speech translation systems that can translate Hokkien.

This new idea builds on Massively Multilingual Speech, which is Meta’s framework for speech recognition, language identification, and speech synthesis across a wide range of over 1,100 languages.

Meta’s efforts are remarkable, but others are also developing advanced AI translation and transcription systems. Amazon, Microsoft, OpenAI, and startups offer commercial and open-source services.

Google’s Universal Speech Model

Google is also working on the Universal Speech Model, which aims to understand the most commonly spoken languages in the world. Mozilla’s Common Voice project aims to create a diverse collection of voices that can be used to train automatic speech recognition algorithms. SeamlessM4T stands out because it is an attempt to combine translation and transcription into a single model.

Meta says that 4 million hours of speech and “tens of billions” of sentences of freely available text from the internet were used to make it. In a conversation with the reporter, Juan Pino, a research scientist at Meta’s AI research division and one of the project’s partners, did not say where the data came from, only that it came from “a variety” of places.

But Meta says that most of the data it mined came from open sources and wasn’t copied. The company admits that it may contain information that could be used to identify an individual. Meta used the text and speech that were scraped to make the SeamlessAlign training data set for SeamlessM4T.

Researchers matched up 443,000 hours of speech with texts and made 29,000 hours of “speech-to-speech” alignments. This taught SeamlessM4T how to record speech-to-text, translate text, make speech from text, and even translate words spoken in one language into words spoken in another language.

About SeamlessM4T

The current state-of-the-art voice transcription model performed worse than SeamlessM4T against background sounds and “speaker variations” in speech-to-text tasks on an internal benchmark, according to Meta. Meta believes the training data set’s speech and text data gives SeamlessM4T an advantage over speech-only and text-only models.

According to Meta, SeamlessM4T outperforms the current state-of-the-art voice transcription model in background noise and speaker tone in an internal benchmark. SeamlessM4T outperforms models that use only speech or text due to the large amount of speech and text data in the training dataset, according to Meta.

In a blog post, Meta said SeamlessM4T advances universal multitask AI systems and delivers cutting-edge outcomes. It’s worth considering model biases.

The Conversation reported on AI-powered translation flaws, including gender bias. In the Proceedings of the National Academy of Sciences, leading speech recognition algorithms were twice as likely to mistranscribe black speakers as white ones.

The tech firm claims that SeamlessM4T doesn’t translate hazardous text, a common error in translation and generative AI text models. In Bengali and Kyrgyz, the approach produces more culturally and socioeconomically detrimental variations. In general, SeamlessM4T is more hazardous with religious and sexual orientation literature.

The SeamlessM4T public demo filters communication for toxicity, according to Meta. This filter is not set in the open-source model.

Problems with translation by AI

If AI is used too much, it could make languages less rich. AI translation results in “translationese” because it can’t be customized like human translation can. AI can make translations more accurate but less diverse.

Because of this problem, Meta recommends not using SeamlessM4T for long or official translations, which are recognised by government agencies and translation bodies.

Meta says that you shouldn’t use SeamlessM4T for medical or legal purposes to avoid misunderstandings. This is important because AI language mistakes have led to mistakes by law enforcement. A mistake in translating a text message led police to wrongly accuse a Kurdish man of helping terrorists, and another mistake led them to search the wrong car, which ended the investigation.

The accuracy of AI translation can be improved, but limiting variation may come at a cost. Be careful with AI-powered devices, especially when you’re in a sensitive position.

Meta launches SeamlessM4T AI to translate, transcribe 100+ languages

Google’s Universal Speech Model

About SeamlessM4T

Problems with translation by AI

More posts

Seesaw buys Little Thinking Minds to boost Arabic literacy in MENA region

France’s AFD backs Africa’s digital finance inclusion with fresh €3 million

Ethiopia’s new AI system to enhance transparency and efficiency in freight transport

VFD Microfinance Bank rebounds from 2023 loss with N366.6m profit in 2024