Speech to speech translation with translatotron: A state of the art review

Kala R. Jules, Adetiba Emmanuel, Abayomi Abdultaofeek, Oluwatobi E. Dare, Ifijeh H. Ayodele

dc.contributor.author	Kala R. Jules, Adetiba Emmanuel, Abayomi Abdultaofeek, Oluwatobi E. Dare, Ifijeh H. Ayodele
dc.date.accessioned	2025-10-31T20:21:04Z
dc.date.issued	2025-10-20
dc.description	This article provides a comprehensive review of Translatotron models. It explores the architecture, innovations, and performance of Translatotron models compared to traditional cascaded systems. It also compares Translatotron models to other S2ST models and presents them as potential candidates for African language translation.
dc.description.abstract	Speech-to-speech translation using cascade-based methods has long been considered the benchmark. Still, it is plagued by issues such as translation latency and compounding errors. These issues arise because cascade-based methods chain several components: speech recognition (speech-to-text transcription), text-to-text translation, and finally text-to-speech synthesis. Google proposed Translatotron, a sequence-to-sequence direct speech-to-speech translation model designed to address the compounding errors associated with cascade-based models. Today, there are three versions of the Translatotron model: Translatotron 1, Translatotron 2, and Translatotron 3. Translatotron 1 is a proof of concept demonstrating direct speech-to-speech translation. This first approach was found to be less effective than the cascade model, but it produced promising results. Translatotron 2 is an improved version of Translatotron 1 with results similar to those of the cascade-based model. Translatotron 3, the latest version of the model, significantly improves translation quality and outperforms the cascade model in some respects. This paper presents a complete review of speech-to-speech translation using Translatotron models. We also show that Translatotron is the best model to bridge the language gap between African languages and other well-formalized languages.
dc.description.sponsorship	The authors are grateful to Google for funding this work through the Google Academic Research Award to EA. Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, is also acknowledged for the FEDGEN HPC infrastructure, provided through a World Bank ACE Impact grant administered by the Nigerian National University Commission, and for supporting the funding of this publication.
dc.identifier.issn	2590-1230
dc.identifier.uri	https://dspace.summituniversity.edu.ng/handle/123456789/175
dc.language.iso	en
dc.publisher	Elsevier B.V.
dc.relation.ispartofseries	28 (2025) ; 107780
dc.subject	Translatotron
dc.subject	Speech-to-speech
dc.subject	BLEU
dc.subject	Cascade
dc.title	Speech to speech translation with translatotron: A state of the art review
dc.type	Article

Files

Original bundle

Name: Speech to speech translation with translatotron_A state of the art review.pdf
Size: 1.45 MB
Format: Adobe Portable Document Format

Collections

Computer Science