Speech to speech translation with translatotron: A state of the art review

dc.contributor.author: Kala R. Jules, Adetiba Emmanuel, Abayomi Abdultaofeek, Oluwatobi E. Dare, Ifijeh H. Ayodele
dc.date.accessioned: 2025-10-31T20:21:04Z
dc.date.issued: 2025-10-20
dc.description: This article provides a comprehensive review of Translatotron models. It explores the architecture, innovations, and performance of Translatotron models compared to traditional cascaded systems. It also compares Translatotron models to other S2ST models and presents them as potential candidates for African language translation.
dc.description.abstract: Speech-to-speech translation using cascade-based methods has long been considered the benchmark. Still, it is plagued by issues such as translation latency and compounding errors. These issues arise because cascade-based methods chain several components: speech recognition (speech-to-text transcription), text-to-text translation, and finally text-to-speech synthesis. Google proposed Translatotron, a sequence-to-sequence direct speech-to-speech translation model designed to address the compounding errors associated with cascade-based models. Today, there are three versions of the Translatotron model: Translatotron 1, Translatotron 2, and Translatotron 3. Translatotron 1 is a proof of concept demonstrating direct speech-to-speech translation. This first approach was found to be less effective than the cascade model, but it produced promising results. Translatotron 2 is an improved version of Translatotron 1 with results similar to those of the cascade-based model. Translatotron 3, the latest version, significantly improves translation quality and outperforms the cascade model in some respects. This paper presents a complete review of speech-to-speech translation using Translatotron models. We also show that Translatotron is the best model to bridge the language gap between African languages and other well-formalized languages.
dc.description.sponsorship: The authors are grateful to Google for funding this work through the Google Academic Research Award to EA. Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, is also acknowledged for the FEDGEN HPC infrastructure, provided through a World Bank ACE Impact grant administered by the Nigerian National University Commission, and for supporting the funding of this publication.
dc.identifier.issn: 2590-1230
dc.identifier.uri: https://dspace.summituniversity.edu.ng/handle/123456789/175
dc.language.iso: en
dc.publisher: Elsevier B.V.
dc.relation.ispartofseries: 28 (2025); 107780
dc.subject: Translatotron
dc.subject: Speech-to-speech
dc.subject: BLEU
dc.subject: Cascade
dc.title: Speech to speech translation with translatotron: A state of the art review
dc.type: Article
