Speech to speech translation with translatotron: A state of the art review
| dc.contributor.author | Kala R. Jules, Adetiba Emmanuel, Abayomi Abdultaofeek, Oluwatobi E. Dare, Ifijeh H. Ayodele | |
| dc.date.accessioned | 2025-10-31T20:21:04Z | |
| dc.date.issued | 2025-10-20 | |
| dc.description | • This article provides a comprehensive review of Translatotron models. • It explores the architecture, innovations, and performance of Translatotron models compared to traditional cascaded systems. • It compares Translatotron models to other S2ST models and presents them as potential candidates for African language translation. | |
| dc.description.abstract | Speech-to-speech translation using cascade-based methods has long been considered the benchmark. Still, it is plagued by several issues, such as the time needed to translate speech from one language to another and compounding errors. These issues arise because cascade-based methods chain several other methods, such as speech recognition, speech-to-text transcription, text-to-text translation, and finally text-to-speech synthesis. Google proposed Translatotron, a sequence-to-sequence direct speech-to-speech translation model designed to address the compounding errors associated with cascade-based models. Today, there are three versions of the Translatotron model: Translatotron 1, Translatotron 2, and Translatotron 3. Translatotron 1 is a proof of concept that demonstrates direct speech-to-speech translation. This first approach was found to be less effective than the cascade model, but it produced promising results. Translatotron 2 is an improved version of Translatotron 1 with results similar to those of the cascade-based model. Translatotron 3, the latest version of the model, significantly improves translation quality and outperforms the cascade model in some respects. This paper presents a complete review of speech-to-speech translation using Translatotron models. We also show that Translatotron is the best model to bridge the language gap between African languages and other well-formalized languages. | |
| dc.description.sponsorship | The authors are grateful to Google for funding this work through the Google Academic Research Award to EA. Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, is also acknowledged for the FEDGEN HPC infrastructure, provided through a World Bank ACE Impact grant administered by the Nigerian National University Commission, and for supporting the funding of this publication. | |
| dc.identifier.issn | 2590-1230 | |
| dc.identifier.uri | https://dspace.summituniversity.edu.ng/handle/123456789/175 | |
| dc.language.iso | en | |
| dc.publisher | Elsevier B.V. | |
| dc.relation.ispartofseries | 28 (2025) ; 107780 | |
| dc.subject | Translatotron | |
| dc.subject | Speech-to-speech | |
| dc.subject | BLEU | |
| dc.subject | Cascade | |
| dc.title | Speech to speech translation with translatotron: A state of the art review | |
| dc.type | Article |
Files
Original bundle
- Name: Speech to speech translation with translatotron_A state of the art review.pdf
- Size: 1.45 MB
- Format: Adobe Portable Document Format