fairMultilingualSubtitle

Title: A Look at GUI (Graphical User Interface, i.e. type and click, no coding) Driven Multilingual Subtitle Translation and Editing with Open Source Software.

Author: Shradha Mukherjee

Location: Planet Earth

Nicknames: Shredder, a supervillain for villains and Goldy, a superhero for heroes.

Copyright © 2023 onwards fairwissenschaft and Shradha Mukherjee. All rights reserved.

Article:

This project's article and the accompanying video’s audio and subtitles will be available in seven languages: English, French, German, Spanish, Hindi, Arabic and Chinese (Simplified). For the audio and translation I use open source resources that allow commercial usage, accessed through the web or installed version of SubtitleEdit (https://www.nikse.dk), as follows:
(1) My original voice recording is converted to subtitle text using VOSK speech to text accessed via SubtitleEdit.
(2) Translation from English to French, German, Hindi, Arabic and Chinese (Simplified) is done automatically using the open source LibreTranslate API, the Google Translate API and the Baidu Translate API. The Google and Baidu APIs are not open source, but according to Google’s and Baidu’s online documentation they place no restriction on commercial use of the translated material.
(3) After translation, the subtitles contain automation errors where software and company names are translated. For example, Apple is translated to Apfel in German, because it is not only the name of a company but also the name of a fruit. To fix this, bilingual subtitles containing the original English and the German translation are created in the same subtitle track, so that each translated line can be checked against its source. This combining of the two into a joint bilingual English and German subtitle track is done using SubtitleEdit.
(4) The same method used to translate English to German is also used to translate English to French, Hindi, Arabic and Chinese.
(5) Piper and VOSK are downloaded and used locally, so they work without internet, but the translation APIs require internet access. Therefore, whether SubtitleEdit is used through its web interface or installed on a computer, the workflow requires online communication, so it is not completely private.
(6) SubtitleEdit can be installed on Windows using the executable (.exe) file and on Linux (Canonical Ubuntu) using the Mono or Wine method.
(7) The AI voiceover is generated using Piper text to speech, accessed via SubtitleEdit. Please do not use Microsoft voices because, as per their documentation, users are not allowed to use their voices commercially without buying a commercial license from them. Presently I am using Speechelo, https://speechelo.com, which in its Pro version allows users to use the generated TTS voiceovers commercially. Speechelo supports a pause tag but has no duration tag, and it cannot convert a subtitle file directly to audio. However, its voices are good and cost effective, so to calculate pauses from the subtitle (srt) file I first used SubtitleEdit to export the srt file in a custom format, which I then imported into a spreadsheet (excel) to calculate pause and duration by subtracting srt timestamps. Thereafter, I loaded the text with pauses into Speechelo to get an output that is off from the srt timing by only 2 sec. This 2 sec error occurs in one spot at the end (the spot where I had not inserted a pause because there was nothing to subtract from) and was corrected in the waveform view of the open source software Blender. Speechelo does not have an option to provide line by line duration (screenshots below). Duration is useful for fine tuning and for SSML aware software like Amazon Polly, which can be used in a pay-as-you-go manner and whose result can be used commercially.
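The timestamp subtraction described in step (7) can also be sketched in Python. This is only an illustrative sketch, not the spreadsheet workflow used for this project: the function names `parse_srt` and `durations_and_pauses` and the sample subtitle text are hypothetical, and it assumes a standard srt file where each cue has a timing line of the form `hh:mm:ss,mmm --> hh:mm:ss,mmm`.

```python
import re

# Matches an srt timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(text):
    """Return a list of (start_ms, end_ms, text) cues from srt content."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                cue_text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((to_ms(*fields[:4]), to_ms(*fields[4:]), cue_text))
                break
    return cues


def durations_and_pauses(cues):
    """Return (text, duration_ms, pause_after_ms) for each cue.

    Duration is end minus start. Pause is the next cue's start minus this
    cue's end. The last cue has nothing to subtract from, so its pause is None.
    """
    rows = []
    for i, (start, end, text) in enumerate(cues):
        if i + 1 < len(cues):
            pause = max(0, cues[i + 1][0] - end)  # clamp overlapping cues to 0
        else:
            pause = None
        rows.append((text, end - start, pause))
    return rows


# Hypothetical two-cue sample, not taken from the project's subtitle files.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hallo Welt

2
00:00:03,620 --> 00:00:06,000
Dies ist ein Test
"""

for row in durations_and_pauses(parse_srt(SAMPLE)):
    print(row)
```

In this sample the first cue lasts 2500 ms and is followed by a 120 ms pause. The last cue has no pause value, which mirrors the one spot at the end where no pause could be calculated.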
Presently, I have demonstrated the GUI method for speech to text, text to speech, translation and audio editing. I will make a tutorial for this under Computer Basic, while for doing it with coding in Python I will make a tutorial under Computer Advanced. GUI methods are the foundation of fairwissenschaft, because a GUI demonstrates the code in action visually and builds a foundation for understanding the less visual coding methods behind the GUI. Doing coding before GUI is like putting the ‘cart before the horse’; it does not work. In summary, the journey of fairwissenschaft is from GUI to coding because, as the saying goes, “if you can’t explain it, you don’t know it”, and the GUI is the explanation behind the code. This is how I have learnt and applied it in my knowledge work, and with fairwissenschaft I want to teach by the same method. For example, the open source software Blender has not only a GUI but also a scripting interface where you can write Python code, and it is cross-platform, making it interoperable, so I chose it for video editing.
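As a small preview of the coding route, the bilingual join from step (3) can be sketched in Python. This is a minimal sketch and not how SubtitleEdit does it internally: the function names `split_blocks` and `merge_bilingual` and the sample sentences are hypothetical, and it assumes both srt files have the same number of cues with identical timing lines, as is the case when one file is a translation of the other.

```python
import re


def split_blocks(srt_text):
    """Split srt content into (timing_line, text) pairs, dropping the counters."""
    pairs = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # The timing line contains "-->"; everything after it is subtitle text.
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue  # skip blocks without a timing line
        pairs.append((lines[idx], "\n".join(lines[idx + 1:])))
    return pairs


def merge_bilingual(first_srt, second_srt):
    """Put the text of both files under one shared timing line per cue."""
    first, second = split_blocks(first_srt), split_blocks(second_srt)
    if len(first) != len(second):
        raise ValueError("the two files have different numbers of cues")
    merged = []
    for number, ((timing1, text1), (timing2, text2)) in enumerate(zip(first, second), start=1):
        if timing1 != timing2:
            raise ValueError(f"timing mismatch at cue {number}")
        merged.append(f"{number}\n{timing1}\n{text1}\n{text2}")
    return "\n\n".join(merged) + "\n"


# Hypothetical one-cue sample showing the Apple/Apfel problem side by side.
ENGLISH = "1\n00:00:01,000 --> 00:00:03,500\nOpen the Apple website\n"
GERMAN = "1\n00:00:01,000 --> 00:00:03,500\nÖffnen Sie die Apfel-Website\n"

print(merge_bilingual(ENGLISH, GERMAN))
```

With the English line directly above the German line in every cue, a mistranslated name such as Apfel is easy to spot and correct by hand.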

SubtitleEdit Software: Conversion of my original English audio or voice .mp3 file to English text subtitle (srt) file.
SubtitleEdit Software: Conversion of text subtitle (srt) file to audio using the Piper libritts voice, generating an .mp3 file.
SubtitleEdit Software: English text subtitle (srt) file to German text subtitle (srt) file.
SubtitleEdit Software: Join to create bilingual English and German text subtitle (srt) file.
SubtitleEdit Software: Export German text subtitle (srt) file in a custom format so that pause time and duration time can be calculated in an Office spreadsheet (excel) file.
Office Software, here WPS Office: Import the custom German text subtitle (srt) file into a spreadsheet (excel) and format time as hh:mm:ss.000, then subtract to calculate pause time and, optionally, duration. Duration is useful for SSML aware software like Amazon Polly.
Speechelo Pro: The text and pause duration from the spreadsheet (excel) file are copied into the Speechelo online web app to generate the audio or voiceover as mp3. Speechelo does not have an option to provide line by line duration. This is shown for German; similarly, subtitles from my own voice can be used here to generate audio so that I don’t need to use my original voice. The Speechelo pause tag is [sPause sec=00:00:00.120 ePause], where 0.120 sec is the pause time.
Blender Software: Here a comparison is shown of two separate audio files read by different voices: one from Piper Thorsten, where Thorsten read the srt file made from my voice, and another, as discussed above, from Speechelo Hannah, where Hannah read the srt file made from my voice with pauses only, without duration. There is only a pause mismatch of 2 sec at the end, which is easily corrected in editing with the open source software Blender.
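The step of placing the calculated pauses between subtitle lines can be sketched in Python as well. This is an illustrative sketch under stated assumptions: the function names and the sample rows are hypothetical, the Speechelo tag is written exactly in the form quoted in the caption above, and the SSML variant uses the standard `<break time="...ms"/>` element, which SSML aware tools such as Amazon Polly accept.

```python
from xml.sax.saxutils import escape


def format_pause(ms):
    """Format a pause in milliseconds as hh:mm:ss.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def speechelo_script(rows):
    """Join (text, pause_ms) rows using the pause tag form quoted above."""
    parts = []
    for text, pause_ms in rows:
        parts.append(text)
        if pause_ms:  # None or 0 means no pause tag after this line
            parts.append(f"[sPause sec={format_pause(pause_ms)} ePause]")
    return " ".join(parts)


def ssml_script(rows):
    """Join the same rows as SSML, with a break element for each pause."""
    parts = []
    for text, pause_ms in rows:
        parts.append(escape(text))  # escape &, < and > so the XML stays valid
        if pause_ms:
            parts.append(f'<break time="{pause_ms}ms"/>')
    return "<speak>" + " ".join(parts) + "</speak>"


# Hypothetical rows: subtitle text and the pause after it in milliseconds.
ROWS = [("Hallo Welt", 120), ("Dies ist ein Test", None)]

print(speechelo_script(ROWS))
print(ssml_script(ROWS))
```

The last row has no pause, matching the one spot at the end where there was nothing to subtract from.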