Text-to-Speech Synthesis: A Complete System for the Slovenian Language

Jerneja Gros, Nikola Pavešić, France Mihelič


A text-to-speech system capable of synthesising continuous Slovenian speech from arbitrary input text is described. The system is based on the concatenation of basic speech units, diphones, using the TD-PSOLA technique, and requires no special hardware. The input text is transformed into its spoken equivalent by a series of modules, each of which is described in detail. Special attention is paid to the determination of segmental duration, where the effect of speaking rate on phone duration is studied extensively. Finally, the results of an assessment of output speech quality are given in terms of acceptability and intelligibility.
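
As an illustration of what diphone concatenation implies for unit selection, the following minimal sketch maps a phone sequence onto the diphone units that would have to be concatenated. It is not taken from the described system: the function name, the silence symbol `_`, and the example phone symbols are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical symbol for the silence surrounding an utterance (an assumption).
SILENCE = "_"


def phones_to_diphones(phones: List[str]) -> List[Tuple[str, str]]:
    """Return the diphone units covering a phone sequence.

    A diphone spans the transition from one phone to the next, so a
    sequence of n phones padded with silence on both sides needs n + 1 units.
    """
    if not phones:
        return []
    # Pad with silence so the first and last phones get onset/offset units.
    padded = [SILENCE] + list(phones) + [SILENCE]
    # Pair every phone with its right-hand neighbour.
    return list(zip(padded[:-1], padded[1:]))


if __name__ == "__main__":
    # Illustrative phone symbols, not a real phonetic transcription scheme.
    for left, right in phones_to_diphones(["d", "a", "n"]):
        print(f"{left}-{right}")
```

In a concatenative synthesiser each such pair would index a recorded unit in the diphone inventory, and a technique such as TD-PSOLA would then adjust the pitch and duration of the joined units.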


text-to-speech synthesis, diphone concatenation, prosody modelling, grapheme-to-phoneme conversion, Slovenian language


This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

