Text-to-Speech Synthesis: A Complete System for the Slovenian Language
Abstract
A text-to-speech system capable of synthesising continuous Slovenian speech from arbitrary input text is described. The system is based on the concatenation of basic speech units, diphones, using the TD-PSOLA technique, and requires no special hardware. The input text is transformed into its spoken equivalent by a series of modules, each of which is described in detail. Special attention is paid to segmental duration determination, where the effect of speaking rate on phone duration is studied extensively. Finally, the results of the output speech quality assessment are given in terms of acceptability and intelligibility.
Keywords
text-to-speech synthesis, diphone concatenation, prosody modelling, grapheme-to-phoneme conversion, Slovenian language
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.