Text-to-Speech Synthesis: A Complete System for the Slovenian Language

Jerneja Gros; Nikola Pavešić; France Mihelič

Abstract

A text-to-speech system capable of synthesising continuous Slovenian speech from arbitrary input text is described. The system is based on the concatenation of basic speech units, diphones, using the TD-PSOLA technique, and requires no special hardware. The input text is transformed into its spoken equivalent by a series of modules, each of which is described in detail. Special attention is paid to determining segmental duration, in particular to the effect of speaking rate on phone duration, which is studied extensively. Finally, the results of an output speech quality assessment are given in terms of acceptability and intelligibility.

Keywords

text-to-speech synthesis, diphone concatenation, prosody modelling, grapheme-to-phoneme conversion, Slovenian language

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Journal of Computing and Information Technology