A new sinusoidal-model-based engine for the FESTIVAL TTS (text-to-speech) system is described. The engine performs the DSP (Digital Signal Processing) operations of a diphone-based concatenative TTS system, i.e. it converts a phonetic input into an audio signal. As input it takes the NLP (Natural Language Processing) data computed by FESTIVAL: a sequence of phonemes with duration and intonation values derived from the text script.
The engine aims to be an alternative to MBROLA and uses the SMS (Spectral Modeling Synthesis) representation, implemented with the CLAM (C++ Library for Audio and Music) framework.