A system that converts text into speech is usually referred to as a TTS (Text-To-Speech) system. Its design varies with its purpose and platform requirements. In this thesis, a TTS synthesizer designed for an embedded system and operating on an arbitrary vocabulary has been evaluated and partially implemented in Matlab, forming a base for further development. The focus is on the speech generation part, which converts phonetic notation into synthetic speech.
The chosen synthesis method is the so-called Time Domain PSOLA (TD-PSOLA), which suits the implementation and platform requirements well. It concatenates segments of recorded speech and changes their prosodic characteristics with the Pitch Synchronous Overlap and Add (PSOLA) technique. Each segment extends from the midpoint of one phone to the midpoint of the next and is referred to as a diphone.
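The two ideas in this paragraph, diphone segmentation and pitch-synchronous overlap-add, can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the thesis's Matlab implementation. The function names (`to_diphones`, `hann`, `td_psola`), the diphone naming scheme, and the `pitch_factor` parameter are all hypothetical. The sketch assumes the pitch marks are already known, treats the whole signal as voiced, and keeps the duration roughly constant while changing the pitch.

```python
import math


def to_diphones(phones):
    """Pair each phone with its successor, e.g. ['h', 'e'] -> ['h-e'].

    The 'a-b' naming is an assumption made for this sketch.
    """
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


def hann(n):
    """Symmetric Hann window of length n."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola(signal, marks, pitch_factor):
    """Simplified TD-PSOLA pitch modification (illustrative only).

    signal:       list of float samples, assumed voiced throughout
    marks:        increasing sample indices of pitch marks (at least two),
                  assumed to be given by an earlier pitch detection step
    pitch_factor: > 1 raises the pitch, < 1 lowers it
    """
    out = [0.0] * len(signal)
    t = float(marks[0])  # current synthesis pitch mark position
    while t <= marks[-1]:
        # Pick the analysis mark closest to the synthesis position.
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t))
        # Local pitch period, taken from the neighbouring mark.
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        # Cut a two-period, Hann-windowed segment centred on the analysis
        # mark and overlap-add it centred on the synthesis mark.
        window = hann(2 * period + 1)
        center = int(round(t))
        for j, w in enumerate(window):
            src = marks[k] - period + j
            dst = center - period + j
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += w * signal[src]
        # Synthesis marks are spaced by the scaled local period.
        t += period / pitch_factor
    return out
```

With `pitch_factor` set to 1.0 and evenly spaced marks, the overlapping Hann windows sum to one, so the samples between the first and last mark are reproduced unchanged. A factor of 2.0 places the same segments at half the spacing, which halves the pitch period. A full synthesizer would additionally handle unvoiced segments, duration changes, and smoothing at diphone boundaries, none of which is covered here.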
The quality of the synthesized speech is fairly satisfactory for the test sentences applied. Some disturbances still occur as a consequence of mismatches, such as differing spectral properties of the segments and pitch detection errors, but these can be reduced with further development.
Author: Hammarstedt, Linnea
Source: Lulea University of Technology