Concatenative Syllable-Based Text-to-Speech System for the Kipsigis Language

This document presents a concatenative, syllable-based text-to-speech (TTS) system for the Kipsigis language. Such a system must be capable of producing speech automatically by storing small segments of speech and cutting and re-splicing them as required. There are two basic methods of speech synthesis: formant (rule-based) synthesis and concatenative (dictionary-based) synthesis. Word-based concatenative synthesis stores the most commonly used words in an audio database. Its drawback is a large database, since every word has to be stored. A syllable-based system, by contrast, can generate a far larger number of words from a very small database: the same syllables are recombined to form new words, so the original database does not need to be large. Soft cutting of syllables yields the 'from' and 'to' sample numbers of each syllable in the recording, and these locations are then stored in the database.
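
The soft-cutting idea described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the syllable labels, the sample positions, the name SYLLABLE_INDEX and the function synthesize are hypothetical, the 'to' position is assumed to be exclusive, and a short run of integers stands in for real recorded audio samples.

```python
from array import array

# Hypothetical soft-cut index: syllable -> ('from', 'to') sample numbers
# within one continuous recording. The 'to' position is treated as exclusive.
# The labels are arbitrary placeholders, not a claim about Kipsigis phonology.
SYLLABLE_INDEX = {
    "ka": (0, 4),
    "mu": (4, 9),
    "ta": (9, 15),
}


def synthesize(syllables, recording, index):
    """Concatenate the sample ranges of the given syllables, in order."""
    out = array("h")  # signed 16-bit samples, a common PCM format
    for syllable in syllables:
        start, end = index[syllable]
        out.extend(recording[start:end])
    return out


# Stand-in for a recorded waveform: 15 dummy samples instead of real audio.
recording = array("h", range(15))

# Two different "words" built from the same three stored syllables.
word_a = synthesize(["ka", "mu"], recording, SYLLABLE_INDEX)
word_b = synthesize(["ta", "ka"], recording, SYLLABLE_INDEX)
```

Only the index and a single recording are stored, and any ordering of the indexed syllables can be produced from them, which is why the database stays small compared with storing every word separately.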

     This Kipsigis TTS system was developed by creating waveforms through the concatenation of parts of natural speech recorded by a professional Kipsigis speaker. All the acoustically and perceptually significant sound variations in the Kipsigis language were recorded, and all Kipsigis syllables were then created from these recordings so that they can be played back each time the system synthesises speech. Kipsigis language speakers agreed that the developed system is suitable for use, although some improvement of the sound signal can still be made.

