Text To Speech Synthesizer For Hadiyyisa Using Statistical Parametric Speech Synthesis

Demama, Desta

dc.contributor.author	Demama, Desta
dc.date.accessioned	2023-10-27T11:18:14Z
dc.date.available	2023-10-27T11:18:14Z
dc.date.issued	2021-10
dc.identifier.uri	http://hdl.handle.net/123456789/3145
dc.description.abstract	This research presents the first text-to-speech (TTS) system for the Hadiyyisa language. The main goal of speech technology is to enable communication between humans and machines. Speech synthesis plays diverse roles in modern life, such as assisting people with disabilities and supporting the telecommunication sector. Hadiyyisa is an Ethiopian language of the Cushitic group; it is written with an English-like alphabet plus some additional characters. Statistical parametric speech synthesis based on hidden Markov models (HMMs) was chosen for this study because it is a model-based approach that requires little storage, has a short run time, and is simple to integrate with small handheld devices. The process of converting input text into an acoustic waveform is divided into several stages, each with its own set of functional components. The synthesizer has two parts: training and testing. During training, speech source and excitation parameters are extracted from a speech database, and an ergodic hidden Markov model is used to automatically segment the speech and the phonetic transcriptions. During testing, the input text is converted into phonetic strings, which are used together with the trained models. Finally, the synthesized voice is generated from the speech parameters. To train the system, a corpus of four hundred sentences and the corresponding speech was prepared. The system uses tenfold cross-validation: the training set consists of 90% of the data, selected randomly, and the remaining 10% is used as a hold-out set for validation. The intelligibility and naturalness of the synthesized speech are evaluated subjectively with the mean opinion score (MOS), and mel cepstral distortion (MCD) is used for objective (experimental) evaluation.
We evaluated the effectiveness of the text-to-speech system and found that the proposed method can generate more natural spectral parameters and F0, with a score above 70%. In the objective evaluation, the MCD score is 5.6; in the subjective evaluation, MOS testing gives scores of 3.06 for intelligibility and 2.62 for naturalness.	en_US
dc.language.iso	en	en_US
dc.publisher	Ambo University	en_US
dc.subject	Statistical Parametric Speech Synthesis	en_US
dc.subject	Text to Speech	en_US
dc.subject	Hadiyyisa	en_US
dc.title	Text To Speech Synthesizer For Hadiyyisa Using Statistical Parametric Speech Synthesis	en_US
dc.type	Thesis	en_US
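
The abstract reports a mel cepstral distortion (MCD) score but does not state the formula used. As a rough illustration only, the sketch below follows one commonly used definition of MCD in decibels, averaged over frames. It assumes the reference and synthesized frames are already time-aligned and that the first coefficient (energy) is excluded; the function name and input layout are hypothetical and are not taken from the thesis.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames (assumed definition).

    Each frame is a sequence of mel-cepstral coefficients; index 0
    (the energy term) is skipped, as is common practice.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("frame sequences must be non-empty and of equal length")

    scale = 10.0 / math.log(10.0)  # converts the distance to decibels
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)
```

Lower values indicate that the synthesized spectrum is closer to the reference. If the two utterances differ in length, an alignment step such as dynamic time warping would be needed before calling this function.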