Abstract:
This research presents the first text-to-speech (TTS) system for the Hadiyyisa language. The main
goal of speech technology is to enable communication between humans and machines.
Speech synthesis plays diverse roles in modern human activities, such as assisting people with
disabilities and serving the telecommunication sector. Hadiyyisa is a local Ethiopian
language of the Cushitic group that uses an English-like alphabet with some additional
characters. Statistical parametric speech synthesis
based on HMM techniques was chosen for this study because it is a model-based system that
requires little storage, has a short run time, and is simple to integrate with small handheld
devices. The process of converting input text into an acoustic waveform is divided into several
stages, each with its own set of functional components. The synthesizer is divided into two
parts: training and testing. Speech source and excitation parameters are derived from a speech
database during the training process. An ergodic hidden Markov model is used to automatically
segment the speech and its phonetic transcriptions. During the testing step, the input text is
converted into phonetic strings and processed together with the trained models. Finally, the
synthesized voice is generated from the speech parameters. In order to train the system being developed,
we compiled four hundred sentences and the corresponding speech recordings. The system uses tenfold
cross-validation as its training method: a randomly selected 90% of the data forms the training
set, and the remaining 10% is used as a hold-out set for validation. In this
study, the mean opinion score (MOS) is used to test the intelligibility and naturalness of the
synthesized speech, and Mel cepstral distortion (MCD) is used as an objective (experimental)
evaluation. We evaluated the effectiveness of the text-to-speech system and found that the proposed method
can generate more natural spectral parameters and F0, with a score above 70%. In the objective
evaluation, the MCD score is 5.6; in the subjective evaluation, MOS testing of the synthesized
speech yields scores of 3.06 for intelligibility and 2.62 for naturalness, respectively.
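
As an illustration of the validation scheme described above, the following minimal Python sketch shows one way a tenfold split of a 400-sentence corpus could be produced. The function name `tenfold_splits`, the fixed random seed, and the representation of utterances as integer indices are assumptions made for illustration; they are not taken from the system itself, and the actual partitioning procedure may differ.

```python
import random


def tenfold_splits(n_items, n_folds=10, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation.

    Hypothetical helper: utterances are represented only by their index.
    """
    indices = list(range(n_items))
    # Shuffle once so that each utterance is assigned to a fold at random.
    random.Random(seed).shuffle(indices)
    for k in range(n_folds):
        # Fold k takes every n_folds-th shuffled index, starting at offset k.
        validation = indices[k::n_folds]
        held_out = set(validation)
        train = [i for i in indices if i not in held_out]
        yield train, validation


# Assumption: a corpus of 400 utterances, matching the size stated in the abstract.
splits = list(tenfold_splits(400))
```

With 400 utterances, each of the ten folds trains on 360 utterances (90%) and holds out 40 (10%), and every utterance is held out exactly once across the folds.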