High-Individuality Voice Conversion Based on Concatenative Speech Synthesis

Concatenative speech synthesis is a method that can make speech sound which has naturalness and high-individuality of a speaker by introducing a large speech corpus. Based on this method, in this paper, we propose a voice conversion method whose conversion speech has high-individuality and naturalness. The authors also have two subjective evaluation experiments for evaluating individuality and sound quality of conversion speech. From the results, following three facts have be confirmed: (a) the proposal method can convert the individuality of speakers well, (b) employing the framework of unit selection (especially join cost) of concatenative speech synthesis into conventional voice conversion improves the sound quality of conversion speech, and (c) the proposal method is robust against the difference of genders between a source speaker and a target speaker.