High Quality Speech Coding using Combined Parametric and Perceptual Modules

A novel approach to speech coding based on a hybrid architecture is presented. The advantages of parametric and perceptual coding methods are combined to create a speech coding algorithm that provides better signal quality than a traditional CELP parametric codec. Two approaches are discussed. The first separates the signal into voiced components, which are encoded using a parametric algorithm, unvoiced components, which are encoded perceptually, and transients, which remain unencoded. The second approach applies perceptual encoding to the residual signal in a CELP codec. The algorithm used for precise transient selection is described. The signal quality achieved with the proposed hybrid codec is compared with that of several standard speech codecs.




