A Review on Feature Extraction for Speaker Recognition under Degraded Conditions

DISKEN, Gokay; TÜFEKCİ, ZEKERİYA; SARIBULUT, Lutfu; ÇEVİK, ULUS

doi:10.1080/02564602.2016.1185976

DISKEN G., TÜFEKCİ Z., SARIBULUT L., ÇEVİK U.

IETE TECHNICAL REVIEW, vol.34, no.3, pp.321-332, 2017 (SCI-Expanded)

Publication Type: Article / Review
Volume: 34 Issue: 3
Publication Date: 2017
DOI: 10.1080/02564602.2016.1185976
Journal Name: IETE TECHNICAL REVIEW
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp.321-332
Keywords: Feature extraction, Identification, Speaker recognition, Verification, LINEAR PREDICTION, ROBUST SPEECH, WAVELET TRANSFORM, WORD RECOGNITION, MULTITAPER MFCC, ADDITIVE NOISE, VERIFICATION, IDENTIFICATION, COMPENSATION, COMBINATION
Çukurova University Affiliated: Yes

Abstract

Speech is a signal that carries the speaker's emotion, speaker-specific characteristics, phonemic information, etc. Various methods have been proposed for speaker recognition that extract such characteristics from a given utterance. Among them, short-term cepstral features are used extensively in speech and speaker recognition because of their low complexity and high performance in controlled environments. However, their performance decreases dramatically under degraded conditions such as channel mismatch, additive noise, and emotional variability. In this paper, a literature review on speaker-specific information extraction from speech is presented, considering recent studies that offer solutions to this problem. The studies are categorized into three groups according to their robustness against channel mismatch, additive noise, and other degradations such as vocal effort and emotion mismatch. For a clearer presentation, they are also organized into two tables according to their classification methods and the data-sets used.