Affine-invariant visual features contain supplementary information to enhance speech recognition
AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, vol. 2091, pp. 175-181, 2001 (SCI-Expanded)
- Publication Type: Article / Full Article
- Volume Number: 2091
- Publication Date: 2001
- Journal Name: AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS
- Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED)
- Pages: pp. 175-181
- Affiliated with Çukurova Üniversitesi: No
Abstract
The performance of audio-based speech recognition systems degrades severely when there is a mismatch between training and usage environments due to background noise. This degradation is due to a loss of ability to extract and distinguish important information from audio features. One of the emerging techniques for dealing with this problem is the addition of visual features in a multimodal recognition system. This paper presents an affine-invariant, multimodal speech recognition system and focuses on the supplementary information that is available from video features.