Feature-based hybrid strategies for gradient descent optimization in end-to-end speech recognition

Dokuz Y., TÜFEKCİ Z.

MULTIMEDIA TOOLS AND APPLICATIONS, vol.81, no.7, pp.9969-9988, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 81 Issue: 7
  • Publication Date: 2022
  • Doi Number: 10.1007/s11042-022-12304-5
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, FRANCIS, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Page Numbers: pp.9969-9988
  • Keywords: Speech recognition, Deep learning, Mini-batch gradient descent, Hybrid sample selection strategies, LSTM
  • Çukurova University Affiliated: Yes


With the increasing popularity of deep learning, deep learning architectures are being utilized in speech recognition. Deep learning-based speech recognition has become the state-of-the-art method for speech recognition tasks because it outperforms other methods. Generally, deep learning architectures are trained with a variant of gradient descent optimization. Mini-batch gradient descent is a variant of gradient descent optimization that updates the network parameters after traversing a number of training instances. One limitation of mini-batch gradient descent is that mini-batch samples are selected randomly from the training set. This is undesirable in speech recognition, which requires the training features to cover all possible variations in speech databases. In this study, hybrid mini-batch sample selection strategies are proposed to overcome this limitation. The proposed hybrid strategies combine the gender and accent features of speech databases to select mini-batch samples when training deep learning architectures. Experimental results show that using a hybrid of gender and accent features yields better speech recognition performance than using only one feature. The proposed hybrid mini-batch sample selection strategies would also benefit other application areas that have metadata, including image recognition and machine vision.
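
The idea of selecting mini-batch samples by gender and accent together can be illustrated with a minimal sketch. The abstract does not specify the paper's actual strategies, so the code below is only one plausible reading, not the authors' method: it groups utterances by their (gender, accent) pair and interleaves the groups round-robin, so each mini-batch draws from as many groups as possible. The function name `hybrid_batches` and the dictionary keys `id`, `gender`, and `accent` are hypothetical.

```python
import random
from collections import defaultdict


def hybrid_batches(samples, batch_size, seed=0):
    """Yield mini-batches that interleave (gender, accent) groups.

    `samples` is a list of dicts with the hypothetical keys
    "id", "gender" and "accent". This is an illustrative assumption,
    not the selection strategy defined in the paper.
    """
    rng = random.Random(seed)

    # Group the utterances by their combined metadata.
    groups = defaultdict(list)
    for sample in samples:
        groups[(sample["gender"], sample["accent"])].append(sample)

    # Shuffle inside each group so the order within a group stays random.
    for members in groups.values():
        rng.shuffle(members)

    # Round-robin over the groups: consecutive samples come from
    # different groups until the smaller groups run out.
    queues = [groups[key] for key in sorted(groups)]
    ordered = []
    while any(queues):
        for queue in queues:
            if queue:
                ordered.append(queue.pop())

    # Cut the interleaved sequence into mini-batches.
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

With balanced groups and a batch size equal to the number of groups, every mini-batch in this sketch contains one utterance per (gender, accent) pair. With unbalanced groups, later batches are dominated by the larger groups, which a real strategy would need to handle, for example by resampling.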