Implementing Mel-Spectrogram Analysis for Emotion Recognition in Speech

Rishi Ahuja

New Delhi, India


Vol: 13, Issue: 4, 2023

Receiving Date: 2023-08-22 Acceptance Date:


Publication Date:




Emotion classification from speech and text is becoming increasingly important in artificial intelligence (AI). A more comprehensive framework for speech emotion recognition is needed to support and improve human-machine interaction. Because machines cannot yet categorize human emotions accurately, machine learning models have been developed specifically for this task, and researchers around the world are working to increase the accuracy of emotion classification algorithms. The speech emotion detection model built in this study involves two processes: (i) managing and (ii) classifying. Feature selection (FS) was used to find the most relevant feature subset. Given how vital feature selection is, a wide range of vision-based paradigms were used to meet the growing demand across the AI industry for precise emotion categorization. The research approach addresses the difficulty of classifying emotions alongside the development of machine learning and deep learning techniques. The work focuses on voice expression analysis and offers a paradigm for improving human-computer interaction by developing a prototype cognitive computing system that classifies emotions. It aims to increase classification precision for voice by combining feature selection techniques with a variety of deep learning methodologies, most notably using TensorFlow. The study further emphasizes how vital component selection is in developing robust machine learning algorithms for emotion classification.
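The mel-spectrogram analysis named in the title can be illustrated with a minimal sketch. The abstract does not state the paper's actual parameters or tooling for this step, so everything below is an assumption made for illustration: the HTK-style mel formula, the sample rate, frame length, hop size, number of mel bands, and all function names. A synthetic tone stands in for a speech recording, and a naive DFT is used so the example needs only the Python standard library.

```python
import cmath
import math


def hz_to_mel(hz):
    # HTK-style mel scale; one common convention among several (assumed here).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points, evenly spaced in mel, converted back to Hz.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate=8000, n_fft=128, hop=64, n_mels=10):
    """Return a list of frames, each a list of n_mels log-energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel_energy = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small offset avoids log(0) for silent bands.
        frames.append([10.0 * math.log10(e + 1e-10) for e in mel_energy])
    return frames


# Synthetic 440 Hz tone standing in for a speech recording (illustrative only).
sr = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / sr) for t in range(1024)]
features = log_mel_spectrogram(tone, sample_rate=sr)
```

Each row of `features` is one time frame and each column one mel band. A matrix of this shape is the kind of input that could then be passed to a feature selection step and a classifier. In practice an FFT-based library routine would replace the naive DFT.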

Keywords: emotion recognition; artificial intelligence; deep learning framework



Disclaimer: All papers published in IJRST will be indexed on Google Search Engine as per their policy.