Raw Audio vs. Spectrograms: Advancing Emotion Recognition from Speech
June 1 - Aug 7 (proposed; dates flexible)
This project will investigate how machine learning models can recognize human emotions from speech. Two approaches will be examined:
- processing raw audio waveforms directly
- converting audio into spectrograms, image-like time-frequency representations of sound
Models will be developed and evaluated for both methods, providing hands-on experience in audio processing, image-based representation, and end-to-end learning.
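The contrast between the two representations can be made concrete with a small sketch. The snippet below is illustrative only (it is not the project's codebase): it builds a one-second synthetic waveform, which is the 1-D input a raw-audio model would consume, and converts it into a log-magnitude spectrogram via a short-time Fourier transform, the 2-D "image" an image-based model would consume. The function name, frame length, and hop size are arbitrary choices for the example.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram via a short-time Fourier transform.

    Splits the raw waveform into overlapping windowed frames and takes
    the FFT of each, yielding a 2-D time-frequency array.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum per frame
    return 20 * np.log10(mag + 1e-10)          # decibel scale, small offset avoids log(0)

# Raw-audio representation: a 1-D array (e.g. input to a 1-D CNN).
sr = 16000                                     # assumed sample rate
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t).astype(np.float32)  # 1 s, 440 Hz tone

# Spectrogram representation: the same second of audio as a 2-D array.
spec = stft_spectrogram(wave)
print(wave.shape)  # (16000,)  - samples over time
print(spec.shape)  # (124, 129) - time frames x frequency bins
```

The same audio thus reaches a model either as 16,000 time-domain samples or as a 124 x 129 time-frequency grid, which is the representational difference the project evaluates.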
The purpose is to identify which approach captures emotional cues more accurately and efficiently. The work will highlight how different data representations influence model performance and advance speech emotion recognition for real-world applications in education, healthcare, customer service, and human–AI interaction.
Student Learning Outcomes
Students will develop machine learning models for speech emotion recognition using raw audio and spectrograms, including data preprocessing, model training, and evaluation. The project provides hands-on experience with HPC resources, builds skills in Python, deep learning, and audio- and image-based classification, and concludes with manuscript writing and publication experience.
This work is supported by a 2026 Gonzaga Research Opportunity.
