Deploying Real-Time Speech Recognition on ESP32 Using TinyML and Edge Impulse

The emergence of Tiny Machine Learning (TinyML) has enabled real-time on-device inference on ultra-low-power microcontrollers, eliminating reliance on cloud computing while significantly reducing latency, power consumption, and bandwidth requirements. This study explores the deployment of a TinyML-based speech recognition system on an ESP32 microcontroller, leveraging Edge Impulse for model development, Mel-Frequency Cepstral Coefficients (MFCCs) for feature extraction, and TensorFlow Lite for Microcontrollers (TFLM) for efficient inference. The model was trained on a curated subset of the Google Speech Commands Dataset, with background noise augmentation to improve robustness in real-world environments. Using Edge Impulse’s EON Compiler, the model was fully quantized and optimized, achieving reductions of 37% in RAM usage and 27% in ROM usage. The final model attained 87.14% accuracy on testing data and 97.1% average classification confidence during real-time inference, with 99.6% noise rejection and a latency of 266 ms. Compared to state-of-the-art systems deployed on more powerful platforms, the proposed approach achieves competitive accuracy while maintaining real-time inference and minimal resource consumption on ultra-low-power hardware. This makes it particularly suitable for battery-powered IoT, robotics, and embedded automation applications where connectivity and energy efficiency are critical. By balancing performance and efficiency, this research highlights the viability of deploying speech recognition systems on constrained microcontrollers. Future work will explore advanced architectures and enhanced feature extraction strategies to further improve recognition accuracy, especially for short or phonetically similar commands. © The authors © Springer.
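The abstract states that the model was fully quantized but this record does not describe the scheme. As a rough illustration only, the following Python sketch shows the affine int8 mapping commonly used for full-integer TensorFlow Lite models, where real_value = scale * (int8_value - zero_point). The function names and the example range [0.0, 6.0] are hypothetical and are not taken from the paper, and the sketch does not reproduce the RAM and ROM figures reported above.

```python
# Minimal sketch of affine int8 quantization (assumed scheme, not the authors' code).
# Mapping: real_value = scale * (q - zero_point), with q in [-128, 127].

def quant_params(x_min, x_max, qmin=-128, qmax=127):
    """Derive scale and zero point from an observed value range."""
    # Widen the range to include zero so that 0.0 maps to an exact integer.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero tensor; any positive scale works
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to int8 codes, clamping to the representable range."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]

def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate floats."""
    return [scale * (q - zero_point) for q in codes]

if __name__ == "__main__":
    # Hypothetical activation range, e.g. the output of a ReLU6 layer.
    scale, zero_point = quant_params(0.0, 6.0)
    samples = [0.0, 1.5, 3.0, 6.0]
    codes = quantize(samples, scale, zero_point)
    restored = dequantize(codes, scale, zero_point)
    for v, q, r in zip(samples, codes, restored):
        print(f"{v:5.2f} -> {q:4d} -> {r:7.4f}")
```

Each value is stored in one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error of roughly half a quantization step per value.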

Subjects

TinyML

Speech Recognition

Edge Impulse

TensorFlow Lite

ESP32

MFCC

Neural Networks

IoT

Energy-Efficient AI

License

Restricted Access

URL License

https://creativecommons.org/licenses/by-nc-sa/4.0/

How to cite

González, M., Gutiérrez, S., Espinosa, R., Ponce, H. (2025). Deploying Real-Time Speech Recognition on ESP32 Using TinyML and Edge Impulse. In: Martínez-Villaseñor, L., Martínez-Seis, B., Pichardo, O. (eds) Artificial Intelligence – COMIA 2025. COMIA 2025. Communications in Computer and Information Science, vol 2552. Springer, Cham. https://doi.org/10.1007/978-3-031-97907-1_17