Deploying Real-Time Speech Recognition on ESP32 Using TinyML and Edge Impulse
Journal
Artificial Intelligence – COMIA 2025 : 17th Mexican Congress, Mexico City, Mexico, May 12–16, 2025, Proceedings, Part I
ISSN
1865-0929
1865-0937
Publisher
Springer Nature Switzerland
Date Issued
2025
Author(s)
Gutiérrez, Sebastián
Type
text::book::book part
Abstract
The emergence of Tiny Machine Learning (TinyML) has enabled real-time on-device inference on ultra-low-power microcontrollers, eliminating reliance on cloud computing while significantly reducing latency, power consumption, and bandwidth requirements. This study explores the deployment of a TinyML-based speech recognition system on an ESP32 microcontroller, leveraging Edge Impulse for model development, Mel-Frequency Cepstral Coefficients (MFCCs) for feature extraction, and TensorFlow Lite for Microcontrollers (TFLM) for efficient inference. The model was trained on a curated subset of the Google Speech Commands Dataset, incorporating background noise augmentation to enhance robustness in real-world environments. Using Edge Impulse’s EON Compiler, the model was fully quantized and optimized, achieving a 37% reduction in RAM usage and a 27% reduction in ROM usage. The final model attained 87.14% accuracy on testing data and 97.1% average classification confidence during real-time inference, with excellent noise rejection (99.6%) and latency of 266 ms. Compared to state-of-the-art systems deployed on more powerful platforms, the proposed approach achieves competitive accuracy while maintaining real-time inference and minimal resource consumption on ultra-low-power hardware. This makes it particularly suitable for battery-powered IoT, robotics, and embedded automation applications where connectivity and energy efficiency are critical. By balancing performance and efficiency, this research highlights the viability of deploying speech recognition systems on constrained microcontrollers. Future work will explore advanced architectures and enhanced feature extraction strategies to further improve recognition accuracy, especially for short or phonetically similar commands. © The authors © Springer.
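The background noise augmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline or Edge Impulse's implementation: the function names `rms` and `mix_at_snr`, the 10 dB target, the synthetic 440 Hz tone standing in for a spoken command, and the Gaussian noise source are all assumptions chosen for illustration. The sketch scales a noise clip so that the mixture has a chosen signal-to-noise ratio before it would be passed on to feature extraction.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        # Silent noise clip: nothing to mix in.
        return list(speech)
    # SNR_dB = 20 * log10(rms_speech / rms_scaled_noise), solved for the noise gain.
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20.0))
    return [s + gain * n for s, n in zip(speech, noise)]


# Hypothetical one-second clip at 16 kHz (assumed sample rate);
# a 440 Hz tone stands in for a recorded command.
SAMPLE_RATE = 16000
rng = random.Random(0)  # fixed seed so the example is reproducible
speech = [0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
noise = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

In a training setup along these lines, the SNR would typically be drawn at random per clip so the model sees a range of noise levels; the fixed 10 dB value here only keeps the example deterministic.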
License
Restricted Access
How to cite
González, M., Gutiérrez, S., Espinosa, R., Ponce, H. (2025). Deploying Real-Time Speech Recognition on ESP32 Using TinyML and Edge Impulse. In: Martínez-Villaseñor, L., Martínez-Seis, B., Pichardo, O. (eds) Artificial Intelligence – COMIA 2025. COMIA 2025. Communications in Computer and Information Science, vol 2552. Springer, Cham. https://doi.org/10.1007/978-3-031-97907-1_17
