
Research Article
Lightweight Keyword Spotting with Inter-Domain Interaction and Attention for Real-Time Voice-Controlled Robotics
@ARTICLE{10.4108/airo.7877,
  author={Hien Vu Pham and Thuy Phuong Vu and Huong Thi Nguyen and Minhhuy Le},
  title={Lightweight Keyword Spotting with Inter-Domain Interaction and Attention for Real-Time Voice-Controlled Robotics},
  journal={EAI Endorsed Transactions on AI and Robotics},
  volume={4},
  number={1},
  publisher={EAI},
  journal_a={AIRO},
  year={2025},
  month={3},
  keywords={TinyML, Speech Commands, Channel Attention, Keyword Spotting},
  doi={10.4108/airo.7877}
}
Authors: Hien Vu Pham, Thuy Phuong Vu, Huong Thi Nguyen, Minhhuy Le
Year: 2025
AIRO
EAI
DOI: 10.4108/airo.7877
Abstract
This study introduces a lightweight keyword spotting (KWS) model optimized for deployment on resource-constrained microcontrollers, with potential applications in robotic control and end-effector operation. The model uses inter-domain interaction to extract features from both Mel-frequency cepstral coefficients (MFCCs) and temporal audio characteristics, complemented by an attention mechanism that prioritizes relevant audio segments for keyword detection. It achieves 93.70% accuracy on the Google Speech Commands v2 dataset (12 commands), outperforming existing benchmarks. It is also efficient in inference time (0.359 seconds) and resource use (34.9 KB peak RAM and 98.7 KB flash memory), with 3x faster inference and a smaller memory footprint than the DS-CNN-S model. These attributes make it well suited to real-time voice command applications in low-power robotic systems, where it could enable intuitive and responsive control of robotic arms, end-effectors, and navigation systems. In this work, however, the KWS model is demonstrated only in a simple non-destructive testing system, where it controls sensor movement. This research lays the groundwork for voice-activated robotic technologies on resource-limited hardware platforms.
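The abstract does not describe the attention mechanism in detail, although the keywords list channel attention. As a rough illustration only, the following sketch shows a generic squeeze-and-excitation style channel attention step in plain Python. It is not the authors' architecture: the function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy feature values are all hypothetical.

```python
import math


def channel_attention(features, w1, w2):
    """Generic squeeze-and-excitation style channel attention (illustrative).

    features: list of C channels, each a list of T frame values.
    w1: R x C bottleneck weights (hypothetical, normally learned).
    w2: C x R expansion weights (hypothetical, normally learned).
    Returns (reweighted features, per-channel gates).
    """
    # Squeeze: summarize each channel by its average over time.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck projection followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C values and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every frame of a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Toy input: 3 feature channels (e.g. MFCC-derived) over 4 time frames.
features = [
    [0.2, 0.4, 0.6, 0.8],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.1, 0.0, 0.1],
]
w1 = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]    # 2 x 3, made-up values
w2 = [[1.0, -1.0], [0.5, 0.5], [-0.7, 0.2]]  # 3 x 2, made-up values

scaled, gates = channel_attention(features, w1, w2)
print([round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so a channel is attenuated rather than removed. In a trained model the weights would be learned so that channels carrying keyword-relevant information receive gates closer to 1.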
Copyright © 2025 Hien Vu Pham et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.