
Research Article

Lightweight Keyword Spotting with Inter-Domain Interaction and Attention for Real-Time Voice-Controlled Robotics

Cite
@ARTICLE{10.4108/airo.7877,
    author={Hien Vu Pham and Thuy Phuong Vu and Huong Thi Nguyen and Minhhuy Le},
    title={Lightweight Keyword Spotting with Inter-Domain Interaction and Attention for Real-Time Voice-Controlled Robotics},
    journal={EAI Endorsed Transactions on AI and Robotics},
    volume={4},
    number={1},
    publisher={EAI},
    journal_a={AIRO},
    year={2025},
    month={3},
    keywords={TinyML, Speech Commands, Channel Attention, Keyword Spotting},
    doi={10.4108/airo.7877}
}
    
Hien Vu Pham, Thuy Phuong Vu, Huong Thi Nguyen, Minhhuy Le (2025). Lightweight Keyword Spotting with Inter-Domain Interaction and Attention for Real-Time Voice-Controlled Robotics. AIRO, EAI. DOI: 10.4108/airo.7877
Hien Vu Pham¹, Thuy Phuong Vu¹, Huong Thi Nguyen¹, Minhhuy Le¹,*
¹ Phenikaa (Vietnam)
*Contact email: leminhhuy8886@gmail.com

Abstract

This study introduces a novel lightweight keyword spotting (KWS) model optimized for deployment on resource-constrained microcontrollers, with potential applications in robotic control and end-effector operation. The model uses inter-domain interaction to extract features from both Mel-frequency cepstral coefficients (MFCCs) and temporal audio characteristics, and an attention mechanism that prioritizes the most relevant audio segments to improve keyword detection. It achieves 93.70% accuracy on the 12-command task of the Google Speech Commands v2 dataset, outperforming the existing baselines it is compared against. It is also efficient: inference takes 0.359 seconds, peak RAM usage is 34.9 KB, and flash usage is 98.7 KB, giving about three times faster inference and a smaller memory footprint than the DS-CNN-S model. These properties make it well suited to real-time voice-command applications in low-power robotic systems, where it could enable intuitive and responsive control of robotic arms, end-effectors, and navigation systems. In this work, however, the KWS model is demonstrated only in a simple non-destructive testing system, where it controls sensor movement. This research lays the groundwork for advancing voice-activated robotic technologies on resource-limited hardware platforms.
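The abstract lists channel attention among its keywords but does not describe the block itself. As a hedged illustration only, the sketch below implements a generic squeeze-and-excitation (SE) style channel attention in plain Python; it is not the authors' architecture. The function name `channel_attention`, the reduction ratio `r`, the omission of bias terms, the feature-map sizes, and the random weights are all assumptions chosen for illustration. The input is treated as a feature map of C channels by T time frames, such as the output of a convolution over MFCCs.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(features, w1, w2):
    """Hypothetical squeeze-and-excitation style channel attention.

    features: C channels, each a list of T per-frame activations.
    w1: (C // r) x C bottleneck weights; w2: C x (C // r) expansion weights.
    Bias terms are omitted for brevity (an assumption of this sketch).
    Returns (reweighted features, per-channel weights in (0, 1)).
    """
    # Squeeze: average each channel over time to get one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then expansion layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every frame of channel c by its attention weight.
    scaled = [[wt * x for x in ch] for wt, ch in zip(weights, features)]
    return scaled, weights


# Illustrative sizes and random weights; not taken from the paper.
random.seed(0)
C, T, r = 8, 49, 2
features = [[random.gauss(0.0, 1.0) for _ in range(T)] for _ in range(C)]
w1 = [[random.gauss(0.0, 0.3) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0.0, 0.3) for _ in range(C // r)] for _ in range(C)]
scaled, weights = channel_attention(features, w1, w2)
print([round(w, 3) for w in weights])
```

In a trained model, `w1` and `w2` would be learned rather than random. The extra cost of such a block is small, roughly 2·C²/r multiply-accumulates for the two dense layers plus one pass over the C × T feature map for pooling and one for scaling, which is why this style of attention is often paired with compact models on constrained hardware.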

Keywords: TinyML, Speech Commands, Channel Attention, Keyword Spotting
Received: 2024-11-19
Accepted: 2025-03-11
Published: 2025-03-18
Publisher: EAI
http://dx.doi.org/10.4108/airo.7877

Copyright © 2025 Hien Vu Pham et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.
