Learning Consistent Embedding Distribution for Robust ASR

Zuyao Ma; Yulong Wang; Tongcun Liu; Lei Zhang; Wei Li

Machine Learning and Intelligent Communication. 8th EAI International Conference, MLICOM 2023, Beijing, China, December 17, 2023, Proceedings

Research Article


@INPROCEEDINGS{10.1007/978-3-031-71716-1_1,
    author={Zuyao Ma and Yulong Wang and Tongcun Liu and Lei Zhang and Wei Li},
    title={Learning Consistent Embedding Distribution for Robust ASR},
    proceedings={Machine Learning and Intelligent Communication. 8th EAI International Conference, MLICOM 2023, Beijing, China, December 17, 2023, Proceedings},
    proceedings_a={MLICOM},
    year={2024},
    month={9},
    keywords={Robust ASR, Distribution transformation, Speech enhancement},
    doi={10.1007/978-3-031-71716-1_1}
}

Publisher: Springer
DOI: 10.1007/978-3-031-71716-1_1

Zuyao Ma*, Yulong Wang, Tongcun Liu, Lei Zhang, Wei Li

*Contact email: mazuyao233@bupt.edu.cn

Abstract

Despite their success, existing Automatic Speech Recognition (ASR) models depend heavily on sufficient labeled clean training data, which is unrealistic in practice because labeling is expensive and noise is unpredictable. To address this challenge, we propose a novel Distribution Transformation network (DT-net), which refines pre-trained embeddings to mitigate the influence of noise. DT-net consists of a front-end Speech Enhancement (SE) module and a back-end ASR module. In addition, two novel types of distribution transformation are introduced, one in the SE module and one in the ASR module, to adapt to the distributions of clean and noisy pre-trained embeddings. Extensive experiments on the public CHiME-4 dataset show that DT-net outperforms the baselines in both recognition performance and robustness.

Keywords: Robust ASR, Distribution transformation, Speech enhancement

Published: 2024-09-20
Appears in: SpringerLink

