
Research Article
L-TCN Speech Separation Algorithm for Effectively Acquisition IPD Information Based on Attention in Reverberation Environment
@INPROCEEDINGS{10.1007/978-3-031-60347-1_14,
  author={Xiyu Song and Zhengyi An and Shiqi Wang and Fangzhi Yao and Mei Wang},
  title={L-TCN Speech Separation Algorithm for Effectively Acquisition IPD Information Based on Attention in Reverberation Environment},
  proceedings={Mobile Multimedia Communications. 16th EAI International Conference, MobiMedia 2023, Guilin, China, July 22-24, 2023, Proceedings},
  proceedings_a={MOBIMEDIA},
  year={2024},
  month={10},
  keywords={reverberation environment speech separation time convolution network},
  doi={10.1007/978-3-031-60347-1_14}
}
Xiyu Song
Zhengyi An
Shiqi Wang
Fangzhi Yao
Mei Wang
Year: 2024
MOBIMEDIA
Springer
DOI: 10.1007/978-3-031-60347-1_14
Abstract
Speech separation aims to separate a target speaker's speech from mixed speech. However, the various noises and reverberation found in real environments make separation difficult. To address this problem, a multi-channel microphone array is introduced to extract the spatial information of the target speech; however, the number of inter-channel phase differences (IPDs) grows with the square of the number of microphones. Using all IPDs would impose a massive load on the system; therefore, we use an attention mechanism to acquire IPD information effectively. Moreover, the time convolution network (TCN) exhibits excellent performance in speech separation, but the large number of parameters in its deep dilated convolutions places a heavy burden on the system. We therefore propose a lightweight time convolution network (L-TCN) for speech separation, aided by attention-based effective acquisition of IPD information. Compared with the control experiment, the proposed method reduces the number of parameters by 90% and doubles the IPD utilization rate. While reducing the system load, it increases the short-time objective intelligibility (STOI) by 0.19 and the scale-invariant signal-to-distortion ratio (SI-SDR) by 6.33.
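To make the growth in the number of IPDs concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one IPD per unordered microphone pair, so an array of M microphones yields M(M-1)/2 IPDs; an 8-microphone array, for example, already gives 28 pairs. The helper names `microphone_pairs` and `ipd` and the sample bin values are hypothetical and chosen only for illustration.

```python
import cmath
from itertools import combinations


def microphone_pairs(num_mics):
    """Return every unordered microphone pair (i, j) with i < j."""
    return list(combinations(range(num_mics), 2))


def ipd(bin_a, bin_b):
    """Phase difference between two complex STFT bins, in [-pi, pi]."""
    return cmath.phase(bin_a * bin_b.conjugate())


# The pair count is M * (M - 1) / 2, so it grows quadratically with M.
for m in (2, 4, 6, 8):
    print(m, "microphones ->", len(microphone_pairs(m)), "IPD pairs")

# Hypothetical values of one time-frequency bin observed at three microphones
# (magnitude, phase in radians); assumed data for illustration only.
bins = [cmath.rect(1.0, 0.3), cmath.rect(0.8, -0.2), cmath.rect(1.1, 1.0)]

# One IPD per microphone pair for this bin.
ipds = {
    (i, j): ipd(bins[i], bins[j])
    for i, j in microphone_pairs(len(bins))
}
```

In a real system this computation would be repeated for every time-frequency bin, which is why selecting only the most informative pairs, as the attention mechanism described above is meant to do, reduces the load.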