
Research Article
A Method for Extracting News Text Information from Converged Media Videos Based on SWT Algorithm
@INPROCEEDINGS{10.4108/eai.18-12-2025.2365259,
  author={Lixiang Shi and Jing Liang and Qi Li and Rui Lv},
  title={A Method for Extracting News Text Information from Converged Media Videos Based on SWT Algorithm},
  proceedings={Proceedings of the 13th International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2025, 18-21 December 2025, Chengdu, China},
  publisher={EAI},
  proceedings_a={IIKI},
  year={2026},
  month={6},
  keywords={SWT algorithm, Converged media video, News text, Information extraction, Feature encoding, Loss function},
  doi={10.4108/eai.18-12-2025.2365259}
}
Lixiang Shi
Jing Liang
Qi Li
Rui Lv
Year: 2026
IIKI
EAI
DOI: 10.4108/eai.18-12-2025.2365259
Abstract
When extracting information from converged media videos, only semantic images can be selected, which lowers the reliability of the extracted information. To address this, the paper proposes a method for extracting news text information from converged media videos based on the SWT algorithm. A maximum stable dynamic region criterion is proposed to detect spatiotemporally stable regions, and a triple feature encoding mechanism is designed. A multi-level feature fusion framework then generates positive and negative sample pairs, which are used to define the extraction loss function. Edge density is used to distinguish text from noise, and the extraction and classification losses are combined and jointly optimized to extract the news text information. Experimental results show that, with the proposed method, the loss value falls by 0.07 in each of the first 6 rounds, drops to 0.32 by the 10th round, and finally stabilizes at 0.25. On three datasets, including NewsHub, the proposed method achieves an F1 score of up to 0.92 and an AP of 0.90, a 3.2%-3.8% improvement over the best comparison method. In the feature space visualization, the average aggregation degree of similar texts reaches 91%, while the proportion of outliers falls to as low as 0.3%. These results indicate that the extracted information is more reliable, that multimodal interference is handled effectively, and that the method has significant practical value.
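
The abstract does not say how edge density is computed, so the following is only a minimal sketch of one common interpretation: the fraction of pixels in a candidate region whose local intensity gradient exceeds a threshold. The function names (edge_density, looks_like_text), the finite-difference gradient, and the values of threshold and min_density are all assumptions made for illustration and are not taken from the paper.

```python
def edge_density(region, threshold=50):
    """Fraction of pixels with a strong local gradient (assumed definition).

    `region` is a rectangular list of rows of grayscale values in 0-255.
    The gradient is approximated with forward differences to the right
    and downward neighbours; `threshold` is a hypothetical cut-off.
    """
    height = len(region)
    width = len(region[0]) if height else 0
    if height < 2 or width < 2:
        return 0.0  # too small to measure any gradient

    edge_pixels = 0
    for y in range(height - 1):
        for x in range(width - 1):
            gx = region[y][x + 1] - region[y][x]  # horizontal difference
            gy = region[y + 1][x] - region[y][x]  # vertical difference
            if abs(gx) + abs(gy) > threshold:
                edge_pixels += 1
    return edge_pixels / ((height - 1) * (width - 1))


def looks_like_text(region, min_density=0.2, threshold=50):
    """Hypothetical decision rule: dense edges suggest character strokes."""
    return edge_density(region, threshold) >= min_density


# Synthetic examples: alternating dark/bright columns imitate strokes,
# while a uniform patch imitates a flat background.
striped = [[0 if x % 2 == 0 else 255 for x in range(8)] for _ in range(8)]
flat = [[128] * 8 for _ in range(8)]

print(looks_like_text(striped), looks_like_text(flat))
```

In practice such a score would be computed on candidate regions that have already passed the stability criterion, and the thresholds would need tuning per dataset; a real pipeline would more likely use a standard edge detector than raw finite differences.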


