
Research Article
MSDA-Text: Template-Guided Long-Form Text Generation with Multi-Source Data Augmentation
@INPROCEEDINGS{10.4108/eai.18-12-2025.2365293,
    author={Zheng Dai and Yilun Zhang and Pengjia Wang and Qianpu Jiang and Fuguo Liu and Yufeng Shi},
    title={MSDA-Text: Template-Guided Long-Form Text Generation with Multi-Source Data Augmentation},
    proceedings={Proceedings of the 13th International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2025, 18-21 December 2025, Chengdu, China},
    publisher={EAI},
    proceedings_a={IIKI},
    year={2026},
    month={6},
    keywords={Large Language Models Long-Form Text Generation Multi-Source Data Augmentation Template-Guided Generation RAG Text-to-SQL},
    doi={10.4108/eai.18-12-2025.2365293}
}
Zheng Dai
Yilun Zhang
Pengjia Wang
Qianpu Jiang
Fuguo Liu
Yufeng Shi
Year: 2026
Conference: IIKI
Publisher: EAI
DOI: 10.4108/eai.18-12-2025.2365293
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in text generation, yet their outputs often depend heavily on pre-training data and lack the factual depth required for domain-specific long-form writing, such as industrial reports or biographical summaries. To address this limitation, we propose MSDA-Text (Template-Guided Long-Form Text Generation with Multi-Source Data Augmentation), a framework designed to produce accurate and comprehensive long-form texts aligned with user intent. Building upon existing long-text generation architectures such as STORM, MSDA-Text introduces two key enhancements: (1) a template-guided outline generation process that incorporates user-provided reference materials into multi-perspective LLM discussions, and (2) multi-source data augmentation that integrates both Internet-based and local real-time data through Retrieval-Augmented Generation (RAG) and Text-to-SQL techniques. The framework employs the Model Context Protocol (MCP) to unify template parsing across heterogeneous file types and features a long-text writing agent that autonomously retrieves and synthesizes content for each outline section. Experimental results demonstrate that MSDA-Text generates long-form documents that are more structured, user-aligned, and factually grounded than existing LLM-based methods.


