
Research Article
A novel knowledge enhancement method for large-scale natural language training model
@ARTICLE{10.4108/airo.8987,
  author={Qi Han and Gilja So},
  title={A novel knowledge enhancement method for large-scale natural language training model},
  journal={EAI Endorsed Transactions on AI and Robotics},
  volume={4},
  number={1},
  publisher={EAI},
  journal_a={AIRO},
  year={2025},
  month={7},
  keywords={large-scale natural language training model, knowledge enhancement, long text representation, pre-trained model},
  doi={10.4108/airo.8987}
}
Qi Han
Gilja So
Year: 2025
A novel knowledge enhancement method for large-scale natural language training model
AIRO
EAI
DOI: 10.4108/airo.8987
Abstract
A knowledge-enhanced large-scale natural language training model is an advanced language model that combines deep learning with knowledge enhancement. By learning from massive unlabeled data and incorporating external knowledge such as knowledge graphs, it overcomes the limitations of traditional models in interpretability and reasoning ability. Introducing knowledge into data-driven artificial intelligence models is an important way to realize human-machine hybrid intelligence. However, because most pre-trained models are trained on large-scale unstructured corpus data, they suffer from deficiencies in certainty and explainability, which can be remedied to some extent by introducing external knowledge. To address these problems, we present a knowledge-enhanced large-scale natural language training model that integrates deep learning with external knowledge sources (e.g., knowledge graphs) to improve interpretability, certainty, and reasoning ability. We propose a new knowledge enhancement method and demonstrate its effectiveness through a long text representation model. The model processes structured, knowledge-rich long texts by extracting knowledge and semantic information at both the sentence and document levels, and then fuses these representations to produce an enhanced long text representation. Experiments on legal case matching tasks show that our model significantly outperforms existing methods, highlighting its innovation and practical value.
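For readers unfamiliar with this kind of architecture, the following is a minimal sketch of how sentence-level and document-level semantic and knowledge representations might be fused into one enhanced long text representation. It assumes PyTorch; all module names, dimensions, and the gating scheme are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only (assumed PyTorch): fuse sentence-level and
# document-level semantic/knowledge features into a single enhanced
# long-text representation. Names and dimensions are hypothetical.
import torch
import torch.nn as nn


class LongTextFusion(nn.Module):
    def __init__(self, sem_dim: int = 768, kg_dim: int = 128, out_dim: int = 768):
        super().__init__()
        # Project concatenated semantic + knowledge features at the sentence level.
        self.sent_proj = nn.Linear(sem_dim + kg_dim, out_dim)
        # Project concatenated semantic + knowledge features at the document level.
        self.doc_proj = nn.Linear(sem_dim + kg_dim, out_dim)
        # Gate deciding how much sentence-level vs. document-level
        # information flows into the final representation.
        self.gate = nn.Linear(2 * out_dim, out_dim)

    def forward(self, sent_sem, sent_kg, doc_sem, doc_kg):
        # sent_sem: (batch, n_sentences, sem_dim)  sentence semantics (e.g. a BERT-style encoder)
        # sent_kg:  (batch, n_sentences, kg_dim)   sentence-level knowledge (e.g. entity embeddings)
        # doc_sem:  (batch, sem_dim)               document-level semantics
        # doc_kg:   (batch, kg_dim)                document-level knowledge
        sent = torch.tanh(self.sent_proj(torch.cat([sent_sem, sent_kg], dim=-1)))
        sent = sent.mean(dim=1)  # pool sentence vectors into one vector per document
        doc = torch.tanh(self.doc_proj(torch.cat([doc_sem, doc_kg], dim=-1)))
        g = torch.sigmoid(self.gate(torch.cat([sent, doc], dim=-1)))
        return g * sent + (1.0 - g) * doc  # enhanced long-text representation


if __name__ == "__main__":
    model = LongTextFusion()
    rep = model(
        torch.randn(2, 16, 768),  # 2 documents, 16 sentences each
        torch.randn(2, 16, 128),
        torch.randn(2, 768),
        torch.randn(2, 128),
    )
    print(rep.shape)  # torch.Size([2, 768])

In a downstream task such as legal case matching, one such representation per case could be compared (for example with cosine similarity), but the gating and pooling choices above are only one possible design.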
Copyright © 2025 Qi Han et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.