
Research Article
Vision Transformer-Based Recognition of Chinese Cursive Calligraphy: A Curriculum Learning and Skeleton Embedding Approach
@INPROCEEDINGS{10.4108/eai.21-11-2024.2354609,
  author={Xinrui Shan and Jinyang Zheng and Yilin Fang and Tianhong Qi},
  title={Vision Transformer-Based Recognition of Chinese Cursive Calligraphy: A Curriculum Learning and Skeleton Embedding Approach},
  proceedings={Proceedings of the 2nd International Conference on Machine Learning and Automation, CONF-MLA 2024, November 21, 2024, Adana, Turkey},
  publisher={EAI},
  proceedings_a={CONF-MLA},
  year={2025},
  month={3},
  keywords={vision transformer chinese cursive calligraphy skeleton embedding},
  doi={10.4108/eai.21-11-2024.2354609}
}
Authors: Xinrui Shan, Jinyang Zheng, Yilin Fang, Tianhong Qi
Year: 2025
Conference: CONF-MLA
Publisher: EAI
DOI: 10.4108/eai.21-11-2024.2354609
Abstract
Chinese cursive calligraphy, characterized by fluid and complex strokes, is difficult to recognize automatically because character structure and style vary widely. This paper proposes an approach to recognizing Chinese cursive characters using two Vision Transformer (ViT)-based models. We enhance the models with curriculum learning, which improves training efficiency by dynamically adjusting sample difficulty so that the models learn progressively from easier to harder examples. Additionally, we integrate skeleton embeddings into the ViT encoder input to capture the underlying structural information of cursive characters. Our method outperforms baseline approaches, achieving higher recognition accuracy on a self-built cursive calligraphy dataset.
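
The easy-to-hard idea behind curriculum learning can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how difficulty is measured or scheduled, so the per-sample `difficulties` scores, the linear pacing rule, and the names `curriculum_subset`, `curriculum_batches`, and `start_fraction` are all assumptions made for illustration. The sketch uses fixed difficulty scores and simply enlarges the pool of available samples each epoch, whereas the paper describes adjusting difficulty dynamically.

```python
import random


def curriculum_subset(difficulties, epoch, total_epochs, start_fraction=0.3):
    """Return indices of the samples available at `epoch` (0-based).

    Hypothetical pacing rule: samples are ranked from easiest to hardest,
    and the available fraction grows linearly from `start_fraction` to 1.0.
    """
    if total_epochs < 1:
        raise ValueError("total_epochs must be >= 1")
    # Training progress in [0, 1]; a single-epoch run uses the full dataset.
    progress = 1.0 if total_epochs == 1 else min(epoch / (total_epochs - 1), 1.0)
    fraction = start_fraction + (1.0 - start_fraction) * progress
    n_available = max(1, round(fraction * len(difficulties)))
    # Lower score means easier sample (an assumption of this sketch).
    ranked = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    return ranked[:n_available]


def curriculum_batches(difficulties, epoch, total_epochs, batch_size, seed=0):
    """Yield shuffled mini-batches of indices drawn from the current subset."""
    indices = curriculum_subset(difficulties, epoch, total_epochs)
    random.Random(seed + epoch).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]
```

In a training loop, each epoch would call `curriculum_batches` and use the yielded indices to fetch images and labels, so early epochs see only the easiest characters and the final epoch sees the whole dataset.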