Proceedings of the 2nd International Conference on Machine Learning and Automation, CONF-MLA 2024, November 21, 2024, Adana, Turkey

Research Article

Vision Transformer-Based Recognition of Chinese Cursive Calligraphy: A Curriculum Learning and Skeleton Embedding Approach

Xinrui Shan1,*, Jinyang Zheng2, Yilin Fang3, Tianhong Qi4
  • 1: Zhejiang University
  • 2: Xi'an Jiaotong-Liverpool University
  • 3: Beijing University of Posts and Telecommunications
  • 4: University of Science and Technology Beijing
*Contact email: xinruishan@zju.edu.cn

Abstract

Chinese cursive calligraphy, characterized by fluid and complex strokes, is difficult to recognize automatically because character structure and style vary widely. This paper proposes an approach to recognizing Chinese cursive characters using two Vision Transformer (ViT)-based models. We enhance the models with curriculum learning, which improves training efficiency by dynamically adjusting sample difficulty so that the models learn progressively from easier to harder examples. Additionally, we integrate skeleton embeddings into the ViT encoder input to capture the underlying structural information of cursive characters. Our method outperforms baseline approaches, achieving higher recognition accuracy on a self-built cursive calligraphy dataset.
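
The easy-to-hard training idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how sample difficulty is scored or how the schedule is paced, so everything below is an assumption for illustration only: the per-sample `difficulties` scores, the linear pacing rule, and the names `curriculum_subset`, `curriculum_batches`, and `start_frac` are hypothetical and are not taken from the paper.

```python
import math
import random


def curriculum_subset(difficulties, epoch, total_epochs, start_frac=0.3):
    """Return indices of the samples visible at `epoch`, easiest first.

    Assumed (hypothetical) linear pacing: training starts on the easiest
    `start_frac` of the data and grows to the full set by the last epoch.
    """
    n = len(difficulties)
    # Order sample indices from easiest (lowest score) to hardest.
    order = sorted(range(n), key=lambda i: difficulties[i])
    # Training progress in [0, 1]; a single-epoch run sees all the data.
    progress = 1.0 if total_epochs <= 1 else epoch / (total_epochs - 1)
    # Fraction of the dataset exposed at this epoch, capped at 1.
    frac = min(1.0, start_frac + (1.0 - start_frac) * progress)
    k = max(1, math.ceil(frac * n))
    return order[:k]


def curriculum_batches(difficulties, epoch, total_epochs, batch_size,
                       start_frac=0.3, seed=0):
    """Yield shuffled mini-batches of indices from the visible subset."""
    visible = curriculum_subset(difficulties, epoch, total_epochs, start_frac)
    # Shuffle within the visible subset so each epoch mixes its samples.
    rng = random.Random(seed + epoch)
    rng.shuffle(visible)
    for i in range(0, len(visible), batch_size):
        yield visible[i:i + batch_size]
```

In a training loop, the index batches from `curriculum_batches` would select images for each step. A scheme that adjusts difficulty dynamically, as the abstract describes, would presumably recompute `difficulties` during training (for example from the current per-sample loss) instead of fixing the scores up front.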

Keywords
vision transformer, Chinese cursive calligraphy, skeleton embedding
Published
2025-03-11
Publisher
EAI
http://dx.doi.org/10.4108/eai.21-11-2024.2354609
Copyright © 2024–2025 EAI