Proceedings of the 2nd International Conference on Machine Learning and Automation, CONF-MLA 2024, November 21, 2024, Adana, Turkey

Research Article

Stable Random Sampling (SRS): A New Method to Refine Causal Masking in Decoder-Only Transformer

Cite (BibTeX)

@INPROCEEDINGS{10.4108/eai.21-11-2024.2354592,
    author={Shuhao Zhang and Jiayi Yu and Jiarui Li},
    title={Stable Random Sampling (SRS): A New Method to Refine Causal Masking in Decoder-Only Transformer},
    proceedings={Proceedings of the 2nd International Conference on Machine Learning and Automation, CONF-MLA 2024, November 21, 2024, Adana, Turkey},
    publisher={EAI},
    proceedings_a={CONF-MLA},
    year={2025},
    month={3},
    keywords={decoder-only transformer, causal masking, random sampling, positional information},
    doi={10.4108/eai.21-11-2024.2354592}
}

Shuhao Zhang1,*, Jiayi Yu2, Jiarui Li3
  • 1: University of Science and Technology Beijing
  • 2: UM-SJTU Joint Institute, Shanghai Jiaotong University
  • 3: Xidian University
*Contact email: U202142800@xs.ustb.edu.cn

Abstract

In current language modelling, the decoder-only Transformer architecture with causal masking has become a cornerstone, demonstrating exceptional performance across various tasks. However, we identify two significant limitations. First, causal masking is a substantial obstacle to further optimizing overall model efficiency, particularly when handling long contexts. Second, traditional optimizations of causal masking struggle with uneven attention distribution and cannot encode absolute positional information, which limits their effectiveness in position-sensitive tasks. In this work, we propose the Stable Random Sampling (SRS) algorithm, a novel method that addresses both limitations by refining the causal masking process. SRS introduces a pseudo-attention mask that balances attention distributions to improve performance, and it incorporates random sampling and Locality-Sensitive Hashing (LSH) into the causal masking step for efficient processing, reducing the time complexity of this step to O(n). The effectiveness of SRS is validated both theoretically and empirically. Our pre-training ablation experiments show that the SRS module improves the performance of causal masking, while each of its functional components improves efficiency and effectiveness on tasks of different sizes. On average, SRS shows a 30% reduction in training time and a 50% decrease in loss rate compared to traditional methods.
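As a rough illustration of why sampling can bring the masking step down to linear cost, the sketch below contrasts a standard dense causal mask with a hypothetical variant in which each query keeps only a fixed number of sampled earlier positions. This is not the SRS algorithm itself, whose details the abstract does not give: the function names, the uniform sampling rule, and the parameter `k` are assumptions made for this example, `n` is taken to be the sequence length, and the pseudo-attention mask and LSH components are not modelled.

```python
import random


def causal_mask(n):
    """Dense causal mask: entry [i][j] is True when query i may attend to key j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def sampled_causal_indices(n, k, seed=0):
    """Hypothetical sampled variant (NOT the paper's SRS algorithm).

    Each query i keeps itself plus up to k - 1 uniformly sampled earlier
    positions, so it attends to at most k keys.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    kept = []
    for i in range(n):
        # Sampling from range(i) can never select a future position.
        earlier = rng.sample(range(i), min(k - 1, i))
        kept.append(sorted(earlier + [i]))
    return kept


n, k = 8, 3
dense_pairs = sum(sum(row) for row in causal_mask(n))
sampled_pairs = sum(len(keys) for keys in sampled_causal_indices(n, k))
print(dense_pairs)    # 36 = n * (n + 1) / 2
print(sampled_pairs)  # 21, bounded by n * k = 24
```

With `k` held constant, the number of query-key pairs grows as at most `n * k`, that is, linearly in `n`, whereas the dense mask admits `n * (n + 1) / 2` pairs.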

Keywords
decoder-only transformer, causal masking, random sampling, positional information
Published
2025-03-11
Publisher
EAI
http://dx.doi.org/10.4108/eai.21-11-2024.2354592
Copyright © 2024–2025 EAI