Security and Privacy in Communication Networks. 18th EAI International Conference, SecureComm 2022, Virtual Event, October 2022, Proceedings

Research Article

SecureBERT: A Domain-Specific Language Model for Cybersecurity

Cite (BibTeX)
@INPROCEEDINGS{10.1007/978-3-031-25538-0_3,
    author={Ehsan Aghaei and Xi Niu and Waseem Shadid and Ehab Al-Shaer},
    title={SecureBERT: A Domain-Specific Language Model for Cybersecurity},
    proceedings={Security and Privacy in Communication Networks. 18th EAI International Conference, SecureComm 2022, Virtual Event, October 2022, Proceedings},
    proceedings_a={SECURECOMM},
    year={2023},
    month={2},
    keywords={Cyber automation, Cyber threat intelligence, Language model},
    doi={10.1007/978-3-031-25538-0_3}
}
    
Ehsan Aghaei*, Xi Niu, Waseem Shadid, Ehab Al-Shaer
    *Contact email: eaghaei@uncc.edu

    Abstract

Natural Language Processing (NLP) has recently gained wide attention in cybersecurity, particularly in Cyber Threat Intelligence (CTI) and cyber automation. Increased connectivity and automation have revolutionized the world’s economic and cultural infrastructures, but they have also introduced risks of cyber attacks. CTI is information that helps cybersecurity analysts make intelligent security decisions. It is often delivered in the form of natural language text, which must be transformed into a machine-readable format through an automated procedure before it can be used for automated security measures.

This paper proposes SecureBERT, a cybersecurity language model capable of capturing text connotations in cybersecurity text (e.g., CTI) and therefore successful in automating many critical cybersecurity tasks that would otherwise rely on human expertise and time-consuming manual effort. SecureBERT has been trained on a large corpus of cybersecurity text. To make SecureBERT effective not only at retaining general English understanding but also when applied to text with cybersecurity implications, we developed a customized tokenizer as well as a method to alter pre-trained weights. SecureBERT is evaluated using the standard Masked Language Model (MLM) test as well as two additional standard NLP tasks. Our evaluation studies show that SecureBERT outperforms existing similar models, confirming its capability for solving crucial NLP tasks in cybersecurity.
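The motivation for a customized tokenizer can be illustrated with a minimal sketch: a generic vocabulary fragments domain terms into many subwords, while a vocabulary extended with cybersecurity terms keeps them intact. The greedy longest-match tokenizer and both toy vocabularies below are hypothetical illustrations, not the paper's actual tokenizer:

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a fixed vocabulary."""
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Take the longest vocabulary entry matching at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                tokens.append(word[i])  # fall back to a single character
                i += 1
    return tokens

# Hypothetical vocabularies: the "domain" one adds cybersecurity terms.
generic_vocab = {"ran", "som", "ware", "en", "crypt", "s", "files"}
domain_vocab = generic_vocab | {"ransomware", "encrypts"}

sentence = "ransomware encrypts files"
print(tokenize(sentence, generic_vocab))
# ['ran', 'som', 'ware', 'en', 'crypt', 's', 'files']
print(tokenize(sentence, domain_vocab))
# ['ransomware', 'encrypts', 'files']
```

Fewer, more meaningful tokens for domain terms give a masked-language model a better chance of learning their cybersecurity-specific usage, which is the intuition behind extending the tokenizer before continued pre-training.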

    Keywords
Cyber automation, Cyber threat intelligence, Language model
    Published
    2023-02-04
    Appears in
    SpringerLink
    http://dx.doi.org/10.1007/978-3-031-25538-0_3