Collaborative Computing: Networking, Applications and Worksharing. 13th International Conference, CollaborateCom 2017, Edinburgh, UK, December 11–13, 2017, Proceedings

Research Article

ProNet: Toward Payload-Driven Protocol Fingerprinting via Convolutions and Embeddings

  • @INPROCEEDINGS{10.1007/978-3-030-00916-8_48,
        author={Yafei Sang and Yongzheng Zhang and Chengwei Peng},
        title={ProNet: Toward Payload-Driven Protocol Fingerprinting via Convolutions and Embeddings},
        proceedings={Collaborative Computing: Networking, Applications and Worksharing. 13th International Conference, CollaborateCom 2017, Edinburgh, UK, December 11--13, 2017, Proceedings},
        proceedings_a={COLLABORATECOM},
        year={2018},
        month={10},
        keywords={Protocol fingerprinting Convolutions Embedding},
        doi={10.1007/978-3-030-00916-8_48}
    }
    
  • Yafei Sang
    Yongzheng Zhang
    Chengwei Peng
    Year: 2018
    ProNet: Toward Payload-Driven Protocol Fingerprinting via Convolutions and Embeddings
    COLLABORATECOM
    Springer
    DOI: 10.1007/978-3-030-00916-8_48
Yafei Sang*, Yongzheng Zhang*, Chengwei Peng*
    *Contact email: sangyafei@iie.ac.cn, zhangyongzheng@iie.ac.cn, pengchengwei@iie.ac.cn

    Abstract

    Protocol fingerprinting (PF) aims to derive a set of distinguishing features for recognizing which protocol or application generated a given piece of network traffic. Unfortunately, deep packet inspection (DPI), a widely adopted method for PF, requires significant expert effort to develop and maintain protocol signatures. The newer solution paradigm, deep flow inspection (DFI), which uses machine learning for PF, also relies on hand-designed features. In this paper, we present ProNet, a payload-based approach to protocol fingerprinting that overcomes the limitations of manual feature engineering. The key novelty of ProNet is two-fold: (i) it takes generic, raw short packet payloads as input, instead of the typical flow-statistical features (e.g., port, packet size, packet interval); (ii) it learns to simultaneously extract features via convolutional operations on the embeddings and embeddings. We implement ProNet and evaluate it on real-world traces, including DNS, QQLive, PPLive, PPStream, SopCast, DHCP, NBNS, HTTP, SMTP and SMB. Our experimental results show that ProNet achieves over 99% precision and recall, with few false positives (less than 1%) and nearly no false negatives.
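
The general idea of convolving over embeddings of raw payload bytes can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not specify layer sizes, filter widths, or how the embeddings are built, so every function name and parameter below (`payload_to_ids`, `make_weights`, `conv_features`, `max_len`, `emb_dim`, `n_filters`, `width`) is a hypothetical choice made for illustration. The weights are random and untrained; in a real model they would be learned, and a classification layer would follow the pooled features.

```python
import random

def payload_to_ids(payload, max_len=32):
    """Truncate or zero-pad a payload to exactly max_len byte values (0..255).

    Assumption: padding with 0 conflates padding with a real 0x00 byte;
    a real model might reserve a separate padding id.
    """
    head = list(payload[:max_len])
    return head + [0] * (max_len - len(head))

def make_weights(emb_dim=8, n_filters=4, width=3, seed=0):
    """Create random (untrained) weights: one embedding row per byte value,
    plus n_filters convolution filters of shape (width, emb_dim)."""
    rng = random.Random(seed)
    embedding = [[rng.uniform(-1, 1) for _ in range(emb_dim)]
                 for _ in range(256)]
    filters = [[[rng.uniform(-1, 1) for _ in range(emb_dim)]
                for _ in range(width)]
               for _ in range(n_filters)]
    return embedding, filters

def conv_features(ids, embedding, filters):
    """Embed each byte, slide every filter along the sequence,
    apply ReLU, and max-pool over positions (one value per filter)."""
    x = [embedding[i] for i in ids]  # sequence of emb_dim-sized vectors
    feats = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # starting at 0 applies ReLU before the max
        for start in range(len(x) - width + 1):
            window = x[start:start + width]
            act = sum(w * v
                      for f_row, x_row in zip(filt, window)
                      for w, v in zip(f_row, x_row))
            best = max(best, act)
        feats.append(best)
    return feats

embedding, filters = make_weights()
ids = payload_to_ids(b"GET / HTTP/1.1\r\n")
features = conv_features(ids, embedding, filters)
```

Here `features` is a fixed-length vector regardless of the payload's length, which is what lets a downstream classifier work directly on raw bytes instead of hand-designed flow statistics.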