Internet Traffic Classification Using Machine Learning

Li Jun; Zhang Shunyi; Lu Yanqing; Zhang Zailong

2nd International ICST Conference on Communications and Networking in China

Research Article

Cite: BibTeX Plain Text

@INPROCEEDINGS{10.1109/CHINACOM.2007.4469372,
    author={Li Jun and Zhang Shunyi and Lu Yanqing and Zhang Zailong},
    title={Internet Traffic Classification Using Machine Learning},
    booktitle={2nd International ICST Conference on Communications and Networking in China},
    publisher={IEEE},
    series={CHINACOM},
    year={2008},
    month={3},
    keywords={Machine Learning (ML); Traffic classification; Feature Selection},
    doi={10.1109/CHINACOM.2007.4469372}
}

Li Jun, Zhang Shunyi, Lu Yanqing, Zhang Zailong. Internet Traffic Classification Using Machine Learning. CHINACOM, IEEE, 2008. DOI: 10.1109/CHINACOM.2007.4469372

Li Jun¹,²,*, Zhang Shunyi¹, Lu Yanqing¹, Zhang Zailong¹

1: Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2: Zhejiang Wanli University, Ningbo 315100, China

*Contact email: lijunreed@ieee.org

Abstract

Internet traffic identification and classification are vital to network management, security monitoring, network planning, and QoS provisioning. Traditional approaches such as port-based and payload-based identification are becoming increasingly difficult to apply, because many new applications (e.g. P2P) use dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of flow statistics. We present a traffic classification scheme based on machine learning (ML). Experiments demonstrate the performance impact of dataset size, feature selection, and ML algorithm selection. Genetic-algorithm-based feature selection can dramatically reduce ML training and model-building time, with only a small decrease, or even a slight increase, in classification accuracy. The chosen ML algorithms (TAN, C4.5, NBTree, RandomForest, and distance-weighted KNN) can all reach high classification accuracy. C4.5 and RandomForest are typically superior to the other algorithms in terms of computational complexity. In addition, the experiments show that dataset size affects classification performance, and that tuning the dataset size can help meet the requirements of specific applications.

Keywords: Machine Learning (ML); Traffic classification; Feature Selection
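
As a rough illustration of the flow-statistics idea in the abstract, the sketch below implements a distance-weighted k-nearest-neighbour classifier, one of the algorithm families the abstract names, in plain Python. It is not the authors' implementation. The two features (mean packet size and mean inter-arrival time), the class labels, the toy training flows, and the names `TRAIN`, `scale_params`, `scale`, and `classify` are all assumptions made for this example.

```python
import math
from collections import defaultdict

# Hypothetical toy flows: (mean packet size in bytes, mean inter-arrival time in ms).
# Values and labels are illustrative only; they are not data from the paper.
TRAIN = [
    ((1200.0, 5.0), "bulk"),
    ((1100.0, 8.0), "bulk"),
    ((1300.0, 6.0), "bulk"),
    ((120.0, 40.0), "interactive"),
    ((90.0, 55.0), "interactive"),
    ((150.0, 35.0), "interactive"),
]


def scale_params(samples):
    """Return (min, range) per feature so no single feature dominates the distance."""
    columns = list(zip(*(features for features, _ in samples)))
    return [(min(col), (max(col) - min(col)) or 1.0) for col in columns]


def scale(features, params):
    """Min-max scale one feature vector using the training-set parameters."""
    return tuple((x - lo) / span for x, (lo, span) in zip(features, params))


def classify(flow, train=TRAIN, k=3, eps=1e-9):
    """Label a flow by a vote of its k nearest neighbours, weighted by 1/distance."""
    params = scale_params(train)
    query = scale(flow, params)
    neighbours = sorted(
        (math.dist(query, scale(features, params)), label)
        for features, label in train
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        # eps avoids division by zero when the query equals a training point.
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    print(classify((1150.0, 7.0)))
    print(classify((100.0, 50.0)))
```

Weighting votes by inverse distance lets close neighbours count for more than distant ones, which is what distinguishes this variant from plain majority-vote KNN. A real classifier would use many more flow features and a labeled traffic trace.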

Published: 2008-03-07
Publisher: IEEE
Modified: 2011-07-17

DOI link: http://dx.doi.org/10.1109/CHINACOM.2007.4469372
