sis 15(5): e4

Research Article

An Efficient Technique for Network Traffic Summarization using Multiview Clustering and Statistical Sampling

  • @ARTICLE{10.4108/sis.2.5.e4,
        author={Mohiuddin Ahmed and Abdun Naser Mahmood and Michael J. Maher},
        title={An Efficient Technique for Network Traffic Summarization using Multiview Clustering and Statistical Sampling},
        journal={EAI Endorsed Transactions on Scalable Information Systems},
        volume={2},
        number={5},
        publisher={ICST},
        journal_a={SIS},
        year={2015},
        month={7},
        keywords={Scalable Data Mining, Network Traffic Summarization, Multiview Clustering},
        doi={10.4108/sis.2.5.e4}
    }
    
  • Mohiuddin Ahmed
    Abdun Naser Mahmood
    Michael J. Maher
    Year: 2015
    An Efficient Technique for Network Traffic Summarization using Multiview Clustering and Statistical Sampling
    SIS
    ICST
    DOI: 10.4108/sis.2.5.e4
Mohiuddin Ahmed1,*, Abdun Naser Mahmood1, Michael J. Maher1
  • 1: School of Engineering and Information Technology, UNSW Canberra, Australia
*Contact email: mohiuddin.ahmed@student.unsw.edu.au

Abstract

There is significant interest in the data mining and network management communities in efficiently analysing network traffic, given the large volumes generated even in small networks. Summarization is a primary data mining task that generates a concise yet informative summary of the given data, and creating such a summary from network traffic data remains a research challenge. Existing clustering-based summarization techniques cannot create a summary suitable for further data mining tasks such as anomaly detection, and they require the summary size as an external input. Additionally, for complex and high-dimensional network traffic datasets, there is often no single clustering solution that explains the structure of the given data. In this paper, we investigate the use of multiview clustering to efficiently create a meaningful summary composed of original data instances from network traffic data. We develop a mathematically sound approach to select the summary size using a sampling technique. We compare our proposed approach with regular clustering-based summarization that incorporates the summary size calculation method, and with a random approach. We validate our proposed approach using the benchmark network traffic dataset and state-of-the-art summary evaluation metrics.
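
To make the idea of a sampling-derived summary size concrete, here is a minimal Python sketch. The abstract does not state which sampling formula the paper uses, so the sketch assumes Cochran's sample-size formula with a finite population correction as a stand-in. The function names (`summary_size`, `allocate`, `summarize`), the default confidence and margin values, and the proportional allocation across clusters are all illustrative assumptions, not the authors' method. The clusters are taken as given, for example from one view of a multiview clustering, and the summary is drawn from original data instances only.

```python
import math
import random


def summary_size(population, z=1.96, margin=0.05, p=0.5):
    """Assumed stand-in: Cochran's sample size with finite population correction.

    population: number of traffic instances in the dataset
    z:          z-score for the confidence level (1.96 is roughly 95%)
    margin:     tolerated margin of error
    p:          assumed proportion (0.5 is the most conservative choice)
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def allocate(cluster_sizes, total):
    """Split the summary size across clusters in proportion to their size.

    Every cluster receives at least one slot, so rounding can make the
    quotas sum to slightly more or less than `total`.
    """
    n = sum(cluster_sizes)
    return [max(1, round(total * size / n)) for size in cluster_sizes]


def summarize(clusters, total, seed=0):
    """Pick original instances from each cluster to form the summary."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    quotas = allocate([len(c) for c in clusters], total)
    summary = []
    for cluster, quota in zip(clusters, quotas):
        summary.extend(rng.sample(cluster, min(quota, len(cluster))))
    return summary


if __name__ == "__main__":
    # Hypothetical clusters of record identifiers, not real traffic data.
    clusters = [list(range(0, 8000)),
                list(range(8000, 9500)),
                list(range(9500, 10000))]
    size = summary_size(sum(len(c) for c in clusters))
    print(size, len(summarize(clusters, size)))
```

Because the size is computed from the dataset itself, no summary size has to be supplied externally, which is the property the abstract highlights. Selecting instances uniformly at random within each cluster is the simplest possible choice here; a real system might instead prefer instances closest to each cluster centre.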