7th International Conference on Collaborative Computing: Networking, Applications and Worksharing

Research Article

An Ensemble-based Approach to Fast Classification of Multi-label Data Streams

  • @INPROCEEDINGS{10.4108/icst.collaboratecom.2011.247086,
        author={Xiangnan Kong and Philip Yu},
        title={An Ensemble-based Approach to Fast Classification of Multi-label Data Streams},
        proceedings={7th International Conference on Collaborative Computing: Networking, Applications and Worksharing},
        publisher={IEEE},
        proceedings_a={COLLABORATECOM},
        year={2012},
        month={4},
        keywords={data stream, data mining, multi-label classification, random tree},
        doi={10.4108/icst.collaboratecom.2011.247086}
    }
    
  • Xiangnan Kong
    Philip Yu
    Year: 2012
    An Ensemble-based Approach to Fast Classification of Multi-label Data Streams
    COLLABORATECOM
    ICST
    DOI: 10.4108/icst.collaboratecom.2011.247086
Xiangnan Kong1, Philip Yu1,*
  • 1: University of Illinois at Chicago
*Contact email: psyu@cs.uic.edu

Abstract

Network operators are continuously confronted with online events, such as online messages and blog updates. Because of the huge volume of these events and the rapid changes in their topics, it is critical to manage them promptly and effectively. Much software and many algorithms have been developed to classify such stream data automatically. Conventional approaches focus on single-label scenarios, where each event can be tagged with only one label. However, in many data streams, each event can be tagged with more than one label. Effective stream classification systems should account for the unique properties of multi-label stream data, such as large data volumes, label correlations, and concept drift. To address these challenges, we propose an efficient and effective method for multi-label stream classification based on an ensemble of fading random trees. The proposed model can efficiently process high-speed multi-label stream data with concept drift. Empirical studies on real-world tasks demonstrate that our method maintains high accuracy in multi-label stream classification while providing a very efficient solution to the task.