1st International ICST Conference on Communications and Networking in China

Research Article

A XOR-Based Hierarchical Sketch for Identifying and Estimating Hierarchical Frequent Items Online

  • @INPROCEEDINGS{10.1109/CHINACOM.2006.344662,
        author={Wenfeng  Feng and Qiao  Guo and Zhibin  Zhang and Zongpu  Jia},
        title={A XOR-Based Hierarchical Sketch for Identifying and Estimating Hierarchical Frequent Items Online},
        proceedings={1st International ICST Conference on Communications and Networking in China},
        publisher={IEEE},
        proceedings_a={CHINACOM},
        year={2007},
        month={4},
        keywords={},
        doi={10.1109/CHINACOM.2006.344662}
    }
    
  • Wenfeng Feng
    Qiao Guo
    Zhibin Zhang
    Zongpu Jia
    Year: 2007
    A XOR-Based Hierarchical Sketch for Identifying and Estimating Hierarchical Frequent Items Online
    CHINACOM
    IEEE
    DOI: 10.1109/CHINACOM.2006.344662
Wenfeng Feng1, Qiao Guo1, Zhibin Zhang2, Zongpu Jia2
  • 1: Network Information Center, Beijing Institute of Technology, Beijing 100081, China
  • 2: Department of Computer Science, Henan Polytechnic University, Jiaozuo Henan 454003, China

Abstract

Many sketch data structures have been implemented. But due to our knowledge, the current sketching approach to summarizing the hierarchical structure in data was oracles-based, which "stack" logn sketches. By designing and using a XOR-based pair-wise independent family of hash functions on the hierarchical domain U*, a hierarchical sketch data structure was implemented to summarize the hierarchical structure in data stream, which mapped the data stream items to a three dimensional counter array of size LtimesDtimesW, of which, L was the number of layers in the hierarchical domain, D was the number of the hash functions chosen uniformly at random, and W was the range of the hash functions. The sketch used the breadth-first search strategy to identify and estimate the hierarchical frequent items online with user-specified probability. The experiments demonstrate substantial improvement in the update time and evaluating accuracy over the straightforward oracles-based sketch approach.