2nd International ICST Conference on Collaborative Computing: Networking, Applications and Worksharing

Research Article

Cost-based Solution for Optimizing Multi-join Queries over Distributed Streaming Sensor Data

  • @INPROCEEDINGS{10.1109/COLCOM.2006.361871,
        author={Joseph Gomes and Hyeong-Ah Choi},
        title={Cost-based Solution for Optimizing Multi-join Queries over Distributed Streaming Sensor Data},
        proceedings={2nd International ICST Conference on Collaborative Computing: Networking, Applications and Worksharing},
        publisher={IEEE},
        proceedings_a={COLLABORATECOM},
        year={2007},
        month={5},
        keywords={Aggregates, Collaboration, Collaborative work, Distributed computing, Information filtering, Information filters, Network servers, Physics computing, Sensor phenomena and characterization, Time factors},
        doi={10.1109/COLCOM.2006.361871}
    }
    
  • Joseph Gomes
    Hyeong-Ah Choi
    Year: 2007
    Cost-based Solution for Optimizing Multi-join Queries over Distributed Streaming Sensor Data
    COLLABORATECOM
    IEEE
    DOI: 10.1109/COLCOM.2006.361871
Joseph Gomes1,*, Hyeong-Ah Choi1,*
  • 1: Department of Computer Science, The George Washington University, Washington, DC, USA
*Contact email: joegomes@gwu.edu, hchoi@gwu.edu

Abstract

Sensors are envisioned to be at the center of distributed collaborative computing services involving time-critical decision support. Sensors are small devices with limited communication and computational capabilities that collect data about their surrounding physical environment and periodically send it to server machines. Sensors form a collaborative network with these servers: the sensors gather information, and the servers perform various operations (e.g., filter, aggregate, join) on the information streams in real time according to predefined queries or rules. Sensor data streams are continuous, unending, and highly volatile. As a result, traditional database systems are inappropriate for handling queries over sensor streams, and several stream data management systems have been proposed in the literature. In this paper we focus on a special type of query, the join query, because the join is the most expensive query operator. We address the problem of finding an optimal join tree that maximizes throughput for sliding-window-based multi-join queries over continuous sensor data streams. We present a polynomial-time algorithm, Fodp, and three variants of it. Our experiments in ARES show that for almost all instances, trees from Fodp and its variants perform close to the optimal trees from our exponential-time algorithm OptDP (Gomes, 2006), and significantly better than existing XJoin-based heuristic algorithms.