2nd International IEEE Conference on Communication System Software and Middleware

Research Article

Real-time End-to-end Network Monitoring in Large Distributed Systems

  • @INPROCEEDINGS{10.1109/COMSWA.2007.382612,
        author={Han Hee Song and Praveen Yalagandula},
        title={Real-time End-to-end Network Monitoring in Large Distributed Systems},
        proceedings={2nd International IEEE Conference on Communication System Software and Middleware},
        publisher={IEEE},
        proceedings_a={COMSWARE},
        year={2007},
        month={7},
        keywords={Bandwidth, Delay, Extraterrestrial measurements, Interference, Jitter, Monitoring, Network servers, Performance evaluation, Real time systems, Streaming media},
        doi={10.1109/COMSWA.2007.382612}
    }
    
  • Han Hee Song
    Praveen Yalagandula
    Year: 2007
    Real-time End-to-end Network Monitoring in Large Distributed Systems
    COMSWARE
    IEEE
    DOI: 10.1109/COMSWA.2007.382612
Han Hee Song (1,*), Praveen Yalagandula (2,*)
  • 1: University of Texas at Austin, Austin, TX, USA
  • 2: HP Labs, Palo Alto, CA, USA
*Contact email: hhsong@cs.utexas.edu, praveen.yalagandula@hp.com

Abstract

Measuring real-time end-to-end network path performance metrics is important for several distributed applications, such as media streaming systems (e.g., for switching to paths with higher bandwidth and lower jitter) and content distribution systems (e.g., for selecting servers with lower latency). However, it is challenging to perform such end-to-end pairwise measurements in large distributed systems while achieving high accuracy and avoiding interference with existing traffic. On the end hosts, the measurements can overload the machine by interfering with one another and with other processes. On the network, measurement packets from different hosts can interfere with one another and with other flows on bottleneck links. In this paper, we propose a system that monitors end-host and network resources and adapts the number of measurements to the observed load. Our scheme avoids interference by measuring only a small subset of network paths and reconstructing the properties of all network paths from these partial, indirect measurements. Our simulation experiments and real testbed experiments on PlanetLab show that our path selection algorithm, working under resource constraints, does not adversely affect the accuracy of inference, and that our system can effectively adapt to changing resource usage at the end hosts.
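
The idea of adapting the number of measurements to the observed load can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names (`measurement_budget`, `pick_paths`), the linear scaling rule, the `target_load` threshold of 0.8, and the assumption that candidate paths arrive pre-ranked by usefulness are all hypothetical choices made only for illustration.

```python
def measurement_budget(max_probes, load, target_load=0.8, min_probes=1):
    """Return how many paths to probe in this round.

    Hypothetical rule: scale the budget linearly with the spare capacity
    below `target_load`; at or above the target, fall back to `min_probes`.
    `load` is the observed utilization, expressed as a fraction in [0, 1].
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    if load >= target_load:
        return min_probes
    spare = (target_load - load) / target_load
    return max(min_probes, int(max_probes * spare))


def pick_paths(ranked_paths, budget):
    """Keep the first `budget` paths.

    Assumes `ranked_paths` is already ordered from most to least useful
    for inferring the unmeasured paths (the ranking itself is out of scope).
    """
    return list(ranked_paths[:budget])


if __name__ == "__main__":
    paths = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
    budget = measurement_budget(max_probes=len(paths), load=0.4)
    print(budget, pick_paths(paths, budget))
```

In this sketch an idle host probes every candidate path, while a host at or above the target load probes only the minimum; the properties of the unmeasured paths would then have to be inferred from the measured subset, which the sketch does not attempt.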