
Research Article
Real-time End-to-end Network Monitoring in Large Distributed Systems
- @INPROCEEDINGS{10.1109/COMSWA.2007.382612, author={Han Hee Song and Praveen Yalagandula}, title={Real-time End-to-end Network Monitoring in Large Distributed Systems}, proceedings={2nd International IEEE Conference on Communication System Software and Middleware}, publisher={IEEE}, proceedings_a={COMSWARE}, year={2007}, month={7}, keywords={Bandwidth, Delay, Extraterrestrial measurements, Interference, Jitter, Monitoring, Network servers, Performance evaluation, Real time systems, Streaming media}, doi={10.1109/COMSWA.2007.382612} }
- Han Hee Song
 Praveen Yalagandula
 Year: 2007
 Real-time End-to-end Network Monitoring in Large Distributed Systems
 COMSWARE
 IEEE
 DOI: 10.1109/COMSWA.2007.382612
Abstract
Measuring real-time end-to-end network path performance metrics is important for several distributed applications, such as media streaming systems (e.g., for switching to paths with higher bandwidth and lower jitter) and content distribution systems (e.g., for selecting servers with lower latency). However, it is challenging to perform such end-to-end pairwise measurements in large distributed systems while achieving high accuracy and avoiding interference with existing traffic. On the end hosts, the measurements can overload the machine by interfering with one another and with other processes. On the network, the measurement packets from different hosts can interfere with one another and with other flows on bottleneck links. In this paper, we propose a system that monitors end-host and network resources and adapts the number of measurements to the observed load. Our scheme avoids interference by measuring only a small subset of network paths and reconstructing the properties of all network paths from these partial, indirect measurements. Our simulation experiments and real testbed experiments on PlanetLab show that our path selection algorithm, working under resource constraints, does not adversely affect the accuracy of inference, and that our system can effectively adapt to changing resource usage at the end hosts.
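The idea of measuring a small subset of paths and reconstructing the rest can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes an additive metric (latency), routing that is fully known, and noise-free probes. The topology, the latency values, and the names `path_links`, `probe`, `reduce_row`, and `select_and_infer` are all hypothetical. The sketch probes a path only if its link-usage vector is linearly independent of the paths already probed, and otherwise computes its latency as a linear combination of earlier measurements.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical topology: hosts A-C hang off router L, hosts D-F off router R,
# and a single link "L-R" joins the two routers. Latencies (ms) are made up.
SIDE = {"A": "L", "B": "L", "C": "L", "D": "R", "E": "R", "F": "R"}
LINK_LATENCY = {"A": 3, "B": 5, "C": 2, "D": 7, "E": 4, "F": 6, "L-R": 20}
LINKS = list(LINK_LATENCY)


def path_links(src, dst):
    """Links traversed between two hosts (access links, plus L-R if needed)."""
    links = {src, dst}
    if SIDE[src] != SIDE[dst]:
        links.add("L-R")
    return links


def routing_row(path):
    """0/1 vector marking which links the path uses."""
    used = path_links(*path)
    return [Fraction(int(link in used)) for link in LINKS]


def probe(path):
    """Stand-in for a real measurement: the sum of the link latencies."""
    return sum(LINK_LATENCY[link] for link in path_links(*path))


def reduce_row(row, basis):
    """Eliminate row against the basis of already-probed paths.

    Returns (residual, coeffs) such that
    row = residual + sum(coeffs[p] * routing_row(p)) over probed paths p.
    """
    residual, coeffs = list(row), {}
    for pivot, brow, bcombo in basis:
        factor = residual[pivot] / brow[pivot]
        if factor:
            residual = [r - factor * b for r, b in zip(residual, brow)]
            for p, c in bcombo.items():
                coeffs[p] = coeffs.get(p, 0) + factor * c
    return residual, coeffs


def select_and_infer(paths):
    """Probe only linearly independent paths; infer all the others."""
    basis, measured, inferred = [], {}, {}
    for path in paths:
        residual, coeffs = reduce_row(routing_row(path), basis)
        if any(residual):
            # Independent of everything probed so far: measure it directly.
            measured[path] = probe(path)
            pivot = next(i for i, v in enumerate(residual) if v)
            combo = {p: -c for p, c in coeffs.items()}
            combo[path] = Fraction(1)
            basis.append((pivot, residual, combo))
        else:
            # Dependent: its latency is a combination of measured paths.
            inferred[path] = sum(c * measured[p] for p, c in coeffs.items())
    return measured, inferred


paths = list(combinations(sorted(SIDE), 2))
measured, inferred = select_and_infer(paths)
print(len(paths), len(measured), len(inferred))
```

In this toy topology the sketch probes 7 of the 15 host pairs and recovers the other 8 exactly, because the 7 links bound the rank of the routing matrix. Real measurements are noisy, and metrics such as bandwidth are not additive, so a practical system would need a different formulation, for example a least-squares fit over the measured paths.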


