1st International ICST Conference on Scalable Information Systems

Research Article

The notification based approach to implementing failure detectors in distributed systems

  • @INPROCEEDINGS{10.1145/1146847.1146861,
        author={Jin Yang and Jiannong Cao and Weigang Wu},
        title={The notification based approach to implementing failure detectors in distributed systems},
        booktitle={1st International ICST Conference on Scalable Information Systems},
        publisher={ACM},
        proceedings_a={INFOSCALE},
        year={2006},
        month={6},
        keywords={fault tolerance; heartbeat; failure detector; QoS; performance evaluation},
        doi={10.1145/1146847.1146861}
    }
    
  • Jin Yang
    Jiannong Cao
    Weigang Wu
    Year: 2006
    The notification based approach to implementing failure detectors in distributed systems
    INFOSCALE
    ACM
    DOI: 10.1145/1146847.1146861
Jin Yang1,*, Jiannong Cao1,*, Weigang Wu1,*
  • 1: Internet and Mobile Computing Lab, Department of Computing, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
*Contact email: csyangj@comp.polyu.edu.hk, csjcao@comp.polyu.edu.hk, cswgwu@comp.polyu.edu.hk

Abstract

Failure detectors (FDs) are a fundamental component of fault-tolerant computer systems. In recent years, much research has addressed the QoS and implementation of FDs for distributed computing environments, and almost all of it is based on the heartbeat approach (HBFD). In this paper, we propose a general model for implementing FDs that separates the processes to be monitored from the underlying running environment. We identify the potential problems of the HBFD approach and propose an alternative approach to implementing FDs, called the notification-based FD (NTFD). Instead of having each process periodically send heartbeat messages to show that it is still alive, NTFD relies on the underlying watchdog mechanism, which sends failure notification messages only when the failure of a monitored process is detected locally. Compared with an HBFD implementation under our model, NTFD is more efficient and scalable, and can guarantee the strong accuracy property. We analyze the trade-offs in achieving FD QoS, and the results show that NTFD has a much higher probability of achieving a better balance between completeness and accuracy, while offering a much lower probability of false reports and lower system cost. Based on this analysis, we propose the design of a hybrid FD that combines the advantages of HBFD and NTFD.
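
The contrast between the two approaches can be illustrated with a toy simulation. This is only a minimal sketch and not the authors' implementation: it assumes discrete time ticks, zero network delay, reliable delivery, and a local watchdog that never fails itself. The names `MonitoredProcess`, `heartbeat_fd`, and `notification_fd`, along with the parameters `period`, `timeout`, and `check_interval`, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MonitoredProcess:
    """Hypothetical monitored process; crash_time=None means it never crashes."""
    name: str
    crash_time: Optional[int] = None

    def alive_at(self, t: int) -> bool:
        return self.crash_time is None or t < self.crash_time


def heartbeat_fd(proc: MonitoredProcess, horizon: int,
                 period: int, timeout: int) -> Tuple[int, Optional[int]]:
    """Heartbeat style: the process sends a message every `period` ticks.

    The monitor suspects the process once more than `timeout` ticks have
    passed since the last heartbeat. Returns (messages sent, detection tick).
    """
    messages = 0
    last_heartbeat = 0
    for t in range(1, horizon + 1):
        if t % period == 0 and proc.alive_at(t):
            messages += 1          # one heartbeat message on the network
            last_heartbeat = t
        if t - last_heartbeat > timeout:
            return messages, t     # monitor times out and suspects the process
    return messages, None


def notification_fd(proc: MonitoredProcess, horizon: int,
                    check_interval: int) -> Tuple[int, Optional[int]]:
    """Notification style: a local watchdog checks the process every
    `check_interval` ticks and sends a single message only after it
    observes the crash. Returns (messages sent, detection tick).
    """
    for t in range(1, horizon + 1):
        if t % check_interval == 0 and not proc.alive_at(t):
            return 1, t            # one failure notification, sent once
    return 0, None                 # no failure observed, no traffic


if __name__ == "__main__":
    crashing = MonitoredProcess("p1", crash_time=25)
    healthy = MonitoredProcess("p2")
    print(heartbeat_fd(crashing, horizon=100, period=5, timeout=7))
    print(notification_fd(crashing, horizon=100, check_interval=1))
    print(heartbeat_fd(healthy, horizon=100, period=5, timeout=7))
    print(notification_fd(healthy, horizon=100, check_interval=1))
```

Under these simplifying assumptions, the heartbeat variant generates traffic for as long as the process is alive, whereas the notification variant sends nothing until the watchdog observes a failure. The sketch does not model message loss, delay, or failure of the host running the watchdog, so it says nothing about the QoS trade-offs analyzed in the paper.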