EAI Endorsed Transactions on Future Internet 18(13): e5

Research Article

Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes

  • @ARTICLE{10.4108/eai.12-1-2018.154177,
        author={Ibrahim EL-Sanosi and Paul Ezhilchelvan},
        title={Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes},
        journal={EAI Endorsed Transactions on Future Internet},
        volume={18},
        number={13},
        publisher={EAI},
        journal_a={UE},
        year={2018},
        month={1},
        keywords={Apache ZooKeeper; Atomic Broadcast; Crash-Tolerance; Server Replication; Protocol Latency; Throughput; Performance Evaluation},
        doi={10.4108/eai.12-1-2018.154177}
    }
    
Ibrahim EL-Sanosi1,*, Paul Ezhilchelvan1
  • 1: School of Computer Science, Newcastle University, Newcastle upon Tyne, United Kingdom
*Contact email: i.s.el-sanosi@ncl.ac.uk

Abstract

At the core of the highly available ZooKeeper system is the ZooKeeper atomic broadcast (Zab) protocol, which imposes a total order on service requests that seek to modify the replicated system state. Zab is designed with the weakest assumptions possible under the crash-recovery fault model; e.g., any number of servers, even all of them, can crash simultaneously, and the system will continue or resume its service provisioning when a server quorum remains or becomes operative again. Our aim is to explore ways of improving Zab performance without modifying its easy-to-implement structure. To this end, we first assume that server crashes are independent and that a server quorum remains operative at all times. Under these restrictive, yet practical, assumptions, we propose three variations of Zab and compare their performance. The first variation offers excellent performance but can only be used for 3-server systems; the other two do not have this limitation. One of them reduces the leader overhead further by conditioning the sending of acknowledgements on the outcomes of coin tosses. Owing to its superb performance, it is redesigned to operate under the least-restricted Zab fault assumptions. Further performance comparisons confirm the potential of coin-tossing to offer better performance than Zab, particularly at high workloads.
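To make two notions from the abstract concrete, the sketch below computes the size of a majority quorum and simulates acknowledgements that are sent only when a coin toss succeeds. It is an illustration under stated assumptions, not the protocol proposed in the article: the abstract does not give the coin-toss rule, so the per-follower probability `p_ack` and the function names `quorum_size`, `acks_sent`, and `mean_acks` are hypothetical.

```python
import random


def quorum_size(n_servers: int) -> int:
    """Smallest majority of n_servers (e.g. 2 of 3, 3 of 5)."""
    return n_servers // 2 + 1


def acks_sent(n_followers: int, p_ack: float, rng: random.Random) -> int:
    """Number of followers whose coin toss tells them to acknowledge.

    ASSUMPTION: each follower tosses an independent coin that succeeds
    with probability p_ack. The abstract does not specify this rule.
    """
    return sum(1 for _ in range(n_followers) if rng.random() < p_ack)


def mean_acks(n_followers: int, p_ack: float, rounds: int, seed: int = 0) -> float:
    """Average number of acknowledgements per broadcast over many rounds."""
    rng = random.Random(seed)
    total = sum(acks_sent(n_followers, p_ack, rng) for _ in range(rounds))
    return total / rounds


if __name__ == "__main__":
    print("quorum of 3 servers:", quorum_size(3))
    print("quorum of 5 servers:", quorum_size(5))
    # With 4 followers and a fair coin, about half of them acknowledge.
    print("mean acks per round:", mean_acks(4, 0.5, rounds=10_000))
```

Under this assumed rule the expected number of acknowledgements per broadcast is `n_followers * p_ack`, which is one way to picture how coin tosses could lower message-handling overhead. A real protocol must still ensure that a quorum of acknowledgements is eventually gathered, which this sketch does not model.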