Collaborative Computing: Networking, Applications and Worksharing. 14th EAI International Conference, CollaborateCom 2018, Shanghai, China, December 1-3, 2018, Proceedings

Research Article

An Optimized Multi-Paxos Protocol with Centralized Failover Mechanism for Cloud Storage Applications

  • @INPROCEEDINGS{10.1007/978-3-030-12981-1_43,
        author={Wenmin Lin and Hao Jiang and Nailiang Zhao and Jilin Zhang},
        title={An Optimized Multi-Paxos Protocol with Centralized Failover Mechanism for Cloud Storage Applications},
        proceedings={Collaborative Computing: Networking, Applications and Worksharing. 14th EAI International Conference, CollaborateCom 2018, Shanghai, China, December 1-3, 2018, Proceedings},
        proceedings_a={COLLABORATECOM},
        year={2019},
        month={2},
        keywords={Centralized failover mechanism, Multi-Paxos, Replica group, Leader election, Leader repair},
        doi={10.1007/978-3-030-12981-1_43}
    }
    
  • Wenmin Lin
    Hao Jiang
    Nailiang Zhao
    Jilin Zhang
    Year: 2019
    An Optimized Multi-Paxos Protocol with Centralized Failover Mechanism for Cloud Storage Applications
    COLLABORATECOM
    Springer
    DOI: 10.1007/978-3-030-12981-1_43
Wenmin Lin1,*, Hao Jiang1,*, Nailiang Zhao1,*, Jilin Zhang1,*
  • 1: Hangzhou Dianzi University
*Contact email: linwenmin@hdu.edu.cn, Jianghaokobe@163.com, znl@hdu.edu.cn, jilin.zhang@hdu.edu.cn

Abstract

In a typical Multi-Paxos protocol running in a cloud storage application, the failover mechanism is complex to implement. When the leader of a replica group fails, a new leader must be elected by broadcasting prepare requests across the replica group. Repairing the new leader's missing log entries requires broadcasting prepare requests as well. This incurs high network cost and increases the latency of restoring normal storage service. To address this challenge, this paper proposes an optimized Multi-Paxos protocol with a centralized failover mechanism for cloud storage applications. Unlike typical Multi-Paxos, the failover mechanism and the normal client-request handling logic are split and handled by two separate clusters: a coordinator cluster is dedicated to handling failover as a central manager, while a data cluster is responsible only for replicating and storing data for client commands. In the new design, the coordinator cluster maintains real-time status information for each replica group. The coordinator cluster elects the replica with the largest apply index as the new leader, and missing log entries can be repaired using the limited bitmap information about each replica that the coordinator cluster also maintains. A comparison of the two protocols is implemented and analyzed to demonstrate the feasibility of the proposal.
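
The centralized failover idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify data structures or message formats, so `ReplicaStatus`, `elect_leader`, `plan_repair`, the integer bitmap encoding (bit i set means log entry i is held), and the tie-breaking rule are all assumptions made for illustration. The sketch only shows how a coordinator that already holds each replica's apply index and bitmap could pick a leader and plan log repair from its own state, without broadcasting prepare requests.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReplicaStatus:
    """Status a hypothetical coordinator tracks for one replica."""
    replica_id: str
    alive: bool
    apply_index: int
    # Assumed encoding: bit i is set when the replica holds log entry i.
    bitmap: int = 0


def elect_leader(group: Dict[str, ReplicaStatus]) -> Optional[str]:
    """Pick the live replica with the largest apply index.

    Ties are broken by replica_id (an assumption, chosen only so the
    result is deterministic). Returns None if no replica is alive.
    """
    alive = [r for r in group.values() if r.alive]
    if not alive:
        return None
    best = max(alive, key=lambda r: (r.apply_index, r.replica_id))
    return best.replica_id


def plan_repair(group: Dict[str, ReplicaStatus],
                leader_id: str,
                last_index: int) -> Dict[int, str]:
    """Map each log index the leader lacks to a live replica holding it.

    Indices 0..last_index are checked against the stored bitmaps.
    Entries that no live replica holds are left out of the plan.
    """
    leader = group[leader_id]
    plan: Dict[int, str] = {}
    for index in range(last_index + 1):
        if (leader.bitmap >> index) & 1:
            continue  # leader already has this entry
        for replica in sorted(group.values(), key=lambda r: r.replica_id):
            if (replica.alive
                    and replica.replica_id != leader_id
                    and (replica.bitmap >> index) & 1):
                plan[index] = replica.replica_id
                break
    return plan


# Example replica group: the old leader "A" has failed.
group = {
    "A": ReplicaStatus("A", alive=False, apply_index=5, bitmap=0b111111),
    "B": ReplicaStatus("B", alive=True, apply_index=4, bitmap=0b011011),
    "C": ReplicaStatus("C", alive=True, apply_index=3, bitmap=0b001111),
}
new_leader = elect_leader(group)
repair_plan = plan_repair(group, new_leader, last_index=5)
```

In this example the coordinator selects replica B, which has the largest apply index among the live replicas, and plans to fetch entry 2 from replica C. Entry 5 is held only by the failed replica A, so it stays out of the plan; how the real protocol handles such entries is not stated in the abstract.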