1st International ICST Conference on Communications and Networking in China

Research Article

Reliable Multicast Based on Erasure Resilient Codes over InfiniBand

  • @INPROCEEDINGS{10.1109/CHINACOM.2006.344802,
        author={Xigui Wang and Zifeng Xiao and Jizhong Han and Chengde Han},
        title={Reliable Multicast Based on Erasure Resilient Codes over InfiniBand},
        proceedings={1st International ICST Conference on Communications and Networking in China},
        publisher={IEEE},
        proceedings_a={CHINACOM},
        year={2007},
        month={4},
        keywords={Erasure Resilient Codes, InfiniBand, Multicast, Reed-Solomon code},
        doi={10.1109/CHINACOM.2006.344802}
    }
    
  • Xigui Wang
    Zifeng Xiao
    Jizhong Han
    Chengde Han
    Year: 2007
    Reliable Multicast Based on Erasure Resilient Codes over InfiniBand
    CHINACOM
    IEEE
    DOI: 10.1109/CHINACOM.2006.344802
Xigui Wang1,2,*, Zifeng Xiao1,2,*, Jizhong Han1,*, Chengde Han1,*
  • 1: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, PRC
  • 2: Graduate School, Chinese Academy of Sciences, Beijing 100080, PRC
*Contact email: wxg@ict.ac.cn, xzf@ict.ac.cn, hjz@ict.ac.cn, han@cit.ac.cn

Abstract

Many distributed applications and systems, e.g., an efficient implementation of a distributed cache coherence protocol in distributed shared-memory systems, require efficient, reliable, and scalable multicast capabilities from the low-level interconnect. However, the InfiniBand network, a high-performance interconnect with low latency and high bandwidth, lacks the necessary reliable hardware multicast capability. To avoid inefficient multicast emulation with one-to-many point-to-point messages and ACKs, this paper proposes an efficient algorithm that provides reliable multicast over InfiniBand based on erasure resilient codes. The algorithm not only avoids the feedback implosion problem caused by point-to-point multicast emulation messages, but also achieves lower latency and better scalability than automatic repeat request (ARQ) retransmission. Moreover, the algorithm can be optimized with a message pipelining mechanism to achieve the same level of latency as unreliable InfiniBand hardware multicast. Performance analysis demonstrates that the probability of failing to recover a message is less than 1.4 × 10^… even for a system with 1000 message receivers.
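
The kind of failure-probability analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's model: it assumes that a message is split into k data packets plus m parity packets, that any k received packets are enough to decode (as with a Reed-Solomon code), and that each packet is lost independently with the same probability at every receiver. The function names and the parameters `k`, `m`, `loss`, and `receivers` are hypothetical and chosen only for illustration.

```python
from math import comb, expm1, log1p


def receiver_failure_probability(k: int, m: int, loss: float) -> float:
    """Probability that a single receiver cannot decode the message.

    Assumptions (illustrative, not from the paper): k + m packets are sent,
    any k of them suffice to decode, and each packet is lost independently
    with probability `loss`.
    """
    n = k + m
    # Decoding fails exactly when more than m of the n packets are lost.
    return sum(
        comb(n, j) * loss**j * (1.0 - loss) ** (n - j)
        for j in range(m + 1, n + 1)
    )


def multicast_failure_probability(k: int, m: int, loss: float, receivers: int) -> float:
    """Probability that at least one of `receivers` receivers cannot decode.

    Receivers are assumed to experience independent losses.
    """
    p = receiver_failure_probability(k, m, loss)
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)**receivers, computed in a way that stays accurate for tiny p.
    return -expm1(receivers * log1p(-p))
```

Under these assumptions, calling `multicast_failure_probability(k, m, loss, 1000)` for a few values of `m` shows how adding parity packets lowers the chance that any receiver in a 1000-receiver group needs a retransmission, without any per-packet ACK traffic.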