8th International Conference on Communications and Networking in China

Research Article

BER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU

  • @INPROCEEDINGS{10.1109/ChinaCom.2013.6694588,
        author={Xiang Chen and Ji Zhu and Ziyu Wen and Yu Wang and Huazhong Yang},
        title={BER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU},
        booktitle={8th International Conference on Communications and Networking in China},
        publisher={IEEE},
        proceedings_a={CHINACOM},
        year={2013},
        month={11},
        keywords={software radio, turbo code, GPU},
        doi={10.1109/ChinaCom.2013.6694588}
    }
    
  • Xiang Chen
    Ji Zhu
    Ziyu Wen
    Yu Wang
    Huazhong Yang
    Year: 2013
    BER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU
    CHINACOM
    IEEE
    DOI: 10.1109/ChinaCom.2013.6694588
Xiang Chen,*, Ji Zhu1, Ziyu Wen2, Yu Wang2, Huazhong Yang2
  • 1: Xidian University
  • 2: Tsinghua University
*Contact email: chenxiang98@mails.tsinghua.edu.cn

Abstract

In this paper, we present an optimized parallel implementation of a Bit Error Rate (BER) guaranteed turbo decoder on a General-Purpose Graphics Processing Unit (GPGPU). Implementing complex communication signal processing on GPGPUs is challenging, since GPGPU parallelism generally requires independent data streams. We therefore exploit both the inherent parallelism and the extended sub-frame level parallelism in turbo decoding and map them onto a recent GPU architecture. A guarding mechanism called Previous Iteration Value Initialization with Double Sided Training Window (PIVIDSTW) is used to minimize the BER performance loss caused by sub-frame level parallelism while still maintaining high throughput. In addition, to explore the potential for parallelizing turbo decoding on GPUs, the theoretical occupancy and scalability are analyzed, taking into account the number of sub-frames per frame. Compared with previous work in [5] and [7], we achieve a better trade-off between BER performance and throughput.
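
The sub-frame level parallelism with a double-sided training window mentioned in the abstract can be illustrated with a small index-bookkeeping sketch. This is not the authors' implementation: the function name `split_frame`, the `SubFrame` fields, and the example frame length, sub-frame count, and window length are all hypothetical, chosen only to show how each sub-frame might be padded with warm-up stages on both sides. The previous-iteration value initialization part of PIVIDSTW (reusing boundary state metrics from the prior iteration) is not modeled here.

```python
from typing import List, NamedTuple


class SubFrame(NamedTuple):
    start: int        # first trellis stage owned by this sub-frame
    end: int          # one past the last owned stage
    train_start: int  # first stage of the warm-up region before `start`
    train_end: int    # one past the last stage of the warm-up region after `end`


def split_frame(frame_len: int, num_subframes: int, window: int) -> List[SubFrame]:
    """Partition a frame into equal sub-frames, each with training windows on both sides.

    Assumption (not from the paper): sub-frames are equal in size and the
    training windows are clamped at the frame boundaries.
    """
    if frame_len % num_subframes != 0:
        raise ValueError("frame_len must be divisible by num_subframes")
    size = frame_len // num_subframes
    parts = []
    for p in range(num_subframes):
        start, end = p * size, (p + 1) * size
        parts.append(
            SubFrame(
                start=start,
                end=end,
                train_start=max(0, start - window),       # left training window
                train_end=min(frame_len, end + window),   # right training window
            )
        )
    return parts


# Illustrative values only: 1024-stage frame, 4 sub-frames, 16-stage windows.
parts = split_frame(frame_len=1024, num_subframes=4, window=16)
for p in parts:
    print(p)
```

Under these assumptions, each sub-frame can be processed independently (for example, by a separate group of GPU threads), at the cost of `window` extra stages of recursion on each interior boundary. A larger window would be expected to reduce the BER loss from the approximate boundary states while lowering throughput, which is the trade-off the abstract refers to.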