# An ASIC Fast Decoder of Rate Compatible Modulation and Its Application in Wireless Communication System

Wei Yu, Jun Wu<sup>(⊠)</sup>, Hao Cui, Zhifeng Zhang, and Haoqi Ren

College of Electronics and Information Engineering, Tongji University, Shanghai 201804, People's Republic of China {2014yuwei,wujun,zhangzf,renhaoqi}@tongji.edu.cn, hao.cui@live.com

**Abstract.** Rate Compatible Modulation (RCM) is a new rate adaptation scheme for wireless communication systems that achieves very high spectral efficiency in both additive white Gaussian noise (AWGN) and fading channels. However, the high decoding complexity of RCM hinders its application in practical communication systems. This paper introduces an Application Specific Integrated Circuit (ASIC) based fast decoder of RCM that implements the belief propagation (BP) algorithm in the logarithm domain. Although the BP algorithm is naturally parallel, we design a partial-parallel, fully pipelined architecture to trade off hardware resources against processing speed. To reduce the computational complexity and improve the throughput of the decoder, we adopt several complexity-reduction techniques, such as piecewise function approximation, lookup tables, and fixed-point computing. We build a communication system to test our ASIC decoder in the AWGN channel and the IEEE 802.11a fading channel.

The original RCM is a rateless coded modulation scheme that achieves high spectral efficiency but is not feasible in some communication systems (such as deep space communication systems) because of its long transmission delay. In this paper we propose a non-rateless RCM scheme and implement it in both AWGN and fading channels. Tests confirm that the performance of the proposed scheme is very close to that of the original rateless scheme, while the transmission time is greatly reduced.

**Keywords:** ASIC  $\cdot$  Decoder  $\cdot$  BP  $\cdot$  Communication  $\cdot$  Spectrum efficiency

## 1 Introduction

Rate adaptation is essential for wireless communication systems to track the time-varying channel capacity. Adaptive modulation and coding (AMC) and hybrid automatic repeat request (HARQ) are currently the most successful rate adaptation schemes. In AMC and HARQ, the transmitter selects the best combination of coding rate and modulation scheme to match the channel condition estimated from receiver feedback. However, these two schemes rely on accurate channel state feedback, and they can only achieve staircase-like spectral efficiency.

To achieve seamless rate adaptation, three rateless codes have recently been proposed: Spinal code [5], Strider code [1], and Rate Compatible Modulation (RCM) [2,3]. With a rateless code, the transmitter first sends a symbol block of a certain size; if the receiver cannot decode the source bits, the transmitter keeps sending small additional blocks of symbols until the receiver decodes all source bits correctly.

Spinal code uses a hash function over the message bits to produce pseudorandom bits that can be mapped directly to a dense constellation for transmission [4]. Although Spinal code performs very well, its Maximum Likelihood (ML) decoding algorithm is very complex and time consuming, especially when the source length is large. Strider code linearly combines a batch of conventionally encoded symbols (QPSK symbols of bits that have passed through a rate-1/5 convolutional code) and decodes them with an algorithm similar to successive interference cancellation. Unfortunately, the performance of Strider code is not much better than that of traditional modulation and coding schemes, and its spectral efficiency curve is not smooth either.

RCM is a rateless coding and modulation scheme. RCM symbols are generated incrementally from source bits through weighted combination without rate limit [3], and the RCM decoding algorithm is a variant of belief propagation (BP). Many studies [2,3,7–9] show that RCM achieves very high spectral efficiency in both additive white Gaussian noise (AWGN) and fading channels, but the complexity of the traditional RCM decoder is very high. In our previous work, a fast decoding algorithm for RCM was proposed and implemented on FPGA [8]. A partial design of an Application Specific Integrated Circuit (ASIC) decoder was later described in [6], but only simulation results were given because the chip had not yet been taped out. This paper presents the full design of the ASIC decoder chip and a communication system built around it. To the best of our knowledge, this is the first ASIC decoder for a rateless coded modulation scheme. Extensive tests confirm that the ASIC decoder works correctly and that its spectral efficiency is much higher than that of conventional modulation and coding schemes in both AWGN and IEEE 802.11a fading channels.

## 2 System Architecture

### 2.1 The RCM Encoder

The RCM encoder multiplies a source bit vector $b$ by a random sparse matrix $G$ to generate the coded modulation symbols $u$. Each row of $G$ has only 8 non-zero elements in random positions, and their values are a random arrangement of the weight multiset $W = \{-4, -4, -2, -1, 1, 2, 4, 4\}$. Because $G$ can be given as many rows as needed, symbols can be generated without limit, which is what makes RCM rateless. The encoding process can be expressed as follows.

$$u_i = \sum_{j=1}^8 w_j \times b_{n_{ij}} \tag{1}$$

where $w_j$ is the $j$-th non-zero element in row $i$ of $G$, and $n_{ij}$ is the index of the source bit weighted by $w_j$ to generate symbol $u_i$ [3]. Since each source bit is 0 or 1 and the negative and positive weights each sum to 11 in magnitude, $u_i$ is an integer ranging from $-11$ to $11$. Every two adjacent symbols are mapped to the I and Q components of one complex modulation symbol, so the constellation of RCM is fixed to $23 \times 23$ QAM.
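The Python sketch below illustrates Eq. (1) and the I/Q pairing under stated assumptions: source bits are 0/1, each row of $G$ uses 8 distinct random columns and a random permutation of $W$, and the helper names (`make_generator_rows`, `rcm_encode`, `map_to_qam`) and the seeding scheme are hypothetical, not taken from the authors' encoder.

```python
import random

# Weight multiset W from Sec. 2.1; every row of G uses a random permutation of it.
WEIGHTS = [-4, -4, -2, -1, 1, 2, 4, 4]

def make_generator_rows(num_source_bits, num_symbols, seed=0):
    """Hypothetical helper: sparse rows of G as lists of (weight, column) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_symbols):
        weights = WEIGHTS[:]
        rng.shuffle(weights)  # random arrangement of W within the row
        cols = rng.sample(range(num_source_bits), len(weights))  # 8 distinct positions
        rows.append(list(zip(weights, cols)))
    return rows

def rcm_encode(bits, rows):
    """Eq. (1): u_i = sum_j w_j * b_{n_ij}, with bits in {0, 1}."""
    return [sum(w * bits[c] for w, c in row) for row in rows]

def map_to_qam(symbols):
    """Pair adjacent symbols as the I and Q parts of one point of the 23x23 grid."""
    return [complex(symbols[k], symbols[k + 1]) for k in range(0, len(symbols) - 1, 2)]

if __name__ == "__main__":
    rng = random.Random(1)
    bits = [rng.randint(0, 1) for _ in range(400)]
    rows = make_generator_rows(len(bits), 600)
    u = rcm_encode(bits, rows)
    print(min(u), max(u), map_to_qam(u)[:3])
```

Because rows are drawn one after another from the same seeded generator, asking for more rows with the same seed reproduces the earlier rows and appends new ones, which mirrors the rateless property described above.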

### 2.2 The RCM ASIC Decoder

The original decoding algorithm of RCM is a variant of belief propagation (BP) with three steps: (1) the horizontal iteration, which calculates probabilities passed from symbol nodes to source nodes; (2) the vertical iteration, which calculates probabilities passed from source nodes to symbol nodes; and (3) after enough iterations between steps (1) and (2), hard decisions on the source nodes are output. Details of the algorithm are given in [3].

The original horizontal iteration in step (1) is a deconvolution operation (unlike the horizontal iteration in BP decoding of LDPC codes) that involves many multiply-accumulate operations. It cannot easily be converted directly to the logarithm domain, so its computational complexity is very high. To reduce this complexity, we proposed a fast decoding algorithm in [8] that uses lookup tables and piecewise function approximation to convert multiply-accumulate operations in the arithmetic domain into addition operations in the logarithm domain. The fast decoding algorithm saves 90% of the multiplication resources without noticeable performance loss.
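The exact approximation used in [8] is not spelled out here. Purely as an illustration of the general idea, the sketch below shows a standard log-domain technique of the same flavor (the Jacobian logarithm, swapped in by us): a product of probabilities becomes a sum of logarithms, and a sum of probabilities $\ln(e^a + e^b)$ becomes $\max(a,b)$ plus a correction term $\ln(1 + e^{-|a-b|})$ read from a small lookup table. The table size and spacing (8 entries, step 0.5) are assumptions.

```python
import math

def log_add_exact(a, b):
    """Exact ln(e^a + e^b), computed in a numerically stable way."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# Hypothetical correction table: ln(1 + e^-d) sampled at d = 0, 0.5, ..., 3.5.
STEP = 0.5
CORRECTION_LUT = [math.log1p(math.exp(-k * STEP)) for k in range(8)]

def log_add_lut(a, b):
    """Approximate ln(e^a + e^b) with one compare, one subtract and one table read."""
    d = abs(a - b)
    k = int(d / STEP)  # table index (floor, since d >= 0)
    correction = CORRECTION_LUT[k] if k < len(CORRECTION_LUT) else 0.0
    return max(a, b) + correction

def log_mul(a, b):
    """Multiplying probabilities is just adding their logarithms."""
    return a + b
```

In hardware, the comparison, subtraction and table read replace a multiplier and an exponential/log unit; the price is a bounded approximation error (below 0.25 nats for this particular table).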

**The Architecture of the ASIC Decoding Logic.** We design a partial-parallel, fully pipelined architecture in this ASIC decoder to implement the fast decoding algorithm. The architecture of the decoding logic is shown in Fig. 1. The HUPs (Horizontal Unit Processors) calculate the Log Likelihood Ratio (LLR) messages sent from symbol nodes $u$ to source bit nodes $b$. The VUPs (Vertical Unit Processors) calculate the LLR messages sent from source bit nodes $b$ to symbol nodes $u$ [6]. The RAGs (Random Address Generators) store the column indices of the non-zero elements in each sub-matrix of $G$. Eight HUPs and eight VUPs work in parallel, and each HUP and VUP processes the data of one sub-matrix (a part of the random sparse generator matrix $G$) in fully pipelined mode.

**The Top Control Module of the ASIC Decoder.** The top control module has three main functions: (1) receiving RCM symbols from outside and writing them into the data input memory; (2) controlling the iterations of the decoding logic; and (3) reading decoding information from the data output memory and sending it outside. These three functions are implemented by three state machines, called SM1, SM2, and SM3, respectively.



Fig. 1. Architecture of RCM ASIC decoder.

SM1 is shown in Fig. 2. When the decoder starts, the state machine is in the *Idle* state, in which the decoder sends an ACK signal to the outside to indicate that it can receive RCM symbols. When the decoder receives the cmd1 signal, SM1 moves to state S1, in which the decoder keeps receiving RCM symbols from outside. When one frame of RCM symbols has been received completely, SM1 moves to state S2. In state S2 the decoding logic starts decoding, and the decoder can still receive RCM symbols from outside. When the decoder receives the $dec\_state$ signal, SM1 moves to state S3, in which the decoding logic continues decoding but the decoder no longer receives RCM symbols from outside. If the next cmd1 signal arrives in state S3, SM1 returns to state S1, and RCM symbols are then written into the other data input memory (the data input memory operates in ping-pong mode).
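A minimal behavioral model of SM1 as described above, written in Python for readability rather than as RTL. The event names `cmd1`, `frame_end` (our label for "one frame received", plausibly signaled through the f1_end pin of Table 1) and `dec_state`, and the exact buffer-toggle rule, are assumptions drawn from the prose, not from the authors' design files.

```python
from enum import Enum, auto

class SM1State(Enum):
    IDLE = auto()  # ACK asserted: ready for a new input request
    S1 = auto()    # receiving a frame of RCM symbols
    S2 = auto()    # decoding, while still accepting input symbols
    S3 = auto()    # decoding, no longer accepting input symbols

# (current state, event) -> next state, following the description of Fig. 2.
TRANSITIONS = {
    (SM1State.IDLE, "cmd1"): SM1State.S1,
    (SM1State.S1, "frame_end"): SM1State.S2,
    (SM1State.S2, "dec_state"): SM1State.S3,
    (SM1State.S3, "cmd1"): SM1State.S1,
}

class SM1:
    def __init__(self):
        self.state = SM1State.IDLE
        self.write_buffer = 0  # which of the two ping-pong input memories is written

    def step(self, event):
        """Apply one event; events not valid in the current state are ignored."""
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is None:
            return self.state
        if self.state is SM1State.S3 and nxt is SM1State.S1:
            self.write_buffer ^= 1  # next frame goes into the other input memory
        self.state = nxt
        return self.state

    @property
    def receiving(self):
        """True in the states where the prose says input symbols are accepted."""
        return self.state in (SM1State.S1, SM1State.S2)
```

The ping-pong toggle is what lets the next frame be loaded while the previous one is still being decoded.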

Fig. 2. State machine of SM1.

Fig. 3. State machine of SM2.

SM2 is shown in Fig. 3. Before decoding starts, SM2 is in the *Idle* state. When the decoding logic receives the *cup\_enable* signal from outside, SM2 moves to state *hup\_state*, in which the HUP module reads data from the data memory and starts working. When the HUP module finishes, it sets the $hup\_end$ signal to 1 and SM2 moves to state $vup\_state$, in which the VUP module works. When the VUP module finishes, it sets the $vup\_end$ signal to 1 and SM2 returns to state $hup\_state$. After the decoding logic has completed 16 iterations (one iteration consists of one pass through state $hup\_state$ and one through state $vup\_state$), SM2 moves to state *judge*, in which the decoding information is written to the data output memory.
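SM2's control flow reduces to a fixed-count loop. The Python generator below is a hypothetical sketch that only yields the state sequence; the HUP/VUP computation is not modeled, and the end signals are assumed to arrive as soon as each state is entered.

```python
def sm2_states(num_iterations=16):
    """Yield SM2's states from Idle to judge (16 iterations in the chip)."""
    state, completed = "idle", 0
    while True:
        yield state
        if state == "idle":
            state = "hup_state"            # enable signal received
        elif state == "hup_state":
            state = "vup_state"            # HUP raised hup_end
        elif state == "vup_state":
            completed += 1                 # VUP raised vup_end: one iteration done
            state = "judge" if completed == num_iterations else "hup_state"
        else:
            return                         # judge: hard decisions written out
```

For example, `list(sm2_states(1))` gives `["idle", "hup_state", "vup_state", "judge"]`; with the default of 16 iterations, the sequence visits each unit's state 16 times before *judge*.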

SM3 is very similar to SM1, so we describe it only briefly here. When the decoder finishes decoding, the decoding information is written into the data output memory. If the decoder receives the output data request signal (cmd2) from outside, SM3 reads the data from the data output memory and outputs it to the outside.

**The ASIC Chip After Tape-Out.** The ASIC chip is fabricated through a Multi-Project Wafer (MPW) run in a 65 nm CMOS Standard Performance technology with 9 metal layers, and occupies an area of $5 \times 5 \text{ mm}^2$. Running at 300 MHz, the chip achieves a throughput of 84 Mbps and consumes 4.21 W. Figure 4 shows a photograph of the chip (RCM is also called Random Projection Codes (RPC)). Table 1 describes the main pins of the RCM decoder chip and their functions.
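As a back-of-the-envelope check derived only from the figures above (not additional measurements), the throughput per clock cycle and the energy per output bit are:

$$\frac{84\ \text{Mbit/s}}{300\ \text{MHz}} = 0.28\ \text{bit/cycle}, \qquad \frac{4.21\ \text{W}}{84\ \text{Mbit/s}} \approx 50.1\ \text{nJ/bit}$$

This assumes that the 84 Mbps figure counts decoded source bits.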

### 2.3 The Non-rateless Scheme

The original RCM is a rateless coded modulation scheme that achieves high spectral efficiency, but it is not feasible in communication systems with long transmission delay, such as deep space communications, because the transmitter must keep sending small blocks until decoding succeeds. To reduce transmission time, we propose a non-rateless RCM scheme in which the transmitter sends at most 3 blocks of RCM symbols (one base block and two retransmission blocks).

**Table 1.** Pins of ASIC decoder chip.

| Name        | Type   | Description |
|-------------|--------|-------------|
| data1[63:0] | Input  | Data input pins; the high 32 bits carry the channel coefficient and the low 32 bits carry the input RCM symbol |
| en1[7:0]    | Input  | Control signal of input data; en1[7:0] should be set to '00000001' in clock cycles 0–24 and to '00000010' in clock cycles 25–49 |
| f1_end      | Input  | Control signal of input data; this pin should be set to 1 when one block of symbols has been transmitted to the decoder completely |
| cmd1        | Input  | Request signal for transmitting data to the decoder |
| ack1        | Output | Set to 1 if the decoder accepts the data input request, otherwise 0 |
| data2[31:0] | Output | Data output pins carrying the soft decoding values of source bits; each source bit occupies 16 pins |
| en2[7:0]    | Input  | Control signal of output data; en2[7:0] should be set to '00000001' in clock cycles 0–24 and to '00000010' in clock cycles 25–49 |
| f2_end      | Input  | Control signal of output data; this pin should be set to 1 when the outside has received one frame of soft decoding bits |
| cmd2        | Input  | Request signal for receiving data from the decoder |
| ack2        | Output | Set to 1 if the decoder accepts the data output request, otherwise 0 |
| wb_en       | Input  | When wb_en=1, the outside can configure registers inside the RCM decoder; the configuration values are transmitted through the data1 pins |
| PDRST       | Input  | Reset signal of PLL |
| CLOCK_IO    | Input  | I/O clock |
| CLOCK_PLL   | Input  | PLL clock |
| RESET       | Input  | Reset signal of RCM decoder |

To obtain the best rate adaptation of the non-rateless scheme in both AWGN and fading channels, we carried out extensive simulations to find the relationship between the number of symbols in the base/retransmission blocks and the SNR (the number of source bits is fixed to 400 per frame in these simulations). From these simulations we built two lookup tables, Tables 2 and 3, which give the number of RCM symbols in the base block and in each retransmission block (denoted $N_{base}$ and $N_{retrans}$, respectively) at different SNRs in the AWGN channel and the fading channel.

**Table 2.** Number of symbols vs. SNR in AWGN channel.

| SNR (dB) | $N_{base}$ | $N_{retrans}$ |
|----------|------------|---------------|
| 5        | 1200       | 200           |
| 6        | 1100       | 200           |
| 7        | 800        | 100           |
| 8        | 650        | 100           |
| 9        | 500        | 100           |
| 10       | 400        | 80            |
| 11       | 340        | 80            |
| 12       | 300        | 80            |
| 13       | 270        | 70            |
| 14       | 240        | 70            |
| 15       | 220        | 70            |
| 16       | 210        | 70            |
| 17       | 190        | 60            |
| 18       | 180        | 60            |
| 19       | 170        | 60            |
| 20       | 160        | 60            |
| 21       | 150        | 40            |
| 22       | 140        | 40            |
| 23       | 120        | 20            |
| 24       | 110        | 20            |
| 25       | 100        | 20            |

**Table 3.** Number of symbols vs. SNR in fading channel.

| SNR (dB) | $N_{base}$ | $N_{retrans}$ |
|----------|------------|---------------|
| 5        | 1500       | 100           |
| 6        | 1500       | 100           |
| 7        | 1400       | 100           |
| 8        | 1300       | 100           |
| 9        | 1000       | 100           |
| 10       | 780        | 80            |
| 11       | 720        | 80            |
| 12       | 600        | 80            |
| 13       | 560        | 70            |
| 14       | 480        | 70            |
| 15       | 440        | 70            |
| 16       | 360        | 70            |
| 17       | 330        | 60            |
| 18       | 280        | 60            |
| 19       | 260        | 60            |
| 20       | 230        | 60            |
| 21       | 220        | 40            |
| 22       | 200        | 40            |
| 23       | 190        | 20            |
| 24       | 170        | 20            |
| 25       | 160        | 20            |
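Tables 2 and 3 act as the transmitter's block-size lookup. The sketch below encodes both tables and returns the cumulative number of symbols available to the decoder after the base block and after each retransmission. Rounding the SNR estimate to the nearest tabulated dB (with Python's `round()`, which rounds halves to even) and clamping to 5–25 dB are our assumptions; the paper does not say how intermediate SNRs are handled.

```python
SNR_DB = range(5, 26)  # tabulated SNR points, 1 dB apart

# Table 2 (AWGN): SNR -> (N_base, N_retrans), 400 source bits per frame.
AWGN_TABLE = dict(zip(SNR_DB, zip(
    [1200, 1100, 800, 650, 500, 400, 340, 300, 270, 240, 220,
     210, 190, 180, 170, 160, 150, 140, 120, 110, 100],
    [200, 200, 100, 100, 100, 80, 80, 80, 70, 70, 70,
     70, 60, 60, 60, 60, 40, 40, 20, 20, 20])))

# Table 3 (fading): same layout.
FADING_TABLE = dict(zip(SNR_DB, zip(
    [1500, 1500, 1400, 1300, 1000, 780, 720, 600, 560, 480, 440,
     360, 330, 280, 260, 230, 220, 200, 190, 170, 160],
    [100, 100, 100, 100, 100, 80, 80, 80, 70, 70, 70,
     70, 60, 60, 60, 60, 40, 40, 20, 20, 20])))

def block_schedule(snr_db, channel="awgn"):
    """Cumulative symbols after the base block and after each of two retransmissions."""
    table = AWGN_TABLE if channel == "awgn" else FADING_TABLE
    key = min(max(round(snr_db), SNR_DB[0]), SNR_DB[-1])  # assumption: nearest dB, clamped
    n_base, n_retrans = table[key]
    return [n_base, n_base + n_retrans, n_base + 2 * n_retrans]
```

For example, at 10 dB in the AWGN channel the decoder gets three attempts, after 400, 480, and 560 symbols.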

### 2.4 The Communication System

The communication system with the RCM ASIC decoder is shown in Fig. 5. It consists of three parts: a personal computer (PC), an FPGA board (XILINX ML605), and the PCB board carrying the ASIC decoder. The FPGA board connects to the PC through a PCIe cable and to the decoder board through the FMC interface. The encoder and the channel emulator are implemented in a C program running on the PC. The PC sends the RCM symbols, after passing them through the AWGN/fading channel, to the ASIC decoder through the FPGA board. After decoding, the decoder sends the decoded bits back to the FPGA, which forwards them to the PC.

The schematic diagram of the RCM PCB board is shown in Fig. 6. The PCB board not only supplies power, clock, reset, and PLL configuration to the RCM decoder, but also connects all pins of the RCM decoder (except the power pins) to the FMC interface of the ML605. Because data signals are transferred in parallel, all signal and control lines from the RCM decoder to the FMC interface are routed with the same length on the PCB board. PLL CLOCK supplies an external clock to the PLL inside the RCM decoder, and the PLL derives all clocks of the whole chip. I/O CLOCK supplies an external clock to the I/O pins of the RCM decoder.

Fig. 4. ASIC decoder chip.

Fig. 5. Communication system.

Fig. 6. The schematic diagram of RCM PCB board.

## 3 Results Evaluation

To test the average spectral efficiency of the ASIC decoder, we randomly generate 5000 source frames of 400 bits each. We test the ASIC decoder in both the AWGN channel and the fading channel (IEEE 802.11a fading channel model A, which represents a typical office environment under non-line-of-sight conditions with 50 ns root mean square delay spread).

In the rateless scheme, the transmitter keeps sending RCM symbols until the receiver decodes all source bits correctly, so the average spectral efficiency is calculated as:

$$(N_{source} \times 5000) / N_{symbol\_total} \tag{2}$$

where $N_{source}$ is the number of source bits in one frame (400), and $N_{symbol\_total}$ is the total number of symbols accumulated until all bits of the 5000 frames are decoded correctly.
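The rateless measurement behind Eq. (2) can be sketched as follows. The decoder is replaced by a hypothetical oracle `decodes(n)` (true once the first `n` symbols suffice), and the base-block and increment sizes are free parameters, since the rateless block sizes are not given here.

```python
def rateless_symbols_for_frame(decodes, n_base, n_inc, max_symbols=10_000):
    """Symbols sent for one frame: a base block, then small increments until decoded.

    `decodes(n)` is a hypothetical oracle standing in for the ASIC decoder.
    """
    n = n_base
    while not decodes(n):
        n += n_inc
        if n > max_symbols:  # safety stop for the sketch only
            raise RuntimeError("frame not decoded within max_symbols")
    return n

def rateless_efficiency(symbol_counts, n_source=400):
    """Eq. (2): total source bits divided by the total number of symbols sent."""
    return n_source * len(symbol_counts) / sum(symbol_counts)
```

With 5000 entries in `symbol_counts`, `rateless_efficiency` is exactly Eq. (2); the `max_symbols` cap is only there so the sketch cannot loop forever.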

Fig. 7. Rate comparison in AWGN channel.

Fig. 8. Rate comparison in fading channel.

In the non-rateless scheme, the transmitter sends at most 3 blocks of RCM symbols (1 base block and 2 retransmission blocks), sized according to Tables 2 and 3. If the receiver cannot decode all source bits correctly after the three transmission blocks, a frame error is recorded. The average spectral efficiency is then calculated as:

$$(1 - FER) \times (N_{source} \times 5000) / N_{symbol\_total} \tag{3}$$

where FER is the frame error rate, and $N_{symbol\_total}$ and $N_{source}$ are defined as in the rateless scheme.
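The non-rateless counterpart, with the at-most-three-blocks rule and Eq. (3), is sketched below with the same kind of hypothetical `decodes(n)` oracle. One assumption to flag: symbols spent on frames that still fail after three blocks are counted in $N_{symbol\_total}$; the paper does not state this explicitly.

```python
def nonrateless_frame(decodes, n_base, n_retrans, max_blocks=3):
    """Send one base block and up to two retransmission blocks.

    Returns (symbols_sent, decoded). `decodes(n)` is a hypothetical decoder oracle.
    """
    n = n_base
    for block in range(max_blocks):
        if block > 0:
            n += n_retrans  # append one retransmission block
        if decodes(n):
            return n, True
    return n, False  # still failing after 3 blocks: frame error

def nonrateless_efficiency(results, n_source=400):
    """Eq. (3): (1 - FER) * N_source * frames / N_symbol_total.

    Assumes symbols of failed frames are included in N_symbol_total.
    """
    frames = len(results)
    total_symbols = sum(n for n, _ in results)
    fer = sum(1 for _, ok in results if not ok) / frames
    return (1 - fer) * n_source * frames / total_symbols
```

Combined with the Table 2/3 block sizes (for example 400 and 80 symbols at 10 dB in AWGN), this caps every frame at three decoding attempts, which is where the delay saving over the rateless loop comes from.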

We compare the spectral efficiency (rate) of the ASIC decoder with that of the IEEE 802.11a standard AMC in Figs. 7 and 8. In this AMC, the modulations are BPSK, QPSK, 16QAM, and 64QAM, and the channel code is a convolutional code; each modulation has two coding rates, 1/2 (2/3 for 64QAM) and 3/4. The two figures show that the ASIC decoder works correctly and that the performance of the non-rateless scheme approaches that of the rateless scheme. Both schemes achieve much higher rates than AMC and vary gracefully with the channel signal-to-noise ratio (SNR) over a very wide dynamic range. In the rateless scheme, the transmitter must send small symbol blocks many times, so the total transmission delay is very large; the non-rateless scheme has at most two retransmission blocks, so it greatly reduces the transmission time.
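The staircase shape of the AMC curves follows from the mode list above: the baseline offers only eight discrete rates. The sketch below computes the raw product $\log_2 M \times R$ (information bits per modulated subcarrier symbol); it ignores OFDM pilots, guard interval, and whatever normalization was used in Figs. 7 and 8, so it is illustrative only.

```python
import math
from fractions import Fraction

# IEEE 802.11a modulation orders and convolutional code rates, as listed above.
AMC_MODES = {
    "BPSK": (2, [Fraction(1, 2), Fraction(3, 4)]),
    "QPSK": (4, [Fraction(1, 2), Fraction(3, 4)]),
    "16QAM": (16, [Fraction(1, 2), Fraction(3, 4)]),
    "64QAM": (64, [Fraction(2, 3), Fraction(3, 4)]),
}

def amc_efficiencies():
    """(mode, code rate, log2(M) * R) for every mode, sorted by efficiency."""
    table = []
    for name, (order, rates) in AMC_MODES.items():
        bits_per_symbol = int(math.log2(order))
        for rate in rates:
            table.append((name, rate, bits_per_symbol * rate))
    return sorted(table, key=lambda entry: entry[2])
```

The resulting efficiencies are 1/2, 3/4, 1, 3/2, 2, 3, 4, and 9/2 bits per symbol; between these steps AMC cannot exploit extra SNR, whereas RCM's rate varies continuously.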

## 4 Conclusion

In this paper we introduce an ASIC fast decoder of RCM which, to the best of our knowledge, is the first ASIC decoder of a rateless code. We also build a communication system to test the performance of this ASIC decoder. Extensive testing confirms that the ASIC decoder works well and that its performance is much better than that of AMC in both AWGN and fading channels.

Acknowledgments. This work was supported in part by the National Natural Science Foundation of China under Grant 61571329 and Grant 61390513, in part by Huawei Innovation Research Plan (HIRP) Funding under Grant YB2015110117.

# References

- Gudipati, A., Katti, S.: Strider: automatic rate adaptation and collision handling. In: Proceedings of the ACM SIGCOMM 2011 Conference, pp. 158–169. ACM (2011)
- Cui, H., Luo, C., Tan, K., Wu, F., Chen, C.W.: Seamless rate adaptation for wireless networking. In: MSWiM 2011, pp. 437–446 (2011)
- Cui, H., Luo, C., Wu, J., Chen, C.W., Wu, F.: Compressive coded modulation for seamless rate adaptation. IEEE Trans. Wirel. Commun. 12(10), 4892–4904 (2013)
- Perry, J., Balakrishnan, H., Shah, D.: Rateless spinal codes. In: ACM Workshop on Hot Topics in Networks 2011, pp. 1–6 (2011)
- Perry, J., Iannucci, P.A., Fleming, K.E., Balakrishnan, H., Shah, D.: Spinal codes. In: ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 49–60 (2012)
- Qiu, L., Wang, M., Wu, J., Zhang, Z., Huang, X.: Design and implementation of seamless rate adaptive decoder. In: 2014 IEEE Military Communications Conference, pp. 356–361 (2014)
- Shirvanimoghaddam, M., Li, Y., Vucetic, B.: Near-capacity adaptive analog fountain codes for wireless channels. IEEE Commun. Lett. 17(12), 2241–2244 (2013)
- Wang, M., Wu, J., Shi, S.F., Luo, C., Wu, F.: Fast decoding and hardware design for binary-input compressive sensing. IEEE J. Emerg. Sel. Top. Circuits Syst. 2(3), 591–603 (2012)
- Wang, M., Wu, J., Yu, W., Wang, H.: Efficient coding modulation and seamless rate adaptation for visible light communications. IEEE Wirel. Commun. 22(2), 86–93 (2015)