# Design of High Speed and Area Efficient Modified Kogge Stone Multiplier Using ZFL

S Baba Fariddin ${ }^{1}$, Dr. Rahul Mishra ${ }^{2}$<br>\{sbabafariddin@gmail.com ${ }^{1}$, rahulmishra@aku.ac.in ${ }^{2}$\}<br>Research Scholar, Dept of ECE, Dr. A.P.J. Abdul Kalam University, Arandia, Indore, M.P, INDIA ${ }^{1}$<br>Professor \& H.O.D, Dept of ECE, Dr. A.P.J. Abdul Kalam University, Arandia, Indore, M.P, INDIA ${ }^{2}$


#### Abstract

Multipliers play an important role in digital signal processing applications. Finite field multiplication is a basic operation in the finite field and one of the most frequently used operations in arithmetic. Hence, this paper presents the design of a high speed and area efficient modified Kogge Stone multiplier. The multiplier and multiplicand are taken as inputs, and the 1s and 0s in both operands are identified. A factoring technique is used to minimize switching energy and increase the speed of operation. Zeros finding logic (ZFL) identifies the zeros in the obtained product. The product is then added using a parallel prefix adder to minimize area. Compared with the ripple carry multiplier and the Kogge Stone multiplier, the proposed modified Kogge Stone multiplier gives more effective results.


Keywords: Kogge Stone Multiplier, Zeros Finding Logic, Factoring Technique, Parallel Prefix Adder.

## 1 Introduction

With recent advances in portable digital applications and wireless communication, power consumption analysis and reduction techniques have become as important as speed, cost, and reliability at the circuit design level [1]. Power consumption, which determines how much energy the devices dissipate, influences critical design issues such as packaging and cooling requirements, power supply lines and capacity, and the number of circuits that can be integrated on a chip. The power consumption of a digital CMOS (Complementary Metal Oxide Semiconductor) circuit consists of dynamic power consumption, static power consumption, and short-circuit power consumption. The dominant component is usually dynamic power, which is spent charging node capacitances.

It is well known that low-power circuit design is a significant issue in system-on-chip and VLSI design. As transistor dimensions shrink into the deep sub-micron region, the effect of static leakage currents becomes increasingly significant.

This component of power consumption can be controlled by novel design, but it is predominantly handled by process technology. One area that has been a focus of active research is dynamic logic. If the power-reducing properties of such techniques could be combined, it should be possible to produce a logic design methodology that is dynamic.

Scaling of transistor geometries has led to the integration of millions of devices in a very small space, enabling complex applications to be realized in hardware and supporting high-speed operation. This has revolutionized not only electronics but also industry at large. To reduce power, researchers, designers, and engineers have proposed many innovative techniques [2].

However, designers need to budget and plan for power dissipation as a factor nearly as important as performance and perhaps more important than area. Low-power techniques have been successfully adopted in the design of complex VLSI circuits [3]. As demand keeps increasing for faster, low-cost, and reliable products that run high-end applications from a remote power source, there is a continuing need for new low-power design techniques for VLSI.

The multiplier plays a very important role in present-day digital signal processing applications, and it is used in many applications beyond the DSP (Digital Signal Processor). As technology advances, many researchers have tried, and are still trying, to design multipliers that meet one or more of the following design targets: high speed, low power consumption, low delay, and small area. Such multipliers are suitable for high-speed, low-power, and compact VLSI implementations. The basic method of multiplication is the "shift and add" algorithm. In parallel multipliers, the number of additions to be performed is the main parameter that determines the performance of the multiplier.

Multiplication is a fundamental arithmetic operation in common Digital Signal Processing (DSP) applications such as the Fourier transform and the Fast Fourier Transform (FFT) [4]. To achieve high execution speed, parallel array multipliers are widely used. However, these multipliers consume more power. Power consumption has become a critical issue in present-day VLSI system design, so designers are expected to develop power-efficient multipliers for low-power DSP systems.

Designers of CMOS digital devices face a difficult requirement: they need to achieve low propagation delay and complex functionality together with low power. Part of the solution is an appropriate choice of operating regions. The world is facing tremendous growth in the demand for energy, and one promising solution may be the use of the adiabatic logic principle. Adiabatic logic is used here with the aim of increasing the speed of operation. Voltage and current sources are used to limit power dissipation in parasitic resistances. The data is stored on the gate capacitances, and the energy is recovered back into the power supply.

This requires an oscillating network for the power supply. There are two essential issues, or design requirements, that must be addressed in any CMOS adiabatic circuit; one of them is that the implementation must result in an energy-efficient design of the combined power supply and clock. Addition, subtraction, multiplication, and division are the basic arithmetic operations performed in any system. The performance of many computational problems is often dominated by the speed at which a multiplication can be executed.

A multiplier is one of the key hardware blocks in most digital signal processing systems. The important operations in digital signal processing are filtering, convolution, and inner products. In DSP applications generally, including digital communication applications, the multiplier plays a significant role [5]. The large number of gates used to execute the various operations in a conventional circuit are irreversible; that is, each time a logic operation is executed, some information about the input is erased or lost.

## 2 Review of Multipliers

Figure 1 shows the structure of the array multiplier. An array multiplier is a circuit that uses an array of AND gates and full adders to multiply binary operands. It is one of the most widely used fundamental multiplication algorithms. The array of AND gates ANDs the multiplicand with each bit of the multiplier. The partial products produced by the AND gates are shifted left according to the position of the corresponding multiplier bit.


Fig. 1. Array Multiplier

The shifted partial products are summed by N-1 adders in parallel. Although the additions are performed in parallel, the ripple carries introduce a large delay because the carry propagates through the sequence of adders. To reduce this delay, the ripple carry adders are replaced with Carry Save Adders (CSAs). A CSA compresses the addition of three numbers into the addition of two numbers, so that three operands are handled at a time.
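
The carry save idea can be illustrated with a short Python sketch. It is a behavioural model on integers rather than the authors' circuit, and the function names `carry_save_add` and `add_many` are hypothetical. Three operands are reduced to a sum word and a carry word without propagating any carry, and only the last step performs a carry-propagate addition.

```python
def carry_save_add(x, y, z):
    """3:2 compression: reduce three operands to a sum word and a carry word."""
    partial_sum = x ^ y ^ z                      # per-bit sum, carries ignored
    carry = ((x & y) | (y & z) | (x & z)) << 1   # per-bit carries, moved to their weight
    return partial_sum, carry


def add_many(operands):
    """Add a list of non-negative integers using repeated 3:2 compression."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_add(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]                   # three operands replaced by two
    return sum(ops)                              # single final carry-propagate addition


s, c = carry_save_add(5, 3, 6)
print(s, c, s + c)   # the two words together still represent 5 + 3 + 6
```

Because no carry ripples inside `carry_save_add`, each compression step has the delay of a single full adder regardless of operand width.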

In multiplication, the partial products are generated using the shift-and-add technique. The multiplier bit decides whether the partial product is only shifted or the multiplicand is also added to the product. This is how conventional multiplication is performed. To speed up the procedure, the multiplication process is divided into two sections: the first produces the partial products, and the second accumulates and adds them. Before the partial products (PPs) are added, they must be aligned to their corresponding positions by shifting.


Fig. 2. Booth Multiplier

The Booth technique is the dominant method of multiplication for signed numbers. It handles positive and negative operands in the same way and is based on the add-shift algorithm. In conventional multiplication, the number of partial products depends on the operand size of the multiplier. As the multiplier grows, the number of partial products also grows, resulting in a large delay when they are added to produce the final result. Since the delay of the multiplier is dominated by the addition operations, the technique focuses on reducing the number of additions that occur in a multiplication.

When signed numbers are multiplied, the sign must be tracked through the operation to produce the correct sign of the result, whereas for unsigned numbers the sign of the operands need not be considered. If signed 2's complement numbers are multiplied in the same way as positive numbers, the output is incorrect. Booth's algorithm therefore introduces a technique for multiplying signed numbers while preserving the sign. In this technique, the bits $b_i$ and $b_{i-1}$ at the least significant end of the multiplier are tested at every clock cycle. If the two bits are both zeros, the multiplier is only shifted one bit position to the right. For a sequence of 1s, arithmetic operations (addition and subtraction) need to be performed only at the edges of the block of 1s, where the bits change from 1 to 0 and from 0 to 1.

Booth's algorithm adds the multiplicand if the tested bit pair of the multiplier is '01' and subtracts the multiplicand if the pair is '10'. The method therefore works efficiently for signed numbers as well. When the multiplier contains long blocks of 1s, the algorithm works well and effectively reduces the number of additions performed. The algorithm is shown in Figure 2, where b, a, and m represent the multiplier, multiplicand, and product, respectively.
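
The add and subtract rule can be made concrete with a small behavioural sketch in Python. It models radix-2 Booth recoding on integers and is not the hardware of Figure 2; the function name `booth_multiply` and the default width `n_bits=8` are assumptions chosen for illustration.

```python
def booth_multiply(multiplicand, multiplier, n_bits=8):
    """Radix-2 Booth multiplication; multiplier is a signed n_bits-wide integer."""
    mask = (1 << n_bits) - 1
    b = multiplier & mask            # two's complement bit pattern of the multiplier
    product = 0
    prev_bit = 0                     # implicit bit b_{-1} = 0
    for i in range(n_bits):
        cur_bit = (b >> i) & 1
        if (cur_bit, prev_bit) == (0, 1):
            product += multiplicand << i   # pair '01': end of a block of 1s, add
        elif (cur_bit, prev_bit) == (1, 0):
            product -= multiplicand << i   # pair '10': start of a block of 1s, subtract
        # pairs '00' and '11' need no arithmetic, only the shift
        prev_bit = cur_bit
    return product


print(booth_multiply(7, -3))    # signed operands are handled without special cases
print(booth_multiply(-5, -6))
```

For the multiplier -3 (bit pattern 11111101) only three add or subtract steps occur, at the edges of the blocks of 1s, which is the saving the paragraph above describes.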

Almost every parallel multiplier performs multiplication in three stages: it produces partial products using an array of AND gates, applies a technique to reduce the number of partial products, and then performs a final addition to obtain the product. Baugh and Wooley proposed a technique that is applied to the partial products to compress them. To reduce the number of partial products, they are arranged in a triangular shape called the High Performance Multiplier (HPM) reduction tree. This arrangement shortens the wires connecting the adders and propagates the carry with less delay.

The Braun multiplier is a type of parallel multiplier that follows the basic manual multiplication procedure. It works for unsigned numbers only and is also called the carry save multiplier. Its structure is shown in Figure 3. In this structure, an array of AND gates produces the partial products, and the full adders required to add them are reduced column by column. This reduction in the number of adders is achieved using carry save addition, which compresses a three-number addition into a two-number addition. No shifting of partial products is performed in the Braun multiplier.


Fig. 3. Braun Multiplier

The high speed of column compression techniques has attracted designers to implement multipliers with them. The Dadda multiplier, devised by the engineer Luigi Dadda, is based on column compression and is obtained by modifying the Wallace tree technique.

## 3 Modified Kogge Stone Multiplier Using ZFL

Figure 4 shows the block diagram of the modified Kogge Stone multiplier using ZFL. The multiplier and multiplicand are taken as inputs, and the 1s and 0s in both operands are identified. A factoring technique is used to minimize switching energy and increase the speed of operation. Zeros finding logic identifies the zeros in the obtained product. The product is then added using a parallel prefix adder to minimize area.


Fig. 4. Block Diagram of Modified Kogge Stone Multiplier Using ZFL

Multiplication is one of the most critical arithmetic operations of a digital signal processor. It is not only complex but also consumes a large part of the processor's area and power. The Kogge Stone multiplier is therefore used to reduce power and area consumption in essential processing units. Multiplying two operands of n bits usually requires 2n bits for the result. In binary representation, the most significant bits (MSBs) carry most of the product value, while the lower-order bits contribute small values. Some fixed-width multipliers therefore compute only part of the bits of the product.

Multiplier circuits are modeled after the "shift and add" algorithm. In this algorithm, one partial product is created for each bit of the multiplier. Each input, partial product digit, and result bit is given a logical name, and the same names are used as signal names in the circuit schematics. By comparing the signal names across the schematics, the behavior of the multiplier circuit can be confirmed.

Each partial product is a copy of the multiplicand if the corresponding multiplier bit is '1', and all 0s if the corresponding multiplier bit is '0'. Each successive partial product is shifted one bit position to the left. This generates the products of the multiplicand with each of the multiplier bits. The weight of each partial product depends on the bit position of the corresponding multiplier bit.
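
As a minimal illustration of this partial product rule, the following Python sketch builds one row per multiplier bit and sums the rows. It is a behavioural model for unsigned operands, not the proposed hardware, and the names `partial_products` and `shift_add_multiply` are hypothetical.

```python
def partial_products(multiplicand, multiplier, n_bits):
    """Return one partial product per multiplier bit (unsigned operands)."""
    rows = []
    for i in range(n_bits):
        bit = (multiplier >> i) & 1
        # Multiplier bit 1: multiplicand shifted left by i; bit 0: all zeros.
        rows.append((multiplicand << i) if bit else 0)
    return rows


def shift_add_multiply(multiplicand, multiplier, n_bits=8):
    """Multiply by summing the aligned partial products."""
    return sum(partial_products(multiplicand, multiplier, n_bits))


rows = partial_products(0b1011, 0b0101, 4)
print([format(r, "08b") for r in rows])      # rows for multiplier bits 0..3
print(shift_add_multiply(0b1011, 0b0101, 4))
```

The all-zero rows correspond to the 0 bits of the multiplier; they add nothing to the product, which is why identifying zeros in the operands can save work.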

To the best of our knowledge, a factoring method has not been reported in the literature for the architectural-level design of a finite field multiplier, so it is implemented here. A logic gate substitution technique is also used in our design to reduce the internal power consumption of the proposed digit-serial multiplier. The synthesis results show that the new design has the lowest logic delay, route delay, and total delay among several similar existing works.

The addition is performed by a parallel prefix adder in three stages. The pre-processing stage computes the intermediate generate and propagate signals. The carry generation stage is controlled by these intermediate signals. Finally, the post-processing stage produces the sum and carry outputs.
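
The three stages can be sketched in Python for a Kogge Stone style prefix network. This is a bit-level behavioural model written for illustration, not the authors' RTL; the function name `kogge_stone_add`, the `carry_in` parameter, and the default width of 32 bits are assumptions.

```python
def kogge_stone_add(a, b, n_bits=32, carry_in=0):
    """Add two unsigned n_bits-wide integers; returns (sum, carry_out)."""
    # Stage 1, pre-processing: per-bit generate (g) and propagate (p) signals.
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n_bits)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n_bits)]

    # Stage 2, carry generation: prefix levels with spans 1, 2, 4, ...
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & carry_in)          # fold the carry-in into bit 0
    dist = 1
    while dist < n_bits:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n_bits):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2                            # about log2(n_bits) levels in total

    # Stage 3, post-processing: sum bit = propagate XOR carry into that bit.
    carries = [carry_in] + G[:-1]
    total = 0
    for i in range(n_bits):
        total |= (p[i] ^ carries[i]) << i
    return total, G[-1]


print(kogge_stone_add(13, 11, n_bits=8))
print(kogge_stone_add(0xFFFFFFFF, 1))        # overflow appears in carry_out
```

Every bit position is updated at every level, so the carry into each bit is available after a number of levels that grows with the logarithm of the word length, at the cost of more wiring than a ripple carry chain.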

Figure 5 shows the RTL (Register Transfer Level) schematic of the modified Kogge Stone multiplier using ZFL. Here, $a$ and $b$ are the 32-bit inputs and $c$ is the 64-bit output. A clock is also an input to the design.


Fig. 5. RTL Schematic of Modified Kogge Stone Multiplier Using ZFL

Figure 6 shows the technology schematic of the modified Kogge Stone multiplier using ZFL. The technology schematic is a combination of look-up tables, truth tables, K-maps, and equations.


Fig. 6. Technology Schematic of Modified Kogge Stone Multiplier Using ZFL

Figure 7 shows the output waveform of the modified Kogge Stone multiplier using ZFL.

[Simulation waveform image: the signal traces are not recoverable from the extracted text; the time cursor reads X1: 2,000,000 ps.]

Fig. 7. Output Waveform Of Modified Kogge Stone Multiplier Using ZFL

## 4 Result Analysis

Figure 8 compares the total delay of the modified Kogge Stone multiplier using ZFL with that of the CLA multiplier. The total delay of the modified Kogge Stone multiplier using ZFL is lower than that of the CLA multiplier.


Fig. 8. Total Delay Comparison

Figure 9 compares the logic delay of the modified Kogge Stone multiplier using ZFL with that of the CLA multiplier. The logic delay is 4.128 ns for the modified Kogge Stone multiplier using ZFL and 71.676 ns for the CLA multiplier. Hence, the proposed design reduces the logic delay considerably.


Fig. 9. Logic Delay Comparison

Figure 10 compares the route delay of the modified Kogge Stone multiplier using ZFL with that of the CLA multiplier. The route delay is 43.141 ns for the modified Kogge Stone multiplier using ZFL and 54.468 ns for the CLA multiplier. Hence, the proposed design also reduces the route delay.


Fig. 10. Route Delay Comparison
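
The size of each improvement follows directly from the reported figures. The short Python sketch below only restates the logic and route delays quoted in this section and computes the relative reduction; the helper name `percent_reduction` and the dictionary layout are illustrative choices.

```python
# Delays in nanoseconds, as reported in this section.
delays_ns = {
    "logic": {"proposed": 4.128, "cla": 71.676},
    "route": {"proposed": 43.141, "cla": 54.468},
}


def percent_reduction(proposed, baseline):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    name: percent_reduction(d["proposed"], d["cla"])
    for name, d in delays_ns.items()
}
for name, value in reductions.items():
    print(f"{name} delay reduction: {value:.2f} %")
```

With these inputs the logic delay falls by roughly 94 % and the route delay by roughly 21 % relative to the CLA multiplier.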

## 5 Conclusion

This paper presented the design of a high speed and area efficient modified Kogge Stone multiplier. In this system, the multiplier and multiplicand are taken as inputs, and the 1s and 0s in both operands are identified. A factoring technique is used to minimize switching energy and increase the speed of operation. Zeros finding logic identifies the zeros in the obtained product, and the product is added using a parallel prefix adder to minimize area. Compared with the CLA multiplier, the proposed Kogge Stone multiplier reduces the logic delay from 71.676 ns to 4.128 ns and the route delay from 54.468 ns to 43.141 ns.

## References

[1] Yamini Devi Ykuntam, Katta Pavani, Krishna Saladi, "Design and analysis of high speed Wallace tree multiplier using parallel prefix adders for VLSI circuit designs," IEEE Xplore -49239, July 2020.
[2] Gunjan Jain, Meenal Jain, Gaurav Gupta, "Design of Radix-4,16,32 Approx Booth Multiplier Using Error Tolerant Application," 978-1-5090-3012-5/17/\$31.00 ©2017 IEEE.
[3] S. Murugeswari, S. Kaja Mohideen, "Design of area efficient and low power multipliers using multiplexer based full adder," Second International Conference on Current Trends in Engineering and Technology (ICCTET), 2014.
[4] H. Wu, "Bit-parallel finite field multiplier and squarer using polynomial basis," IEEE Trans. Comput., vol. 51, no. 7, pp. 750-758, Jul. 2012.
[5] H. Hinkelmann, P. Zipf, J. Li, G. Liu, and M. Glesner, "On the design of reconfigurable multipliers for integer and Galois field multiplication," Microprocessors Microsyst., vol. 33, no. 1, pp. 2-12, Feb. 2009.
[6] Raghava Katreepalli, Themistoklis Haniotakis, "Power-delay-area efficient design of Vedic multiplier using adaptable Manchester carry chain adder," 2007 International Conference on Communication and Signal Processing (ICCSP).
[7] P. K. Meher, "High-throughput hardware-efficient digit-serial architecture for field multiplication over GF($2^m$)," in Proc. 6th Int. Conf. Inf., Commun. Signal Process. (ICICS), Dec. 2007, pp. 1-5.
[8] G. R. Gokhale, S. R. Gokhale, "Design of area and delay efficient Vedic multiplier using Carry Select Adder," 2005 International Conference on Information Processing (ICIP).
[9] Josmin Thomas, R. Pushpangadan, S. Jinesh, "Comparative study of performance Vedic multiplier on the basis of adders used," 2005 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE).
[10] R. Balakumaran, E. Prabhu, "Design of high speed multiplier using modified Booth algorithm with hybrid carry lookahead adder," 2004 International Conference on Circuit, Power and Computing Technologies (ICCPCT).
[11] Yitao Ma, Tetsuo Endoh, Tadashi Shibata, "A vertical-MOSFET-based digital core circuit for high-speed low-power vector matching," 2001 International SoC Design Conference.
[12] Thottempudi Pardhu, N. Alekhya Reddy, "Design of ultra low power multipliers using hybrid adders," 2001 International Conference on Communications and Signal Processing (ICCSP).

