

A Peer Reviewed Open Access International Journal

www.ijiemr.org

#### **COPYRIGHT**





© 2022 IJIEMR. Personal use of this material is permitted. Permission from IJIEMR must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. No reprint of this paper should be made; all copyright rests with the paper authors.

IJIEMR Transactions, available online on 26th May 2022. Link: http://www.ijiemr.org/downloads.php?vol=Volume-11&issue=ISSUE-05

DOI: 10.48047/IJIEMR/V11/I05/41

Title: DELAY OPTIMIZATION OF RADIX-4 FFT ALGORITHM BY USING REVERSIBLE GATES

Volume 11, Issue 05, Pages: 245-252

**Paper Authors** 

D.V.S.K. Kireeti, A.V. Sumanth, P. Priyatham, A. Sai Krishna, K. Sreekanth, Mrs. C. Vidya, Mr. Rajesh Pasupuleti






# DELAY OPTIMIZATION OF RADIX-4 FFT ALGORITHM BY USING REVERSIBLE GATES

#### **First Author:**

D.V.S.K. Kireeti, B.Tech (ECE), Sree Venkateswara College of Engineering, Nellore.
A.V. Sumanth, B.Tech (ECE), Sree Venkateswara College of Engineering, Nellore.
P. Priyatham, B.Tech (ECE), Sree Venkateswara College of Engineering, Nellore.
A. Sai Krishna, B.Tech (ECE), Sree Venkateswara College of Engineering, Nellore.
K. Sreekanth, B.Tech (ECE), Sree Venkateswara College of Engineering, Nellore.

#### **Second Author:**

Mrs.C.VIDYA, M. Tech, Assistant Professor, Department Of ECE, Sree Venkateswara College of Engineering, Nellore

#### **Third Author:**

**Mr. Rajesh Pasupuleti** (Ph.D), HOD & Associate Professor, Department Of ECE, Sree Venkateswara College Of Engineering, Nellore

#### **ABSTRACT:**

The FFT is commonly used in digital signal processing algorithms. 4G communication and other wireless-system-based communication are currently hot topics of research and development in the wireless communication and networking field. The FFT is an algorithm that speeds up the computation of the DFT. In the first stage, a low-complexity Radix-2 Multi-path Delay Commutator (R2MDC) FFT frequency transformation technique is developed in a Very Large-Scale Integration (VLSI) system design environment. Low power consumption, small area and high speed are the primary VLSI parameters. The conventional R2MDC FFT design has higher hardware complexity because of its intensive computational elements.

The FFTs are realized using reversible gates, which offer a considerable power reduction compared to conventional gates. In addition, the following architectures were employed to optimize the delay. In the present work, a Radix-4 FFT has been developed to improve performance compared to the Radix-2 FFT, since the Radix-2 FFT exhibits more delay than the higher-order Radix-4 FFT. The Radix-4 FFT has been realized in two versions. The first version employs fixed-latency carry select adders. The second version employs variable-latency adders. Of the two, the second version gives the better delay performance.

#### 1. INTRODUCTION

The Fast Fourier Transform (FFT) algorithm is widely used in many signal processing and communication systems. It is regarded as one of the fundamental algorithms in many DSP applications. At present, the FFT is an essential building block for communications, especially for orthogonal frequency division multiplexing (OFDM) based communication systems such as digital video and audio broadcasting and asymmetric digital subscriber loop [1]. However, the time complexity of the DFT is $O(N^2)$, whereas the time complexity of the FFT is $O(N \log_r N)$.

The FFT was introduced by Cooley and Tukey to reduce the computation time. FFTs are computed in $O(\log_r N)$ stages, where N is the length of the transform and r is the radix of the FFT decomposition. Each stage of the FFT reads and writes N data words. FFT processor designs generally follow one of four architectures: single-memory, dual-memory, array and pipeline. FFT architectures can be divided into two classes: memory-based and pipelined [2,3]. Memory-based architectures comprise a butterfly unit and a certain number of memory blocks, giving low-cost designs. However, it is difficult to achieve real-time processing at a low clock frequency.
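As a rough illustration of these figures, the following Python sketch counts the stages of a radix-r FFT and compares the $N^2$ operation count of the direct DFT with $N \log_r N$. The function name `fft_stages` and the transform length of 1024 are illustrative assumptions, and the counts are orders of magnitude rather than exact multiplier counts.

```python
def fft_stages(n, radix):
    """Number of stages of a radix-`radix` FFT of length n (n must be a power of radix)."""
    stages = 0
    while n > 1:
        if n % radix != 0:
            raise ValueError("length must be a power of the radix")
        n //= radix
        stages += 1
    return stages


N = 1024  # illustrative transform length (assumption)
print("direct DFT, N^2:", N * N)                          # 1048576
for r in (2, 4):
    s = fft_stages(N, r)
    print(f"radix-{r}: {s} stages, N*log_r(N) = {N * s}")  # 10 stages / 5 stages
```

For the same length, the radix-4 decomposition needs half as many stages as the radix-2 one.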

On the other hand, a pipelined architecture comprises several stages to give higher throughput at the expense of more hardware. Meanwhile, the appropriate FFT size varies across applications. For example, the size can be 128, 256, 512, 1024 or 2048 for WiMAX applications and 256, 512, 1024 or 2048 for DAB systems [4]. Hence, for a particular application, the required FFT core should be configured to meet its specific requirements. The FFT is a typical algorithm in which memory is accessed intensively and high parallelism is required. An FFT algorithm should have a pipelined and parallel architecture, and be regular and modular. At the algorithm level, the multiplicative complexity should be kept as low as possible. At the architectural level, a delay-feedback buffering strategy should be used to reduce the memory size. The design should have modular and regular modules, local routing and low control complexity [5].

The FFT is used to convert a time-domain signal into a frequency-domain signal by computing the DFT efficiently. To meet the high-performance, high-speed and real-time requirements of present applications, hardware designers have continually sought efficient architectures for computing the FFT. Pipelined hardware architectures are widely used because they provide high throughput and low latency suitable for real-time operation, as well as reasonably low area and power consumption. Generally, DIT takes its input in bit-reversed order and produces its output in normal order, while DIF takes its input in normal order and produces its output in bit-reversed order. Only the DIT algorithm is considered.

#### 2. LITERATURE SURVEY

#### REVERSIBLE GATE

Rolf Landauer showed in 1961 that whenever a logically irreversible gate is used, energy is dissipated into the environment: information loss implies energy loss. Interest in reversible computation arises from the desire to reduce heat dissipation, thereby allowing higher densities and higher speeds. Under ideal physical circumstances, the power dissipation of a reversible circuit is zero. Landauer and Bennett showed that all operations required in computation could be performed in a reversible manner, thus dissipating no heat. The first condition for a deterministic device to be reversible is that its input and output be uniquely retrievable from each other; such a device is called logically reversible. The second condition is that the device can actually run backwards; it is then called physically reversible, and the second law of thermodynamics guarantees that it dissipates no heat. Reversible circuits (gates) are those that have a one-to-one mapping between vectors of inputs and outputs.
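The one-to-one mapping condition can be checked exhaustively for small gates. The sketch below is illustrative only: the helper `is_reversible` is a hypothetical name, and the gate definitions are the commonly used textbook forms of the Feynman and Fredkin gates, assumed here rather than taken from this paper's figures.

```python
from itertools import product


def is_reversible(gate, n_inputs):
    """True if `gate` maps the 2**n input vectors one-to-one onto n-bit output vectors."""
    outputs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    same_width = all(len(out) == n_inputs for out in outputs)
    return same_width and len(set(outputs)) == len(outputs)


def feynman(a, b):
    # Feynman (controlled-NOT) gate: P = A, Q = A xor B
    return (a, a ^ b)


def fredkin(a, b, c):
    # Fredkin (controlled-swap) gate: B and C are swapped when A = 1
    return (a, c, b) if a else (a, b, c)


def and_gate(a, b):
    # Conventional AND gate: two inputs, one output, so information is lost
    return (a & b,)


print(is_reversible(feynman, 2))   # True
print(is_reversible(fredkin, 3))   # True
print(is_reversible(and_gate, 2))  # False
```

The AND gate fails the check because three different input vectors produce the same output, so the inputs cannot be recovered from the output.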

#### Different Designs Using Reversible Gates

#### Universal Shift Register (RUSR) Using Novel Reversible Gates

In 2017, Maity et al. [1] proposed the design of a 4-bit reversible universal shift register (RUSR) using novel reversible gates. The proposed design has application in quantum computing owing to its lower quantum cost (QC), fewer reversible logic gates and fewer garbage outputs (GO). They obtained a QC of 68 and a GO of 16, against 94 and 19 respectively in previous designs. The universal shift register stores binary data, and its data can be shifted left or right when a clock signal is applied. All modes of operation, namely SISO (serial-in serial-out), SIPO (serial-in parallel-out), PISO (parallel-in serial-out) and PIPO (parallel-in parallel-out), can be performed on the occurrence of the clock. Thus, serial data (SIR during right shift and SIL during left shift) or parallel data can be loaded into the shift register. The values of the select lines determine the operation to be performed, as given in Table 1. The existing design of the Reversible Universal Register in reference [11] is built from basic cells comprising a DFF, a Feynman gate (FG) and Fredkin gates. In the existing design, fan-out circuits are not used for any of the signals.

#### **Array Multiplier Using Reversible Logic Gates**

In 2018, K. Yugandhar et al. [2] presented a design of a high-performance array multiplier using half adders and full adders built from reversible logic gates. They designed the full adder using reversible multiplexers, which are in turn built from the COG reversible gate. The multiplexer-based design reduces the number of I/O buffers needed, which they exploit to reduce static power dissipation. They designed the half adders using the Peres gate. With this design, power consumption and delay are greatly reduced compared to conventional logic. As future work, they intend to implement an 8-bit comparator using the same approach.

#### **Full Adder Using Reversible Logic**

The adder circuit plays a major role in all arithmetic and logical operations in digital circuits. It also played a major role in processor memory organization, by accessing the entire memory address range with a minimum number of time-multiplexed address lines in early processors. The proposed adder is a reversible full adder built only from reversible multiplexers, and it dissipates less power than previous adders. A multiplexer is a device that selectively presents one of its inputs at the output, based on the selection input provided. By suitably choosing the input lines and the selection lines, the logic of many circuits can be realized using multiplexers.
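To make the multiplexer idea concrete, the following Python sketch builds a full adder from 2:1 multiplexers and inverters only. It is a purely functional model under stated assumptions: the names `mux2` and `full_adder_mux` are hypothetical, the wiring is one possible arrangement, and the reversible COG-based multiplexer of the cited design is not modelled.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d0 when sel = 0 and d1 when sel = 1."""
    return d1 if sel else d0


def full_adder_mux(a, b, cin):
    """Full adder built from 2:1 multiplexers and inverters (illustrative wiring)."""
    a_xor_b = mux2(a, b, 1 - b)              # XOR: pass B when A = 0, NOT B when A = 1
    total = mux2(cin, a_xor_b, 1 - a_xor_b)  # SUM = A xor B xor Cin
    carry = mux2(a_xor_b, a, cin)            # CARRY = A when A = B, otherwise Cin
    return total, carry


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, c = full_adder_mux(a, b, cin)
            print(a, b, cin, "->", "sum", s, "carry", c)
```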

#### Half Adder Using Reversible Logic

We propose a half adder using the Peres reversible gate. In the Peres reversible gate, A, B and C are the inputs and P and Q are the outputs.

In the proposed half adder, A and B are the inputs and SUM and CARRY are the outputs. The half adder function follows from the above reversible gate.
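A minimal functional sketch of this construction is given below. It assumes the standard three-output form of the Peres gate (P = A, Q = A xor B, R = AB xor C) with the third input tied to 0; this definition and the function names are assumptions, since the gate diagram is not reproduced here.

```python
def peres(a, b, c):
    """Standard Peres gate (assumed form): P = A, Q = A xor B, R = (A and B) xor C."""
    return a, a ^ b, (a & b) ^ c


def half_adder(a, b):
    """Half adder from one Peres gate with C = 0; output P is left as a garbage output."""
    _p, q, r = peres(a, b, 0)
    return q, r  # (SUM, CARRY)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", half_adder(a, b))
```

With C = 0, the second output gives the sum and the third gives the carry, so a single gate implements the half adder.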

#### **Array Multiplier**

A multiplier is a logic circuit that multiplies two or more numbers. Here the multiplier is considered from the designer's side rather than the user's side. The multiplication is performed on binary numbers, consisting only of logic 1 and logic 0, because a computer or logic circuit handles every number in binary form. Binary multiplication follows the basic rules $0 \times 0 = 0$, $0 \times 1 = 0$, $1 \times 0 = 0$ and $1 \times 1 = 1$. For example, multiplying the binary numbers 1011 and 1001 by this procedure gives 01100011, which is an 8-bit number.
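The worked example can be reproduced with a short Python sketch of the array multiplier principle: each partial product is formed by AND-ing the multiplicand bits with one multiplier bit, and the shifted partial products are then summed. The function name `array_multiply` is hypothetical, and the summation uses ordinary integer addition in place of the half-adder and full-adder array of the hardware.

```python
def array_multiply(x, y, width=4):
    """Multiply two unsigned `width`-bit integers by summing AND-ed partial products."""
    product = 0
    for i in range(width):                  # bit i of the multiplier y
        y_bit = (y >> i) & 1
        partial = 0
        for j in range(width):              # bit j of the multiplicand x
            x_bit = (x >> j) & 1
            partial |= (x_bit & y_bit) << j  # 1-bit product rule: 1 x 1 = 1, else 0
        product += partial << i             # shift partial product into place and add
    return product


result = array_multiply(0b1011, 0b1001)
print(format(result, "08b"))  # 01100011
```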

#### Asynchronous Counter and Synchronous Counter Using Reversible Gates

In 2016, Rupali Singh et al. [3] worked on reversible counters. They designed modules such as an asynchronous counter and a synchronous counter using reversible gates. Their counter designs use fewer reversible gates, and they showed that their circuits have lower delay, a reduced GO count and lower circuit complexity. They obtained improvements in delay and GO of 44% and 31% respectively relative to previous designs.

#### 3. PROPOSED SYSTEM

The butterfly of a radix-4 algorithm consists of four inputs and four outputs (see Figure 1). The FFT length is $4^M$, where M is the number of stages, so a radix-4 FFT has half as many stages as a radix-2 FFT of the same length. The radix-4 DIF FFT divides an N-point discrete Fourier transform (DFT) into four N/4-point DFTs, then into 16 N/16-point DFTs, and so on.

In the radix-2 DIF FFT, the DFT equation is expressed as the sum of two calculations: one sum over the first half and one sum over the second half of the input sequence. Similarly, the radix-4 DIF fast Fourier transform (FFT) expresses the DFT equation as four summations, then divides it into four equations, each of which computes every fourth output sample. The following equations illustrate radix-4 decimation in frequency.



Fig 1: Radix 4 FFT Algorithm




$$X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}$$

$$= \sum_{n=0}^{N/4-1} x(n) W_N^{nk} + \sum_{n=N/4}^{2N/4-1} x(n) W_N^{nk} + \sum_{n=2N/4}^{3N/4-1} x(n) W_N^{nk} + \sum_{n=3N/4}^{N-1} x(n) W_N^{nk}$$

$$= \sum_{n=0}^{N/4-1} x(n) W_N^{nk} + \sum_{n=0}^{N/4-1} x(n+N/4) W_N^{(n+N/4)k} + \sum_{n=0}^{N/4-1} x(n+N/2) W_N^{(n+N/2)k} + \sum_{n=0}^{N/4-1} x(n+3N/4) W_N^{(n+3N/4)k}$$

$$= \sum_{n=0}^{N/4-1} \left[ x(n) + x(n+N/4) W_N^{(N/4)k} + x(n+N/2) W_N^{(N/2)k} + x(n+3N/4) W_N^{(3N/4)k} \right] W_N^{nk}$$
(1)

The three twiddle factor coefficients can be expressed as follows:

$$W_{N}^{(N/4)k} = \left[\cos\left(\frac{\pi}{2}\right) - j\sin\left(\frac{\pi}{2}\right)\right]^{k} = (-j)^{k}$$

$$W_{N}^{(N/2)k} = \left[\cos(\pi) - j\sin(\pi)\right]^{k} = (-1)^{k}$$

$$W_{N}^{(3N/4)k} = \left[\cos\left(\frac{3\pi}{2}\right) - j\sin\left(\frac{3\pi}{2}\right)\right]^{k} = j^{k}$$

$$X(k) = \sum_{n=0}^{N/4-1} \left[ x(n) + (-j)^{k} x(n + N/4) + (-1)^{k} x(n + N/2) + j^{k} x(n + 3N/4) \right] W_{N}^{nk}$$
(2)

$$X(4k) = \sum_{n=0}^{N/4-1} \left[ x(n) + x(n + N/4) + x(n + N/2) + x(n + 3N/4) \right] W_{N/4}^{nk}$$
(3)

$$X(4k+1) = \sum_{n=0}^{N/4-1} \left[ x(n) - j x(n+N/4) - x(n+N/2) + j x(n+3N/4) \right] W_N^{n} W_{N/4}^{nk}$$
(4)

$$X(4k+2) = \sum_{n=0}^{N/4-1} \left[ x(n) - x(n+N/4) + x(n+N/2) - x(n+3N/4) \right] W_N^{2n} W_{N/4}^{nk}$$
(5)

$$X(4k+3) = \sum_{n=0}^{N/4-1} \left[ x(n) + j x(n+N/4) - x(n+N/2) - j x(n+3N/4) \right] W_N^{3n} W_{N/4}^{nk}$$
(6)

for $k = 0$ to $N/4 - 1$.

X(4k), X(4k+1), X(4k+2) and X(4k+3) are N/4-point DFTs. Each of their N/4 points is a sum of four input samples x(n), x(n+N/4), x(n+N/2) and x(n+3N/4), each multiplied by either +1, -1, j, or -j. The sum is multiplied by a twiddle factor ($W_N^{0}$, $W_N^{n}$, $W_N^{2n}$, or $W_N^{3n}$). The four N/4-point DFTs together make up an N-point DFT. Each of these N/4-point DFTs is divided into four N/16-point DFTs. Each N/16-point DFT is further divided into four N/64-point DFTs, and so on, until the final decimation produces four-point DFTs. The four-point DFT equation makes up the butterfly calculation of the radix-4 FFT. A radix-4 butterfly is shown graphically in Figure 2.

$$X(4k) = \sum_{n=0}^{N/4-1} \left[ x(n) + x(n+N/4) + x(n+N/2) + x(n+3N/4) \right] W_{N/4}^{nk}$$

$$\text{Let } g_0(n) = x(n) + x(n+N/4) + x(n+N/2) + x(n+3N/4)$$

$$X(4k) = \sum_{n=0}^{N/4-1} g_0(n) W_{N/4}^{nk} \qquad (N/4\text{-point FFT})$$

$$\text{then } X(16k) = \sum_{n=0}^{N/16-1} \left[ g_0(n) + g_0(n+N/16) + g_0(n+N/8) + g_0(n+3N/16) \right] W_{N/16}^{nk} \qquad (N/16\text{-point FFT})$$
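The repeated decimation described above can be sketched as a short recursive Python routine that follows equations (3) to (6) and is checked against a direct DFT. This is a floating-point software model under stated assumptions, not the hardware architecture of this work; the function names and the 16-point test vector are illustrative.

```python
import cmath


def dft(x):
    """Direct DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def fft_radix4_dif(x):
    """Recursive radix-4 DIF FFT; the length must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    q = n // 4
    g = [[0j] * q for _ in range(4)]
    for m in range(q):
        a, b, c, d = x[m], x[m + q], x[m + 2 * q], x[m + 3 * q]
        w = cmath.exp(-2j * cmath.pi * m / n)          # twiddle factor W_N^n
        g[0][m] = a + b + c + d                        # feeds X(4k),   eq. (3)
        g[1][m] = (a - 1j * b - c + 1j * d) * w        # feeds X(4k+1), eq. (4)
        g[2][m] = (a - b + c - d) * w ** 2             # feeds X(4k+2), eq. (5)
        g[3][m] = (a + 1j * b - c - 1j * d) * w ** 3   # feeds X(4k+3), eq. (6)
    out = [0j] * n
    for p in range(4):
        sub = fft_radix4_dif(g[p])                     # N/4-point DFT of each branch
        for k in range(q):
            out[4 * k + p] = sub[k]
    return out


x = [complex(i, -0.5 * i) for i in range(16)]
err = max(abs(u - v) for u, v in zip(fft_radix4_dif(x), dft(x)))
print("max abs error vs direct DFT:", err)
```

The outputs are written directly to index 4k+p, so no separate digit-reversal pass is needed in this recursive form.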



Fig.2: Radix-4 DIF FFT Butterfly

Based on Figure 2, assume the following:

$$x(n) = xa + j\,ya$$

$$x(n+N/4) = xb + j\,yb$$

$$x(n+N/2) = xc + j\,yc$$

$$x(n+3N/4) = xd + j\,yd$$

$$X(4r) = xa' + j\,ya'$$

$$X(4r+1) = xb' + j\,yb'$$

$$X(4r+2) = xc' + j\,yc'$$

$$X(4r+3) = xd' + j\,yd'$$

$$W_N^{n} = Wb = Cb + j(-Sb)$$

$$W_N^{2n} = Wc = Cc + j(-Sc)$$

$$W_N^{3n} = Wd = Cd + j(-Sd)$$

The real and imaginary output values for the radix-4 butterfly are given by equations (7) to (14).

$$xa' = xa + xb + xc + xd \qquad (7)$$

$$ya' = ya + yb + yc + yd \qquad (8)$$

$$xb' = (xa + yb - xc - yd)\,Cb - (ya - xb - yc + xd)(-Sb) \qquad (9)$$

$$yb' = (ya - xb - yc + xd)\,Cb + (xa + yb - xc - yd)(-Sb) \qquad (10)$$

$$xc' = (xa - xb + xc - xd)\,Cc - (ya - yb + yc - yd)(-Sc) \qquad (11)$$

$$yc' = (ya - yb + yc - yd)\,Cc + (xa - xb + xc - xd)(-Sc) \qquad (12)$$

$$xd' = (xa - yb - xc + yd)\,Cd - (ya + xb - yc - xd)(-Sd) \qquad (13)$$

$$yd' = (ya + xb - yc - xd)\,Cd + (xa - yb - xc + yd)(-Sd) \qquad (14)$$

Typically, more than one hundred operations are required to calculate this radix-4 butterfly. Owing to the high degree of parallelism of the PP, the radix-4 DIF butterfly can be done within 22 machine cycles. It is three or four times faster than other devices.
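Equations (7) to (14) translate directly into real additions and multiplications. The following Python sketch evaluates the butterfly with real arithmetic only and compares it with the complex form of the same butterfly. The function names, the sample values and the choice N = 16, n = 3 are illustrative assumptions.

```python
import cmath


def radix4_butterfly(a, b, c, d, wb, wc, wd):
    """Radix-4 DIF butterfly using real arithmetic only, following equations (7) to (14)."""
    xa, ya = a.real, a.imag
    xb, yb = b.real, b.imag
    xc, yc = c.real, c.imag
    xd, yd = d.real, d.imag
    # Each twiddle is W = C + j(-S): the real part is C and the imaginary part is -S.
    cb, nsb = wb.real, wb.imag
    cc, nsc = wc.real, wc.imag
    cd, nsd = wd.real, wd.imag

    xa_o = xa + xb + xc + xd                                      # (7)
    ya_o = ya + yb + yc + yd                                      # (8)
    xb_o = (xa + yb - xc - yd) * cb - (ya - xb - yc + xd) * nsb   # (9)
    yb_o = (ya - xb - yc + xd) * cb + (xa + yb - xc - yd) * nsb   # (10)
    xc_o = (xa - xb + xc - xd) * cc - (ya - yb + yc - yd) * nsc   # (11)
    yc_o = (ya - yb + yc - yd) * cc + (xa - xb + xc - xd) * nsc   # (12)
    xd_o = (xa - yb - xc + yd) * cd - (ya + xb - yc - xd) * nsd   # (13)
    yd_o = (ya + xb - yc - xd) * cd + (xa - yb - xc + yd) * nsd   # (14)
    return [complex(xa_o, ya_o), complex(xb_o, yb_o),
            complex(xc_o, yc_o), complex(xd_o, yd_o)]


def twiddle(n, size, p):
    """Twiddle factor W_size^(p*n) = exp(-j*2*pi*p*n/size)."""
    return cmath.exp(-2j * cmath.pi * p * n / size)


a, b, c, d = 1 + 2j, -0.5 + 0.25j, 3 - 1j, 0.75 + 4j   # arbitrary sample inputs
wb, wc, wd = twiddle(3, 16, 1), twiddle(3, 16, 2), twiddle(3, 16, 3)
real_form = radix4_butterfly(a, b, c, d, wb, wc, wd)
complex_form = [a + b + c + d,
                (a - 1j * b - c + 1j * d) * wb,
                (a - b + c - d) * wc,
                (a + 1j * b - c - 1j * d) * wd]
print(max(abs(u - v) for u, v in zip(real_form, complex_form)))
```

In this form the butterfly uses 12 real multiplications, four for each of the three twiddle products, which is why the adders and multipliers dominate the delay of the datapath.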



Fig 3: RTL View Of Radix 4 FFT Version 1

The fixed-latency adder version of the FFT is shown in Figure 3 above. The ripple carry adder is replaced with a fixed-latency carry select adder. Carry generation in the ripple carry adder is the predominant factor in deciding the delay of the design: as the number of stages in the ripple carry adder increases, the delay increases linearly. To reduce the delay, the long adder chain is broken down into small blocks, so that carries ripple only within a block and the longest ripple chain is one block long. This model, however, carries the extra overhead of the multiplexer delay. It gives better delay performance than the previous method.
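The behaviour of a fixed-block carry select adder can be sketched at bit level as follows. Each block is added twice, once assuming carry-in 0 and once assuming carry-in 1, and a multiplexer picks the correct result when the real carry arrives. The 16-bit width, the block size of 4 and the function names are assumptions made for illustration; the paper does not state them.

```python
def ripple_add(a_bits, b_bits, carry_in):
    """Ripple-carry addition of two little-endian bit lists; returns (sum_bits, carry_out)."""
    out, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry


def carry_select_add(x, y, width=16, block=4):
    """Carry select adder with equal-sized blocks; returns (sum mod 2**width, carry_out)."""
    a = [(x >> i) & 1 for i in range(width)]
    b = [(y >> i) & 1 for i in range(width)]
    result, carry = [], 0
    for start in range(0, width, block):
        sa, sb = a[start:start + block], b[start:start + block]
        sum0, c0 = ripple_add(sa, sb, 0)   # candidate result for carry-in = 0
        sum1, c1 = ripple_add(sa, sb, 1)   # candidate result for carry-in = 1
        # Multiplexer: the incoming carry selects one of the two precomputed results.
        chosen, carry = (sum1, c1) if carry else (sum0, c0)
        result.extend(chosen)
    value = sum(bit << i for i, bit in enumerate(result))
    return value, carry


print(carry_select_add(1234, 4321))   # (5555, 0)
print(carry_select_add(0xFFFF, 1))    # (0, 1)
```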

The radix-4 FFT with variable latency is shown in Figure 4. In this architecture, the long chain of the ripple carry adder is not broken down into equal blocks; it is divided into blocks of increasing delay. Hence the first stage employs only two adders, the next stage two more adders, the next stage three adders, the next stage four adders, followed by five adders. This variable-latency design performs better than the previous model, since the delays are balanced across the stages.
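The balancing argument can be illustrated with a simple unit-delay model. The sketch below assumes that the adder counts 2, 2, 3, 4 and 5 are block widths (they sum to 16 bits), that each full adder contributes one delay unit and that each multiplexer contributes one delay unit. These are modelling assumptions, not measured values, and the function names are hypothetical.

```python
def ripple_delay(n_bits, t_fa=1):
    """Delay of an n-bit ripple carry adder in full-adder delay units."""
    return n_bits * t_fa


def carry_select_delay(block_sizes, t_fa=1, t_mux=1):
    """Delay until the final carry of a carry select adder is valid.

    All blocks compute both candidate results in parallel; each block after the
    first adds one multiplexer delay once both its own results and the incoming
    carry are ready.
    """
    carry_ready = block_sizes[0] * t_fa        # first block is a plain ripple adder
    for size in block_sizes[1:]:
        block_ready = size * t_fa              # both candidates of this block are ready
        carry_ready = max(block_ready, carry_ready) + t_mux
    return carry_ready


print(ripple_delay(16))                     # 16
print(carry_select_delay([4, 4, 4, 4]))     # 7
print(carry_select_delay([2, 2, 3, 4, 5]))  # 6
```

Under this model the increasing block widths let each block finish its two candidate sums just as the incoming carry arrives, which is the sense in which the delays are balanced.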

Advantages of proposed model:

- 1) Low latency compared to the FFT\_Ver1
- 2) Compatible with previous architectures



Fig 4: RTL View Of Radix 4 FFT Version 2

#### 4. RESULT

#### Simulation Results



Fig 5: FFT 4 Version 1 Simulation Results






Fig 6: FFT 4 Version 2 Simulation Results

#### 5. CONCLUSION

Two versions of a reversible-gate-based Radix-4 FFT architecture are presented: the first uses fixed-latency adders and the second uses variable-latency adders. The first version consumes 328 LUTs and the second version consumes 332 LUTs. The delay of the first version is 17.618 ns and that of the second version is 16.030 ns. The first version uses fewer LUTs but has a longer delay, which the second version overcomes, so the delay is optimized.
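For reference, the relative changes implied by these figures can be worked out directly. The percentages below are derived here from the values stated above, with version 1 taken as the baseline.

$$\frac{17.618 - 16.030}{17.618} \approx 9.0\%\ \text{(delay reduction)}, \qquad \frac{332 - 328}{328} \approx 1.2\%\ \text{(LUT increase)}$$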

#### 6. REFERENCES

- 1. J. W. Cooley and J. Tukey, "An algorithm for the machine calculation of complex Fourier series", Math. Comput., vol. 19, pp. 297-301, Apr. 1965.
- 2. Lars Wanhammar, "DSP Integrated Circuits", Academic Press, 1st edition, 24 February 1999.
- 3. Eleanor Chu and Alan George, "Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms", CRC Press, 2000.
- 4. Xilinx IP cores data, available at www.xilinx.com.
- 5. Yang K.J. and Chuang G.C.H., "A MDC FFT Processor with Variable Length for MIMO-FDM", IEEE Transactions on VLSI Systems, Vol. 21, 2013, pp. 720-731.
- 6. Mario Garrido and Keshab K. Parhi, "A Pipelined FFT Architecture for Real-Valued Signals", IEEE Trans. Circuits Syst. I, Vol. 56, No. 12, Dec 2009, pp. 2634-2643.
- 7. Mario Garrido, J. Grajal, M. A. Sánchez and Oscar Gustafsson, "Pipelined Radix-$2^k$ Feedforward FFT Architectures", IEEE Trans. on VLSI Systems, Vol. 21, No. 1, Jan 2013, pp. 23-32.
- 8. Jienan Chen, Jianhao Hu, Shuyang Lee and Gerald E. Sobelman, "Hardware Efficient Mixed Radix-25/16/9 FFT for LTE Systems", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 23, No. 2, February 2015.
- 9. Inkeun Cho, Chung-Ching Shen, Yahia Tachwali, Chia-Jui Hsu and Shuvra S. Bhattacharyya, "Configurable Resource Optimized FFT Architecture for OFDM Communication", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2746-2750, 2013.
- 10. Manohar Ayinala and Keshab K. Parhi, "FFT Architectures for Real-Valued Signals Based on Radix-$2^3$ and Radix-$2^4$ Algorithms", IEEE Trans. Circuits Syst. I, Vol. 60, No. 9, Sep 2013, pp. 2422-2430.
- 11. Kala S, Nalesh S, S K Nandy and Ranjani Narayan, "Design of a Low Power 64 Point FFT Architecture for WLAN Applications", in Proc. of 25th IEEE International Conference on Microelectronics (ICM), 2013.