

A Peer Reviewed Open Access International Journal

www.ijiemr.org

#### COPYRIGHT




**© 2019 IJIEMR**. Personal use of this material is permitted. Permission from IJIEMR must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. No reprint should be made of this paper; all copyright is authenticated to the paper authors.

IJIEMR Transactions, available online on 23<sup>rd</sup> Aug 2019.

Link: http://www.ijiemr.org/downloads.php?vol=Volume-08&issue=ISSUE-08

Title: DESIGN OF HIGH SPEED 64-BIT MAC UNIT

Volume 08, Issue 08, Pages: 427-432.

Paper Authors

<sup>1</sup>DHAMISETTI DHARMA SASTA, <sup>2</sup>P. NAGARAJU, M.Tech, (Ph.D.)









### **DESIGN OF HIGH SPEED 64-BIT MAC UNIT**

<sup>1</sup>DHAMISETTI DHARMA SASTA, <sup>2</sup>P. NAGARAJU, M.Tech, (Ph.D.)

<sup>1</sup>VLSI, Dept. of E.C.E., Kakinada Institute of Engineering & Technology, Korangi, Andhra Pradesh, India, 533461

<sup>2</sup>Associate Professor, Kakinada Institute of Engineering & Technology, Korangi, Andhra Pradesh, India, 533461

#### **ABSTRACT:**

The MAC unit is an indispensable part of many digital signal processing (DSP) applications involving multiplications and/or accumulations, and it is used in high-performance DSP systems. DSP applications include filtering, convolution, and inner products. Many DSP methods rely on transforms such as the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). Because these are basically accomplished by repeated multiplication and addition, the speed of the multiplication and addition arithmetic determines the execution speed and performance of the entire calculation. Multiply-and-accumulate (MAC) operations are typical of digital filters, so the MAC unit enables high-speed filtering and other processing typical of DSP applications. Since the MAC unit operates completely independently of the CPU, it can process data separately and thereby reduce the CPU load. Applications such as DSP-based optical communication systems require extremely fast processing of huge amounts of digital data. The Fast Fourier Transform (FFT) also requires addition and multiplication. A 64-bit MAC unit can handle wider operands and a wider accumulated result than narrower designs.

Keywords: Modified Wallace Multiplier, Carry Save Adder, Multiplier And Accumulator (MAC).

#### 1. INTRODUCTION

With the recent rapid advances in multimedia and communication systems, real-time signal processing such as audio signal processing, video/image processing, and large-capacity data processing is increasingly being demanded. The multiplier and the multiplier-and-accumulator (MAC) are the essential elements of digital signal processing operations such as filtering, convolution, transformations, and inner products. There are different entities that one would like to optimize when designing a VLSI circuit. These entities often cannot be optimized simultaneously; one can be improved only at the expense of one or more others. The design of an integrated circuit that is efficient in power, area, and speed simultaneously has become a very challenging problem. Power dissipation is recognized as a critical parameter in modern VLSI design, and the objective of a good multiplier is to provide a physically compact, fast, and low-power chip. This paper proposes a new multiplier-and-accumulator (MAC) architecture for high speed and low power by adopting the new SPST implementation approach, as shown in Fig. 1.1. The multiplier is designed by applying the Spurious Power Suppression Technique (SPST) to a modified Booth encoder, which is controlled by a detection unit using an AND gate. The modified Booth encoder reduces the number of partial products generated by a factor of 2.
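The factor-of-2 reduction comes from radix-4 (modified Booth) recoding: the multiplier is scanned in overlapping 3-bit groups, and each group becomes one digit in {-2, -1, 0, 1, 2}, so an n-bit multiplier yields n/2 partial products instead of n. The Python sketch below is a behavioral model of that recoding only, not the authors' encoder; the function names are hypothetical, and the SPST detection logic is not modeled.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Radix-4 (modified Booth) recoding of an n-bit two's-complement multiplier y.

    Returns n/2 digits d_i in {-2, -1, 0, 1, 2} such that y == sum(d_i * 4**i).
    """
    assert n % 2 == 0, "width must be even for radix-4 recoding"
    y &= (1 << n) - 1              # work on the raw n-bit pattern
    digits = []
    prev = 0                       # implicit bit y_{-1} = 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        # Overlapping triplet (y_{2i+1}, y_{2i}, y_{2i-1}) -> digit -2*b1 + b0 + prev
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int) -> tuple[int, int]:
    """Multiply signed n-bit x and y from radix-4 Booth partial products.

    Returns (product, number_of_partial_products).
    """
    digits = booth_radix4_digits(y, n)
    # Each digit selects 0, +-x or +-2x, shifted left by 2 bits per digit position
    partial_products = [d * x << (2 * i) for i, d in enumerate(digits)]
    return sum(partial_products), len(partial_products)
```

For example, `booth_multiply(-57, 113, 8)` returns `(-6441, 4)`: four partial products for an 8-bit multiplier rather than eight.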

The SPST adder avoids unwanted additions and thus minimizes switching power dissipation. By combining multiplication with accumulation and devising a low-power carry save adder (CSA), the performance was improved.
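The idea of suppressing unwanted additions can be illustrated with a simplified behavioral model: the adder is split into a least significant part (LSP) and a most significant part (MSP), and a detection check (AND-style tests for "all 0s" or "all 1s") decides whether the MSP carries only sign-extension bits. If it does, the MSP adder is assumed to stay idle and its result is derived from the LSP carry. The 16-bit width, the 8/8 split, and the names `is_trivial` and `spst_add16` are assumptions for illustration; the paper does not give its SPST circuit, and this model does not capture the latches or glitch behavior that actually save power.

```python
MASK8 = 0xFF


def is_trivial(v: int) -> bool:
    """Detection check: an 8-bit part that is all 0s or all 1s (pure sign extension)."""
    return v == 0x00 or v == MASK8


def spst_add16(a: int, b: int) -> tuple[int, bool]:
    """16-bit addition split into an 8-bit LSP and an 8-bit MSP (hypothetical model).

    Returns (sum mod 2**16, msp_active). When both MSP operands are trivial,
    the MSP adder is assumed idle and its result comes from small
    compensation logic driven by the LSP carry.
    """
    a_lsp, a_msp = a & MASK8, (a >> 8) & MASK8
    b_lsp, b_msp = b & MASK8, (b >> 8) & MASK8

    lsp_total = a_lsp + b_lsp
    carry = lsp_total >> 8                     # carry out of the LSP adder
    lsp = lsp_total & MASK8

    if is_trivial(a_msp) and is_trivial(b_msp):
        ones = (a_msp == MASK8) + (b_msp == MASK8)   # how many MSP operands are all 1s
        if ones == 0:
            msp = carry                        # 0x00 + 0x00 + c
        elif ones == 1:
            msp = 0x00 if carry else MASK8     # 0xFF + 0x00 + c wraps to 0x00 when c = 1
        else:
            msp = MASK8 if carry else 0xFE     # 0xFF + 0xFF + c = 0x1FE + c
        return (msp << 8) | lsp, False

    msp = (a_msp + b_msp + carry) & MASK8      # MSP adder actually switches
    return (msp << 8) | lsp, True
```

For example, `spst_add16(0x0005, 0xFFFD)` (that is, 5 + (-3)) returns `(2, False)`: the result is correct and the MSP adder never had to switch.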



Fig.1.1. Block diagram

A basic multiply-and-accumulate (MAC) unit consists of a multiplier and an accumulate adder. The multiplier multiplies the inputs obtained from memory, and its product is given to the accumulator part, which holds the sum of the previous successive products. Our design consists of a 64-bit Vedic multiplier, a 128-bit carry save adder, and a register. The MAC unit thus has three main components: the multiplier unit, the adder unit, and the accumulation unit. The multiplier unit multiplies the input numbers and passes its output to the adder unit, which adds the present product to the previously accumulated output; the accumulator then stores this result in a register. In our design, the 64-bit Vedic multiplier accepts 64-bit inputs and therefore produces a 128-bit output, which is given as input to the carry save adder that performs the addition.
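The data flow just described (multiply, add to the previous result, store in a register) can be sketched as a behavioral model. This is a sketch under assumptions, not the authors' Verilog: operands are unsigned 64-bit integers, Python's `*` stands in for the Vedic multiplier, and the accumulator is kept in carry-save (sum, carry) form so that the 128-bit CSA never propagates a carry during accumulation. The names `csa` and `MacUnit` are hypothetical.

```python
WIDTH = 128
MASK = (1 << WIDTH) - 1


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry save adder: (sum, carry) with x + y + z == sum + carry (mod 2**WIDTH)."""
    s = x ^ y ^ z                                  # per-bit sum, no carry propagation
    c = ((x & y) | (y & z) | (x & z)) << 1         # per-bit majority moves one place left
    return s & MASK, c & MASK


class MacUnit:
    """Behavioral model of a 64x64-bit unsigned MAC with a 128-bit accumulator."""

    def __init__(self) -> None:
        self.acc_sum = 0       # accumulator register, redundant (sum, carry) form
        self.acc_carry = 0

    def step(self, a: int, b: int) -> None:
        assert 0 <= a < 2**64 and 0 <= b < 2**64
        product = a * b                            # 128-bit product from the multiplier
        self.acc_sum, self.acc_carry = csa(product, self.acc_sum, self.acc_carry)

    def result(self) -> int:
        # A single carry-propagate addition resolves the redundant form
        return (self.acc_sum + self.acc_carry) & MASK
```

Keeping the accumulator redundant means each accumulation costs one full-adder delay regardless of width, and the carry chain is resolved only when `result()` is read. Whether the paper's register stores such a redundant pair or a resolved sum is not stated.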

#### 2. RELATED STUDY

Multipliers play a significant role in today's digital signal processing and various other applications. With advances in technology, many researchers have tried, and are still trying, to design multipliers that offer high speed, low power consumption, regularity of layout (and hence less area), or a combination of these in one multiplier, making them suitable for various high-speed, low-power, compact **VLSI** implementations.

##### 2.1. WALLACE TREE MULTIPLIER

To make the conventional Wallace multiplier more efficient, we use a modified Wallace multiplier, whose main aim is to reduce the number of half adders by replacing them with full adders. Conventional Wallace multipliers use many full adders and half adders in their reduction phase, but half adders do not reduce the number of partial product bits: a half adder takes two bits and produces two bits, whereas a full adder compresses three bits into two. Therefore, minimizing the number of half adders, at the cost of a very slight increase in the number of full adders, somewhat reduces the delay. The modified Wallace multiplier consists of three phases. In the first phase, the N×N product matrix is formed and, before being passed to the second phase, rearranged into the shape of an inverted pyramid. In the second phase, the rows of the inverted pyramid are grouped into non-overlapping groups of three, and the number of rows in the next stage is given by the formula below:

> $r_{i+1} = 2\left\lfloor r_i/3 \right\rfloor + (r_i \bmod 3)$; if $r_i \bmod 3 = 0$, then $r_{i+1} = 2r_i/3$.
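As a worked example of the recurrence, consider the 8×8 case, in which the first phase yields $r_0 = 8$ partial-product rows:

$$
\begin{aligned}
r_1 &= 2\lfloor 8/3 \rfloor + (8 \bmod 3) = 4 + 2 = 6,\\
r_2 &= 2 \cdot 6/3 = 4 \quad (6 \bmod 3 = 0),\\
r_3 &= 2\lfloor 4/3 \rfloor + (4 \bmod 3) = 2 + 1 = 3,\\
r_4 &= 2 \cdot 3/3 = 2 \quad (3 \bmod 3 = 0).
\end{aligned}
$$

The 8×8 reduction therefore needs four stages to reach a height of two. Applying the same recurrence to the 64 rows of a 64×64 multiplier gives 64, 43, 29, 20, 14, 10, 7, 5, 4, 3, 2, i.e., ten stages.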




Half adders are used only when the number of rows calculated from the above equation for a stage of the second phase does not match the number of rows actually formed in that stage. The output of this phase has a height of two bits and is passed to the third phase. In this stage, a carry save adder is used for better performance rather than a carry select adder or a ripple carry adder. The 64-bit modified Wallace multiplier is difficult to represent, so for ease of understanding a typical 8-bit by 8-bit reduction is shown below.
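The three phases can be sketched as a word-level model in which each partial-product row is a Python integer and each `csa` call stands for one row of full adders turning three rows into two. This is a sketch of Wallace-style row reduction under stated assumptions, not the modified Wallace circuit itself: unsigned operands, no inverted-pyramid rearrangement, and half adders are not distinguished (at word level, a full-adder position with a zero input behaves like a half adder). Python's `+` stands in for the final adder of the third phase. The function names are hypothetical.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """One row of full adders: three rows in, a sum row and a carry row out."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_multiply(a: int, b: int, n: int = 8) -> tuple[int, list[int]]:
    """Unsigned n x n multiply with Wallace-style row reduction.

    Returns (product, number of rows after each stage, starting with n).
    """
    # Phase 1: AND-gate partial products, one shifted row per multiplier bit
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    heights = [len(rows)]
    # Phase 2: reduce non-overlapping groups of three rows to two rows each
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        next_rows = []
        for i in range(0, grouped, 3):
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[grouped:])      # the (r_i mod 3) leftover rows pass through
        rows = next_rows
        heights.append(len(rows))
    # Phase 3: add the final two rows with a carry-propagate addition
    return sum(rows), heights
```

For example, `wallace_multiply(200, 123)` returns the product 24600 together with the row counts `[8, 6, 4, 3, 2]`, which follow the row-count formula stage by stage.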

#### 3. PROPOSED SYSTEM

The existing Modified Booth Encoding (MBE) multiplier and the Baugh-Wooley multiplier perform multiplication on signed numbers only, while the array multiplier and the Braun array multiplier perform multiplication on unsigned numbers only. Modern computer systems therefore require a dedicated, very high-speed multiplier unit that handles both signed and unsigned numbers. Accordingly, this paper presents the design and implementation of a signed-unsigned modified Booth encoding (SUMBE) multiplier. The modified Booth encoder circuit generates half the partial products in parallel. The SUMBE multiplier is obtained by extending the sign bit of the operands and generating an additional partial product. A carry save adder (CSA) tree and a final carry lookahead (CLA) adder are used to speed up the multiplier operation. Since signed and unsigned multiplication are performed by the same multiplier unit, the required hardware and chip area are reduced, which in turn reduces the power dissipation and cost of the system.

The binary adder is the critical element in most digital circuit designs, including digital signal processors (DSP) and microprocessor data path units. As such, extensive research continues to focus on improving the power-delay performance of the adder. Parallel-prefix adders (also known as carry-tree adders) are known to have the best performance in VLSI designs, as shown in Fig. 3. However, this performance advantage does not translate directly into FPGA implementations because of constraints on logic block configurations and routing overhead. This paper investigates three types of carry-tree adders: the Kogge-Stone, sparse Kogge-Stone, and spanning tree adders.
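The SUMBE idea described above (extend the sign bit, generate one additional partial product) can be sketched behaviorally. The sketch assumes an even operand width n and reads "extending the sign bit" as widening the multiplier to n + 2 bits, filled with copies of the sign bit for signed operands and with zeros for unsigned ones, so that the same signed radix-4 recoding serves both cases and yields n/2 + 1 partial products. The names are hypothetical, and Python's `sum` stands in for the CSA tree and final CLA adder.

```python
def booth_radix4_digits(y: int, width: int) -> list[int]:
    """Radix-4 Booth digits (each in -2..2) of a width-bit two's-complement pattern y."""
    digits, prev = [], 0                      # prev holds bit y_{i-1}; y_{-1} = 0
    for i in range(0, width, 2):
        b0, b1 = (y >> i) & 1, (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def sumbe_multiply(x: int, y: int, n: int, signed: bool) -> tuple[int, int]:
    """Multiply two raw n-bit patterns as signed or unsigned on one Booth datapath.

    Returns (product, number_of_partial_products).
    """
    assert n % 2 == 0, "this sketch assumes an even operand width"
    mask = (1 << n) - 1
    x, y = x & mask, y & mask
    if signed:
        # Reinterpret the n-bit patterns as two's-complement values
        x -= (x >> (n - 1)) << n
        y -= (y >> (n - 1)) << n
    # Extend the multiplier to n + 2 bits: sign copies when signed, zeros when
    # unsigned, so the recoder always sees a valid signed pattern.
    width = n + 2
    y_ext = y & ((1 << width) - 1)
    digits = booth_radix4_digits(y_ext, width)  # n/2 + 1 digits: one extra partial product
    product = sum(d * x << (2 * i) for i, d in enumerate(digits))
    return product, len(digits)
```

For the bit pattern `0xFF` on both inputs with n = 8, `signed=False` gives `(65025, 5)` and `signed=True` gives `(1, 5)`: the same datapath, five partial products either way.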



Fig.3.2. Schematic diagram of 16 bit input.
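The carry-tree principle behind the Kogge-Stone adder named above can be illustrated with a bit-level behavioral model: compute per-bit generate and propagate signals, then combine them in log2(n) prefix levels, where at each level every bit merges with the bit a fixed distance below it (1, 2, 4, ...). The 16-bit default width, the zero carry-in, and the name `kogge_stone_add` are assumptions for illustration; the sparse Kogge-Stone and spanning tree variants, and their FPGA mapping, are not modeled.

```python
def kogge_stone_add(a: int, b: int, n: int = 16) -> tuple[int, int]:
    """n-bit Kogge-Stone parallel-prefix adder, bit-level model with carry-in 0.

    Returns (sum mod 2**n, carry_out).
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]   # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]   # propagate
    prop = p[:]                                               # keep p_i for the sum bits
    dist = 1
    while dist < n:                                           # log2(n) prefix levels
        g_new, p_new = g[:], p[:]
        for i in range(dist, n):
            # Black cell: (G, P) o (G', P') = (G | (P & G'), P & P')
            g_new[i] = g[i] | (p[i] & g[i - dist])
            p_new[i] = p[i] & p[i - dist]
        g, p = g_new, p_new
        dist *= 2
    # After the tree, g[i] is the group generate of bits i..0, i.e. the carry out of bit i
    carries = [0] + g[:-1]
    s = sum((prop[i] ^ carries[i]) << i for i in range(n))
    return s, g[n - 1]
```

For example, `kogge_stone_add(0xFFFF, 0x0001)` returns `(0, 1)`. With n = 16 the loop runs four levels, which is the logarithmic depth that makes carry-tree adders fast.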

Delay is measured in terms of the length of the signal path from input to output. To measure the delay of our two MAC units, we consider the length of the path that a signal travels through the components from the input to the output end, as shown in the schematic diagrams. Let us first consider the existing 64-bit MAC unit with a modified Wallace multiplier and a carry save adder (CSA). Its core block is the 8×8 Wallace multiplier block, in which the signal path is long: from the input AND gates through the full adders to the output end. When we consider our implemented MAC unit with a Vedic multiplier and a CSA, the signal path in the basic 2×2 multiplier block is much shorter than in the modified Wallace multiplier, and the signals to all the Vedic blocks travel in parallel. That is why it takes less time, and therefore the delay of the Vedic multiplier is also lower.
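The parallelism argument can be made concrete with a behavioral model of the Urdhva Tiryagbhyam ("vertically and crosswise") scheme commonly used for Vedic multipliers. This is a sketch under assumptions: unsigned operands, n a power of two, and a recursive split into four n/2 × n/2 blocks down to the 2×2 block; the paper does not give its exact block structure or the adders that combine sub-products (Python's `+` stands in for them). The function names are hypothetical.

```python
def vedic_2x2(a: int, b: int) -> int:
    """2x2 'vertically and crosswise' block: four AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    q0 = a0 & b0                       # vertical: a0*b0
    t1, t2 = a1 & b0, a0 & b1          # crosswise products
    q1, c1 = t1 ^ t2, t1 & t2          # half adder 1
    t3 = a1 & b1                       # vertical: a1*b1
    q2, q3 = t3 ^ c1, t3 & c1          # half adder 2
    return q0 | (q1 << 1) | (q2 << 2) | (q3 << 3)


def vedic_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n x n multiplier built recursively from four n/2 x n/2 blocks."""
    assert n >= 2 and (n & (n - 1)) == 0, "n must be a power of two"
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    if n == 2:
        return vedic_2x2(a, b)
    h = n // 2
    lo_mask = (1 << h) - 1
    a_lo, a_hi = a & lo_mask, a >> h
    b_lo, b_hi = b & lo_mask, b >> h
    # The four sub-products do not depend on each other, so hardware evaluates them in parallel
    ll = vedic_multiply(a_lo, b_lo, h)
    lh = vedic_multiply(a_lo, b_hi, h)
    hl = vedic_multiply(a_hi, b_lo, h)
    hh = vedic_multiply(a_hi, b_hi, h)
    # Combine: vertical terms at weights 1 and 2**n, crosswise terms at weight 2**h
    return ll + ((lh + hl) << h) + (hh << n)
```

For example, `vedic_multiply(0b1011, 0b1101, 4)` returns 143. For n = 64 the recursion bottoms out after five halvings (64, 32, 16, 8, 4, 2), and at every level the four sub-products are independent of one another.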



Fig.3.3. Simulation results

#### 4. CONCLUSION

Hence, a high-performance 64-bit MAC unit is designed and implemented using a Vedic multiplier and a carry save adder. Compared with the MAC unit built from a modified Wallace multiplier and a carry save adder, which was developed earlier and is more efficient than previous MAC units using other combinations of multipliers and adders, the designed Vedic multiplier offers high performance with less area and less propagation delay, which further increases the overall speed of the MAC unit. This MAC unit is designed using Verilog HDL and synthesized using Xilinx ISE 14.3.
