

A Peer Revieved Open Access International Journal

www.ijiemr.org

#### **COPY RIGHT**





2019IJIEMR. Personal use of this material is permitted. Permission from IJIEMR must

be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. No Reprint should be done to this paper, all copy right is authenticated to Paper Authors

IJIEMR Transactions, online available on 4<sup>th</sup> Aug 2019. Link

:http://www.ijiemr.org/downloads.php?vol=Volume-08&issue=ISSUE-08

Title A PERFORMANCE IMPROVEMENT OF FAST FOURIER TRANSFORM WITH MONTGOMERY MODULAR MULTIPLICATION ALGORITHM USING CYCLIC AND NEGA CYCLIC CONVOLUTIONS

Volume 08, Issue 08, Pages: 210-214.

**Paper Authors** 

MR. N.B.KRISHNA CHAITANYA, MR. K.SANDEEP

SREE VAHINI INSTITUTEOF SCIENCE & TECHNOLOGY, TIRUVURU, KRISHNA





USE THIS BARCODE TO ACCESS YOUR ONLINE PAPER

To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic

Bar Code



A Peer Revieved Open Access International Journal

www.ijiemr.org

# A PERFORMANCE IMPROVEMENT OF FAST FOURIER TRANSFORM WITH MONTGOMERY MODULAR MULTIPLICATION ALGORITHM USING CYCLIC AND NEGA CYCLIC CONVOLUTIONS

<sup>1</sup>MR. N.B.KRISHNA CHAITANYA, <sup>2</sup>MR. K.SANDEEP

<sup>1</sup>VLSI, SREE VAHINI INSTITUTE OF SCIENCE & TECHNOLOGY, TIRUVURU, KRISHNA <sup>2</sup>PROFESSOR, SREE VAHINI INSTITUTE OF SCIENCE & TECHNOLOGY, TIRUVURU, KRISHNA

#### **ABSTRACT:**

Execution improvement of particular duplication utilizing cyclic and nega cyclic convolution. The convolution length performed utilizing cyclic and weighted change .in which single and twofold butterfly structures are utilized to acquire high territory inertness arrangements which increment the clock recurrence by increment of calculation speed utilizing Xilinx vertex. FPGA is in type of skew circuit which is produced by vectors an and b where convolution length decreased by half of a factor. At that point speed progresses toward becoming improve by a few components to make quicker calculation without trading off territory of chip. The chip might be utilized in different applications. The impediments can be diminished by utilizing some additional methods to beat the inconveniences in each application to improve its dependability.

#### **INTRODUCTION:**

Incorporated circuit (IC) innovation is the empowering innovation for an entire host of creative Integrated gadgets and frameworks that have changed the manner in which we live. Jack Kil by and Robert Noyce got the 2000 Nobel Prize in Physics for their development of the incorporated circuit without the coordinated circuit, neither transistors nor PCs would be as significant as they are today. VLSI frameworks are a lot littler and expend less power than the discrete segments used to construct electronic frameworks before the 1960s. Inventive watcher to be connected to taking care of an issue. Incorporated circuits are additionally a lot simpler to structure and produce and are more dependable than discrete frameworks that make it conceivable to create extraordinary reason

frameworks that are more effective than broadly useful PCs for the job needing to be done. Electronic frameworks presently play out a wide assortment of undertakings in day by day life. Electronic frameworks now and again have supplanted systems that worked precisely, using pressurized water, or by different methods hardware are generally littler, progressively adaptable, and simpler to support. In different electronic frameworks have cases made Electronic absolutely new applications. assortment of frameworks play out an undertakings, some of them unmistakable, some increasingly covered up Personal amusement frameworks, for example, convenient MP3 players and DVD players perform complex calculations with surprisingly little vitality.



A Peer Revieved Open Access International Journal

www.ijiemr.org

Electronic frameworks in autos work sound systems and presentations they likewise control fuel infusion frameworks, change suspensions to shifting landscape, and play out the control capacities required for antilock braking (ABS) frameworks. Digital gadgets pack decompress video, even at top quality information rates, on-the-fly in buyer hardware. Low-cost terminals for Web perusing still require modern hardware, regardless of their Personal committed capacity. **PCs** workstations give word-handling, monetary examination, and amusements. PCs incorporate both focal preparing units (CPUs) and extraordinary reason equipment for plate get to, quicker screen show, and so on. Medicinal electronic frameworks measure real capacities and perform complex handling calculations to caution about unordinary conditions. The accessibility of these intricate frameworks, a long way from overpowering customers, just makes interest for significantly increasingly complex frameworks. The developing modernity of utilizations constantly pushes the plan and assembling of incorporated circuits and electronic frameworks higher than ever of unpredictability. What's more, maybe the most stunning normal for this gathering frameworks is its assortment as frameworks turned out to be increasingly mind boggling, we manufacture not a couple of universally useful PCs but rather an ever more extensive scope of uncommon reason frameworks. Our capacity to do as such is a demonstration of our developing dominance of both coordinated assembling and structure, however expanding requests of clients keep on testing the points of confinement of plan and assembling.

The subject of the paper is equipment usage of the RSA calculation with bigger than

1,024-piece modulus length. Specifically, our goal is to make usage that accomplish high zone effectiveness, instead of making exceptionally low region or ultra fast executions at the staggering expense of the other. The RSA calculation, being the absolute first open key encryption and advanced mark calculation since 1978, is universally conveyed and utilized, from keen cards to mobile phones and SSL boxes. Its security relies upon the trouble of calculating a modulus n to locate its two prime variables p and q. The security is expanded by choosing higher modulus, anyway to the detriment of huge circuit estimate or moderate operational speed. The absolute first usage of the RSA calculation in mid 1980s accepted 512piece modulus (and therefore ,two 256-piece primes) would be adequate, yet inside 10 years, progresses in factorization techniques expanded the modulus length to 1,024 bits. This has been the situation for right around 2 decades, however at this point, as of late as 2010s, the security of 1,024-piece was addressed.

#### **RELATED WORK:**

This area presents the scientific foundation of FFT-based secluded augmentation. For the implicity of reference, the parameters in this paper and their definitions are recorded For a subjective non-negative whole number x, we speak to the radix-B configuration of x as

where B is a positive whole number, 0.6 xi < B with  $I = 0, 1, \dots, P-1$  are known as the digits of x, and P is the number of digits.

The accumulation of digits xi is signified as fxig. With the comparative connection among fyjg and y, the increase z = xy is proportionate to a length-2P cyclic convolution, what's more, the segments zn of z can be gotten by

The number theoretic change (NTT) gives a unique space, called unearthly area, in which a



A Peer Revieved Open Access International Journal

www.ijiemr.org

CC registered by part can be astute augmentations. So also, a NCC can likewise be registered part astutely while applying the number-theoretic weighted change (NWT). The forward and converse NWT over ring ZM are characterized as Rather than getting the particular item xy (mod N) straightforwardly, Montgomery presents an additional fixed whole number R, what's more, registers xyR-1 (mod N). In this manner, the MMM can productively keep away from the tedious preliminary division.

McLaughlin proposed a changed form of MMM, the itemized computational advances are given in Algorithm 1. Not at all like the first form where R equivalents to a power of 2, the changed rendition rethinks R = 21 - 1 with an extra modulus Q0 = 21 + 1. McLaughlin's calculation has a quicker assessed running time contrasted with the first one. Also, for fixed modulus R and Q0, the CT and NCT can be connected to the particular augmentation steps. The FFT technique can be connected to Algorithm 1 to perform productive duplications modulo R and Q0, the FFT-based calculation (FMLM3) is given as appeared in Algorithm 2. The calculation of FMLM3 begins from either time or unearthly area, which relies upon the kind of info information what's more, the yield ought to be predictable with the info. Calculation 2 begins the calculation from unearthly space, in this way two additional means are required to get T(t) and T0(t). In Step 12 of Algorithm 2, the aftereffects of NCT□1 ought to be identical to the parts of NCC.

## IMPLEMENTATION RESULTS AND COMPARISONS:

The proposed FMLM3 architecture is implemented on a Virtex-6 (xc6vlx130t-1) FPGA. ynthesis and Place & Route are carried

out by using Xilinx ISE 14.7 with default settings. Each DSP48E1 in Virtex-6 FPGA contains a signed 25\_18-bit multiplier, and we use it to build the base multiplier unit of the Multiply Adder. The based multipliers are pipelined with the optimal stages for high frequency. Parameter sets with 1024, 2048, 3072, 4096 and 7680- bit operand sizes are implemented. Table 6 presents the post place-and-route results for the selected parameters.

The rows highlighted gray depict the implementation results with 1 BFS, while the rest of the rows depict with 2 BFSs. In Table 6, the second column indicates the algorithm (Algorithm 2 or 3) applied to the corresponding parameter set. For the same transform length P and digit size u, the implementation of Algorithm 3 usually requires a larger M due to the different definitions of c in (15) and (20), which results in a larger area cost. For example, when P = 32 and u = 32, the first set has a larger M, so its implementation costs more look-uptable (LUTs) and has a longer critical path compared to the second one. On the contrary, the first implementation requires less clock cycles, this is because Algorithm 3 requires less NWTs The performance comparison between the FMLM3 and the state-of-the-art architectures is provided.

We only select the most efficient parameter set for each operand size 1 for comparison. The design in implement the digit-based method, implements the RNS method, and implements FFT method. Since different design may involve different DSP and RAM resources, their costs are also considered rather than only evaluating the area-latency product. In order to provide a fair comparison, the reduction and improvement ratios and are not included, since they are implemented by other FPGA



A Peer Revieved Open Access International Journal

www.ijiemr.org

families. In Table 7, the first column shows our design could fit the NIST recommended key size perfectly, while in, larger operand sizes are always required (i.e., when targeting 1024- bit key size, the 1 of our design is exactly 1024-bit compared to that of 1084-bit). A larger operand size may lead to waste of hardware resources. Our design has lower arealatency products compared to for all the operand sizes, with an average of 54% area-latency efficiency improvement. The area-latency product growth tendency is also provided in Fig. 9 including the design of and ours with 1 and 2 BFSs. It can be observed that our design with 2 BFSs has the least growth rate, which implies that the arealatency efficiency improvement becomes more obvious for larger operand sizes. For 1024 and 3072-bit cases, both and our design employ the same number of DSP blocks, but our design employs more RAM blocks. This is because the FMLM3 has more precomputations. Moreover, an extra RAM set is included to enable parallel computing of Step 8 and Step 9 in Algorithm 2 (cf. Section 5.4). For 2048, 4096, and 7680-bit cases, our design employs less DSP and RAM blocks with an average reduction ratios of 72% and 35%, respectively. The reduction mainly comes from the low hardware complexity cause by the small ring size. Besides, according to the comparison results, architecture with 2 BFSs also has a better area-latency efficiency, and less DSP. and RAM usage than that of 1 BFS. Additionally, it can be observed from Table 7 that the design of achieves a very small arealatency product, but employs 33 DSP blocks. The designs employs no DSP and RAM resources, but they have relatively large arealatency products. The design of requires only 463Kb memory bits for case l = 7680, while the 34 RAM blocks used in our design is capable

for more than 1Mb memory bits. This is mainly because the FMLM3 includes more precomputations than the algorithm. However, costs more DSP blocks and its area-latency product is about 10 times greater than ours.

#### CONCLUSIONS

In this work, we proposed a modified version of the **FFTbased** Montgomery modular multiplication algorithm under McLaughlin's framework (FMLM3). By applying cyclic and nega-cyclic convolutions to compute the modular multiplication steps, the zero-padding operation is avoided and the transform length is reduced by half compared to the regular FFTbased multiplication. Furthermore, we explored for some special cases, the number of transforms can be further reduced from 7 to 5 without extra computational efforts, so that the FMLM3 can be further accelerated. A general method of efficient parameter set selection has been summarized for a given operand size. Moreover, pipelined architectures with 1 and 2 butterfly structures are designed for high arealatency efficiency. We also analysed the connection between the number of butterfly structures and the cycle requirement. The estimation results indicate a feasible physical approach can be implemented which could trade area cost for faster speed by adding more butterfly structures. The Virtex-6 FPGA implementation results shows the proposed FMLM3 with both 1 and 2 butterfly structures have better area-latency efficiency than the state-of-the-art FFT-based Montgomery modular multiplication. In addition, processing speed of the proposed multiplier is also comparable, especially for large transform length (i.e. P = 64 or higher).

#### **REFERENCE:**

[1] R. L. Rivest, A. Shamir, and L. Adleman, "A



A Peer Revieved Open Access International Journal

www.ijiemr.org

technique for getting computerized marks and open key cryptosystems," Communications of the ACM, vol. 21, no. 2, pp. 120–126, 1978.

[2] R. L. Rivest, "A depiction of a solitary chip execution of the RSA figure," Lambda, vol. 1, no. Final Quarter, pp. 14–18, 1980.

[3] E. Barker, W. Barker, W. Burr, W. Polk, M. Smid, P. D. Gallagher et al., "NIST extraordinary distribution 800-57 proposal for key management—section 1: General," 2012.

[4] P. L. Montgomery, "Particular augmentation without preliminary division," Science of calculation, vol. 44, no. 170, pp. 519–521, 1985.

[5] A. Karatsuba and Y. Ofman, "Increase of multidigit numbers on automata," in Soviet material science doklady, vol. 7, 1963, p. 595.

[6] S. A. Cook and S. O. Aanderaa, "On the base calculation time of capacities," Transactions of the American Mathematical Society, pp. 291–314, 1969.

[7] A. Sch"onhage and V. Strassen, "Schnelle multiplikation großer zahlen," Computing, vol. 7, no. 3-4, pp. 281–292, 1971.

[8] M. F. urer, "Quicker whole number augmentation," SIAM Journal on Computing, vol. 39, no. 3, pp. 979–1005, 2009.

[9] D. Harvey, J. Van Der Hoeven, and G. Lecerf, "Much quicker whole number increase," arXiv preprint arXiv:1407.3360, 2014.

[10] S. Covanov and E. Thom'e, "Quick number juggling for quicker whole number increase," arXiv preprint arXiv:1502.02800, 2015.

[11] A. F. Tenca and C<sub>s</sub>. K. Koc<sub>s</sub>, "A versatile design for measured increase dependent on Montgomery's calculation," Computers, IEEE Transactions on, vol. 52, no. 9, pp. 1215–1221, 2003.

[12] M. D. Shieh and W. C. Lin, "Word-based Montgomery particular increase calculation for

low-inactivity versatile designs," PCs, IEEE Transactions on, vol. 59, no. 8, pp. 1145–1151, 2010.

[13] M. Spirits Sandoval and A. Diaz-Perez, "Versatile gf (p) montgomery multiplier digit- digit dependent calculation on a approach," Computers IET and **Digital** Techniques, 2015.

[14] M. Huang, K. Gaj, and T. El-Ghazawi, "New equipment designs for Montgomery secluded duplication calculation," Computers, IEEE Transactions on, vol. 60, no. 7, pp. 923–936, 2011.

[15] G. C. Chow, K. Eguro, W. Luk, and P. Leong, "A Karatsubabased Montgomery multiplier," in Field Programmable Logic and Applications (FPL), 2010 International Conference on. IEEE, 2010, pp. 434–437.

[16] M. K. Jaiswal and R. C. C. Cheung, "Region proficient models for huge number and fourfold accuracy drifting point multipliers," in Field-Programmable Custom Computing Machines (FCCM), 2012

IEEE twentieth Annual International Symposium on. IEEE, 2012, pp. 25–28. Student Details:



Mr. *N.B.Krishna Chaitanya*, Student Of Electronics & Communication Engineering Department, Sree Vahini Instituteof Science & Technology, Tiruvuru, Krishna.



A Peer Revieved Open Access International Journal

www.ijiemr.org