

A Peer Reviewed Open Access International Journal

www.ijiemr.org

### COPYRIGHT

**2017 IJIEMR**. Personal use of this material is permitted. Permission from IJIEMR must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. No reprint of this paper should be made; all copyright is attributed to the paper authors.

IJIEMR Transactions, online available on 16<sup>th</sup> July 2017. Link : http://www.ijiemr.org/downloads.php?vol=Volume-6&issue=ISSUE-5

Title: Hardware And Energy-Efficient Stochastic LU Decomposition Scheme For MIMO Receivers.

Volume 06, Issue 05, Page No: 1862 – 1865.

Paper Authors

#### \*BOMMAKANTI VINEELA, V. LEELASHYAM.

\* Dept of ECE, Anurag Engineering College.









### HARDWARE AND ENERGY-EFFICIENT STOCHASTIC LU DECOMPOSITION SCHEME FOR MIMO RECEIVERS

#### \*BOMMAKANTI VINEELA, \*\*V. LEELASHYAM

\*PG Scholar, Dept of ECE, Anurag Engineering College.

\*\*Assistant Professor, Dept of ECE, Anurag Engineering College.

Vineela29.Bommakanti@gmail.com

leelashyam.chanti@gmail.com

#### **ABSTRACT**:

In this paper, we design a hardware- and energy-efficient stochastic lower–upper decomposition (LUD) scheme for multiple-input multiple-output (MIMO) receivers. By employing stochastic computation, the complex arithmetic operations in LUD can be performed with simple logic gates. With the proposed dual partition computation (DPC) method, the stochastic multiplier and divider achieve high computation accuracy with a relatively short stochastic stream. We have designed and synthesized the stochastic LUD in 130-nm CMOS technology. According to the postlayout report, the hardware efficiency of the stochastic LUD is as high as $1.5 \times$ that of the existing LUD methods, and the energy efficiency is also higher than that of the state-of-the-art LUD when the matrix dimension is $8 \times 8$ or larger.

#### 1. INTRODUCTION

Matrix decomposition is an essential algorithm in the linear solution problem [1]. In multiple-input multiple-output (MIMO) systems in particular, matrix decomposition is the main burden in implementing a hardware- and energy-efficient MIMO detector [2]. The existing matrix decomposition optimization methods aim at large-size matrices such as dimension with 16 kB [3]-[5]. However, in practical MIMO systems, the scale of antennas is limited by the area of the antenna array. For example, in the long-term evolution (LTE) standards, the MIMO systems employ a 4×4 antenna array. Even in the large-scale MIMO system [21], the required inversion matrix dimension is no more than 100. MIMO systems therefore call for a more hardware-efficient and energy-efficient VLSI implementation of the matrix decomposition algorithm.

Generally, there are two main approaches to matrix decomposition in MIMO systems: 1) QR decomposition and 2) lower-upper decomposition (LUD) [6]–[11]. The QR decomposition algorithm, which transforms a matrix into an orthogonal matrix and an upper triangular matrix, is widely employed in the path-search-based MIMO-detection algorithm [12]. The LUD algorithm, on the other hand, factorizes a matrix into a lower triangular matrix and an upper triangular matrix [8]–[11]. LUD has the same function as QR decomposition, which serves for path-search-based MIMO detection. Moreover, LUD is an indispensable processing step in the zero-forcing (ZF) [13] and the minimum mean square error (MMSE)-based MIMO systems [2]. In this paper, we focus on the implementation of the LUD algorithm. Plenty of LUD methods have been proposed in [3], [4], and [9]-[11], such as parallel-processing-based, circular-linear-array-based, and blocking-based architectures that target large-size matrices with high throughput. However, energy consumption and hardware complexity are two critical design criteria in wireless communication systems, especially for mobile terminals. Hence, it is necessary to design a high-performance LUD scheme specifically for MIMO systems. In [9], an LUD based on a computation sharing multiplier (CSHM) is proposed, which has considerable energy-saving capacity. An approximate matrix inversion structure for the large-scale MIMO uplink is proposed in [21], which can only be used in systems with massive receiving antennas.

#### 2. PROJECT DESCRIPTION

A. DPC-Based Stochastic Multiplier

The hardware scheme of the DPC-based stochastic multiplier is given in Fig. 2. We highlight the logic gates with their corresponding functions to help understand the structure. Since $A_L$, $B_L$, $C_L$, and $D_L$ are Boolean signals, the multiplications in (22) and (24) are implemented by AND gates. The subtraction "$-A_L(t) \cdot C_L(t) \cdot 2^k$" is performed by operating on the MSB of the adder output signal. The adder is shared by (22) and (24) in cycles $1 \rightarrow L$ and cycles $L+1 \rightarrow 2L$, respectively. The function of (23) is performed by a sign detector, in which a $k$-input AND gate is employed to obtain the absolute signal $a$. The MSB represents the sign bit in TCS. Hence, the signed signal $s$ is obtained from the MSB of the register output. To feed back a $(k+1)$-bit TCS signal to the adder, we duplicate the MSB of the register. The signal "Ctl" controls the stochastic multiplier to process the two sections of the stream, $E_L$ and $F_L$.



Fig. 2. High-accuracy stochastic multiplier.
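To illustrate why AND gates can stand in for multipliers, the following minimal sketch shows conventional unipolar stochastic multiplication on randomly generated streams. It is not the DPC scheme of Fig. 2; the function names, the stream length, and the use of a pseudo-random generator are all assumptions made for illustration.

```python
import random


def to_stream(p, length, rng):
    """Encode a value p in [0, 1] as a unipolar stochastic bit stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_stream(bits):
    """Decode a unipolar stream: the value is the fraction of ones."""
    return sum(bits) / len(bits)


def stochastic_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND gate."""
    return [a & b for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)          # fixed seed so the run is repeatable
length = 1 << 12                # illustrative stream length (assumption)
a, b = 0.75, 0.5
product = from_stream(
    stochastic_multiply(to_stream(a, length, rng), to_stream(b, length, rng))
)
print(f"exact={a * b:.4f} stochastic={product:.4f}")
```

The estimate only approaches the exact product as the stream grows, which is the accuracy-versus-latency trade-off that motivates shorter, higher-accuracy streams.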

B. DPC-Based Stochastic Divider

The hardware implementation scheme of the proposed SD is given in Fig. 3. The back converter (B.C.), which converts a stochastic stream to an FP signal, can be bypassed when the input signal is already a TCS signal. To obtain $(A_L(t) \cdot 2^k + B_L(t)) \cdot 2^k$, a stream generator (S.G.) is employed, with left shifts that scale $A_L(t)$ by $2^{2k}$ and $B_L(t)$ by $2^k$. In the first $L = 2^k$ cycles, the register is updated with (38). A multiplexer is used to control whether the register stores the current value according to $E_L(t)$. The output signal $E_L(t)$ is generated by (37), where the comparison is performed by inverting the MSB of the adder output. From cycle $2^k + 1$ to cycle $2^{k+1}$, the left shifting module at the register output is enabled to perform (41). Notice that "Ctl\_p" is a pulse signal with the function.



**Conversion Units** 

1) Back Conversion Unit: The B.C. unit converts the stochastic stream into the TCS signal, which is widely used in the stochastic logic-based system. The hardware implementation is simple, as shown in Fig. 5(a).

2) Stream Generation Unit: The S.G. unit performs the reverse function of the B.C. unit: it generates the stochastic stream from a given TCS signal. It contains an adder, a register, and a multiplexer, as shown in Fig. 5(b).

3) DSC Generator: As discussed in Section III-C, uniform distribution vectors in (9) are required to perform the SM. We propose a simple and effective method to implement (9). As shown in Fig. 5(c), we employ a counter with $k$ rising edge detectors. The uniform distribution vectors can also be shared by each signal generator.
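The round trip between the two conversion units can be sketched in software. The code below is only a behavioural model under stated assumptions: values are unsigned (the TCS sign handling is omitted), the stream generator is modelled as an accumulator that emits a 1 on each wrap-around, and the back converter is modelled as a counter of ones. The internal structure of Fig. 5 is not reproduced, and the function names are hypothetical.

```python
def stream_generate(value, k):
    """Generate a 2**k-bit stream containing exactly `value` ones.

    Assumed behaviour: an accumulator (adder + register) adds `value`
    every cycle and outputs 1 whenever the sum reaches or passes 2**k.
    """
    length = 1 << k
    if not 0 <= value <= length:
        raise ValueError("value must lie in [0, 2**k]")
    acc = 0
    bits = []
    for _ in range(length):
        acc += value
        if acc >= length:       # wrap-around: emit a 1 and keep the remainder
            acc -= length
            bits.append(1)
        else:
            bits.append(0)
    return bits


def back_convert(bits):
    """Convert a stream back to an integer by counting its ones."""
    return sum(bits)


k = 3
stream = stream_generate(3, k)
print(stream, back_convert(stream))
```

Because the accumulator returns to zero after $2^k$ cycles, the number of ones equals the input value exactly, so the round trip is lossless for this deterministic model.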



Fig. 4. Parallel LUD scheme processing steps. (a) Stage 1. (b) Stage 2. (c) Stage 3.



Fig. 5. Conversion units. (a) B.C. (b) S.G. (c) Uniform distribution generator.

E. Block LU Decomposition With Stochastic Computation

The block LUD algorithm can be employed for large matrices based on stochastic computation. We first review the block LUD algorithm for $A = LU$, where we have (a) $A_{11} = L_{11}U_{11}$, (b) $A_{12} = L_{11}U_{12}$, (c) $A_{21} = L_{21}U_{11}$, and (d) $A_{22} = L_{21}U_{12} + L_{22}U_{22}$.

In (a), $L_{11}$ and $U_{11}$ are obtained by LUD. Then, we substitute $L_{11}$ and $U_{11}$ into (b) and (c) to obtain $U_{12}$ and $L_{21}$, respectively. Finally, after computing $A_{22} - L_{21}U_{12}$, we perform LUD again to obtain $L_{22}$ and $U_{22}$ in (d). Matrix $A_{22}$ can be further factorized by the block LUD method.
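Steps (a) to (d) can be checked numerically with a short floating-point sketch. This is not the stochastic hardware: it assumes a Doolittle factorization without pivoting (unit-diagonal $L$), plain Python lists, and a hypothetical 4×4 test matrix split into 2×2 blocks.

```python
def lu(a):
    """Doolittle LU without pivoting; returns (L, U) with a unit-diagonal L."""
    n = len(a)
    low = [[float(i == j) for j in range(n)] for i in range(n)]
    up = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):          # row i of U
            up[i][j] = a[i][j] - sum(low[i][p] * up[p][j] for p in range(i))
        for j in range(i + 1, n):      # column i of L
            low[j][i] = (a[j][i] - sum(low[j][p] * up[p][i] for p in range(i))) / up[i][i]
    return low, up


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def solve_left_lower(low, b):
    """Solve low @ X = b for X by forward substitution."""
    x = [[0.0] * len(b[0]) for _ in b]
    for i in range(len(low)):
        for j in range(len(b[0])):
            x[i][j] = (b[i][j] - sum(low[i][p] * x[p][j] for p in range(i))) / low[i][i]
    return x


def solve_right_upper(up, b):
    """Solve X @ up = b for X, filling each row of X from left to right."""
    x = [[0.0] * len(up) for _ in b]
    for i in range(len(b)):
        for j in range(len(up)):
            x[i][j] = (b[i][j] - sum(x[i][p] * up[p][j] for p in range(j))) / up[j][j]
    return x


def block_lu(a, s):
    """Block LUD of a square matrix split after row/column s, following (a)-(d)."""
    n = len(a)
    a11 = [row[:s] for row in a[:s]]
    a12 = [row[s:] for row in a[:s]]
    a21 = [row[:s] for row in a[s:]]
    a22 = [row[s:] for row in a[s:]]
    l11, u11 = lu(a11)                    # (a) A11 = L11 U11
    u12 = solve_left_lower(l11, a12)      # (b) A12 = L11 U12
    l21 = solve_right_upper(u11, a21)     # (c) A21 = L21 U11
    prod = matmul(l21, u12)
    schur = [[a22[i][j] - prod[i][j] for j in range(n - s)] for i in range(n - s)]
    l22, u22 = lu(schur)                  # (d) A22 - L21 U12 = L22 U22
    low = [r + [0.0] * (n - s) for r in l11] + [r21 + r22 for r21, r22 in zip(l21, l22)]
    up = [r11 + r12 for r11, r12 in zip(u11, u12)] + [[0.0] * s + r for r in u22]
    return low, up


A = [[4.0, 3.0, 2.0, 1.0],
     [3.0, 4.0, 3.0, 2.0],
     [2.0, 3.0, 4.0, 3.0],
     [1.0, 2.0, 3.0, 4.0]]
L, U = block_lu(A, 2)
```

Applying `block_lu` recursively to the Schur complement gives the further factorization of $A_{22}$ mentioned above; a matrix with a zero leading pivot would need pivoting, which this sketch leaves out.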



The hardware architecture is shown in Fig. 6. A RAM is employed to store the matrix elements. The control logic generates the reading addresses to access the data from the RAM. The register banks hold the data to perform stochastic computations. After the FP data are converted by the S.G., the 1-bit streams are input to the stochastic 4×4 matrix decomposition unit. The 1-bit multiplexers are employed to perform data routing.

#### 3. CONCLUSION

In this paper, we proposed a stochastic-based LUD scheme with high hardware efficiency and power efficiency. In order to achieve high accuracy, we presented several novel techniques to improve stochastic computation performance. The proposed DPC has reduced the computation latency from $2^k$ to $2^{k/2+1}$. The high-accuracy SM and SD can achieve an SNR performance of 60 dB, which makes the proposed stochastic logic applicable to systems that require high computation accuracy. The scheme proposed in this paper can also be used in other DSP systems.

#### REFERENCES

[1] H. R. Rategh et al., "A CMOS frequency synthesizer with an injection-locked frequency divider for 5-GHz wireless LAN receiver," IEEE J. Solid-State Circuits, vol. 35, no. 5, pp. 780-787, May 2000.

[2] P. Y. Deng et al., "A 5 GHz frequency synthesizer with an injection locked frequency divider and differential switched capacitors," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 2, pp. 320-326, Feb. 2009.

[3] L. Lai Kan Leung et al., "A 1-V 9.7-mW CMOS frequency synthesizer for IEEE 802.11a transceivers," IEEE Trans. Microw. Theory Tech., vol. 56, no. 1, pp. 39-48, Jan. 2008.

[4] M. Alioto and G. Palumbo, Model and Design of Bipolar and MOS Current-Mode Logic Digital Circuits. New York: Springer, 2005.

[5] Y. Ji-ren et al., "A true single-phase-clock dynamic CMOS circuit technique," IEEE J. Solid-State Circuits, vol. 24, no. 2, pp. 62-70, Feb. 1989.

[6] S. Pellerano et al., "A 13.5-mW 5 GHz frequency synthesizer with dynamic-logic frequency divider," IEEE J. Solid-State Circuits, vol. 39, no. 2, pp. 378-383, Feb. 2004.

[7] V. K. Manthena et al., "A low power fully programmable 1 MHz resolution 2.4 GHz CMOS PLL frequency synthesizer," in Proc. IEEE Biomed. Circuits Syst. Conf., Nov. 2007, pp. 187-19.