Design and Implementation of VLSI 8 Bit Systolic Array Multiplier


Khumanthem Devjit Singh, K. Jyothi
MTech student (VLSI & ES), GIET, Rajahmundry, AP, India
Associate Professor, Dept. of ECE, GIET, Rajahmundry, AP, India

ABSTRACT: Multiplication is one of the most commonly used operations in mathematics. Integer multiplication is widely used in practice, and binary multiplication is the basic operation underlying it. Systolic algorithms are an efficient way to perform binary multiplication. A systolic array is an arrangement of processors in an array in which data flows synchronously between neighbors, usually with different data flowing in different directions. At each step, each processor takes in data from one or more neighbors (e.g. North and West), processes it and, in the next step, outputs results in the opposite directions (South and East). The present work concentrates on developing a hardware model of a systolic multiplier using Verilog HDL (Hardware Description Language). The design is simulated using the ModelSim simulator and synthesized on a Spartan 3 FPGA board.

KEYWORDS: Systolic Array, Parallel Processing, Xilinx, FPGA, VHDL.

I. INTRODUCTION
This paper briefly describes a method to verify correct program execution. To meet the high-speed and low-power demands of DSP applications, parallel array multipliers are widely used. In DSP applications, most of the power is consumed by the multipliers. Hence, low-power multipliers must be designed to reduce power dissipation. A systolic algorithm is a form of pipelining, sometimes in more than one dimension. In such an algorithm, data flows from memory in a rhythmic fashion, passing through many processing elements before it returns to memory. H. T. Kung and Charles Leiserson were the first to publish a paper on systolic arrays, in 1978, and coined the name, which refers to the pumping action of the heart.
What can a systolic array do? Any sequential algorithm that can be transformed into a parallel version suitable for array processors can be executed in the so-called systolic manner, and the systolic array is one solution to the need for highly parallel computational power. Because matrix multiplication is used in wide fields such as Digital Signal Processing (DSP), image processing, the solution of differential equations, non-numeric applications, and complex arithmetic operations, we design and implement an 8-bit systolic array. Verilog finds many applications in the design of high-speed integrated circuits. It is a hardware description language; a design written in it can be loaded onto the chip and used by the tool user. As the language supports flexible design methodologies, it can be used to define complex electronic systems. A common language can be used to describe the library components. Because of these features, Verilog is used here to design an N-bit binary multiplier. The Verilog program supports multiplication of two N-bit binary numbers by including a user-defined package.

II. PURPOSE OF IMPLEMENTATION
The purpose of this implementation is to efficiently solve a wide spectrum of computational problems by designing and implementing the systolic array architecture on an FPGA. Designers look for high-performance, flexible architectures, and this paper presents one efficient way of meeting those needs. The systolic array combines properties that are seldom found together:
1) Flexible hardware and software (i.e. FPGA technology).
2) Spatial and temporal parallelism (i.e. systolic array architecture).
3) Easily scalable architecture.
4) High degree of pipelining.

Copyright to IJAREEIE www.ijareeie.com 13331

III. DESIGN METHODOLOGY
In systolic multiplication, the following steps are carried out to obtain the final product:
1. The multiplicand and multiplier are arranged in the form of an array, as shown in Fig.1.
2. Each bit of the multiplicand is multiplied with each bit of the multiplier to get the partial products.
3. The partial products in the same column are added, along with any carry generated.
4. The result of adding the partial products and the carries is the final product of the two binary numbers.

Fig.1. Methodology of systolic multiplication
Fig.2. Block diagram of 8 bit systolic Multiplier
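The four steps of the design methodology can be sketched as a small software model. This is only an illustrative sketch in Python, not the paper's Verilog design; the function names `partial_products` and `column_sum_multiply` are assumptions introduced here, and the operands are treated as unsigned.

```python
def partial_products(x, y, n=8):
    """Step 2: AND every multiplicand bit with every multiplier bit.

    Row j holds x_i & y_j for i = 0..n-1; entry [j][i] belongs to
    product column i + j. (Hypothetical helper, not from the paper.)
    """
    return [[((x >> i) & 1) & ((y >> j) & 1) for i in range(n)]
            for j in range(n)]


def column_sum_multiply(x, y, n=8):
    """Steps 3 and 4: add each column together with the incoming carry."""
    pp = partial_products(x, y, n)
    product, carry = 0, 0
    for col in range(2 * n):
        # Collect every partial-product bit that falls in this column.
        total = carry + sum(pp[j][col - j]
                            for j in range(n) if 0 <= col - j < n)
        product |= (total & 1) << col   # low bit stays in this column
        carry = total >> 1              # the rest moves to the next column
    return product


if __name__ == "__main__":
    # Unsigned check: 12 * 10 = 120
    print(column_sum_multiply(0b00001100, 0b00001010))
```

The column loop mirrors the pencil-and-paper layout of Fig.1: every column's sum bit is final as soon as the carries from lower columns have been absorbed.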

Fig.2 shows the functional units of the 8 bit Systolic Multiplier. Each unit is an independent processing unit. These units share information with their neighbors after performing the needed operations on the data. Each box in the figure represents a full adder. The bits of the input vectors X and Y are ANDed, and the AND outputs are fed as inputs to the full adders. Each full adder produces two outputs: a sum bit, which contributes to the multiplied output vector Z, and a carry generated from the addition, which is used by other full adders. In this way, computation takes place simultaneously in the rows and columns. All the inputs are applied simultaneously, so registers are not required. The availability of all inputs at the start of the computation, together with the systolic array structure, helps the multiplier perform faster than other multipliers.

IV. IMPLEMENTATION
The programming environment for implementing the circuit is based on Verilog. In our implementation, the Systolic Array Multiplier is designed for 8 bits using structural and behavioral styles, and is implemented and tested on the Spartan-3 FPGA board. The results obtained are accurate. In structural modeling, the multiplier is divided into 3 sections, i.e. upper, middle and lower sections, all of which operate on the data simultaneously. Full adders and AND gates are the basic building blocks of the multiplier. Each section has 8 full adders and associated AND gates. The entity part, or peripheral view, of the multiplier is given in Fig.3.

Fig.3. Entity view of multiplier

The behavioral description is written and implemented based on the behavior of the Systolic Array multiplier. A timing analysis tool is then applied to the object module to determine the maximum operating speed. If an algorithm is time consuming, the efficiency of the multiplier decreases, because efficiency is now measured not only by accuracy but also by speed. As an FPGA is a purely hardware circuit, the time it takes to execute the algorithm is much less. Thus, the systolic multiplication algorithm implemented on an FPGA works faster than other multiplication techniques.
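The cell structure described above, an AND gate feeding a full adder whose sum and carry pass to neighboring cells, can be modeled bit by bit. The sketch below is a hedged illustration in Python of a generic row-by-row array of such cells; the names `full_adder`, `cell` and `array_multiply` are assumptions made here, and the row organization is not claimed to match the paper's upper/middle/lower partition. Operands are treated as unsigned.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def cell(x_bit, y_bit, sum_in, carry_in):
    """One box of the array: AND the two input bits, then add the result
    to the sum arriving from the previous row and the carry arriving
    from the neighboring cell in the same row."""
    return full_adder(x_bit & y_bit, sum_in, carry_in)


def array_multiply(x, y, n=8):
    """Unsigned n-bit multiply built only from cell() calls."""
    sums = [0] * n          # sum bits handed down from the previous row
    z = []                  # product bits, least significant first
    for j in range(n):      # one row of cells per multiplier bit
        y_bit = (y >> j) & 1
        carry = 0
        row = []
        for i in range(n):  # carry ripples along the row
            s, carry = cell((x >> i) & 1, y_bit, sums[i], carry)
            row.append(s)
        z.append(row[0])             # lowest bit of the row is final
        sums = row[1:] + [carry]     # the rest shifts one column for the next row
    z.extend(sums)                   # upper n bits of the product
    return sum(bit << k for k, bit in enumerate(z))


if __name__ == "__main__":
    print(array_multiply(13, 11))    # 143
```

In hardware all cells exist at once and settle in parallel; the nested loops here only fix an evaluation order that respects the data dependencies between neighbors.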

V. DEVICE UTILIZATION SUMMARY

Table.1. Device Utilization Summary

Logic Utilization                                 Used   Available   Utilization
Number of 4 input LUTs                            123    7,168       1%
Number of occupied Slices                         66     3,584       1%
Number of Slices containing only related logic    66     66          100%
Number of Slices containing unrelated logic       0      66          0%
Total Number of 4 input LUTs                      123    7,168       1%
Number of bonded IOBs                             32     141         22%
Average Fan-out of Non-Clock Nets                 3.50

Table.1 shows the device utilization summary obtained from the synthesis report for the structural model. The synthesis results indicate a substantial reduction in FPGA resource usage when the design is modeled in the two different styles, and power consumption is also reduced. As a result, the implementation requires very little area and reduces the complexity of the hardware design.

Fig.4. Entity view of multiplier

Simulation results are shown in Fig.4 for inputs 11001100 and 01100110; the output obtained is 1110101101001000. This output is the 16-bit two's-complement product of the operands read as signed values (-52 × 102 = -5304); read as unsigned values, 204 × 102 = 20808 would give 0101000101001000. Fig.5 and Fig.6 show the RTL and technology schematics of the 8 bit Systolic Array Multiplier obtained using the structural style of modeling.
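The reported simulation values can be cross-checked with a few lines of arithmetic. The following Python sketch is an assumption-labeled check, not part of the original test bench; `to_signed` is a hypothetical helper introduced here. It compares the reported 16-bit output against both the unsigned and the two's-complement signed product of the two 8-bit inputs.

```python
def to_signed(value, bits):
    """Interpret a bits-wide pattern as a two's-complement integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


x, y = 0b11001100, 0b01100110        # inputs quoted in the text
reported = 0b1110101101001000        # output quoted in the text

unsigned_product = x * y                              # 204 * 102
signed_product = to_signed(x, 8) * to_signed(y, 8)    # -52 * 102
signed_pattern = signed_product & 0xFFFF              # wrap to 16 bits

print(format(unsigned_product, "016b"))   # 0101000101001000
print(format(signed_pattern, "016b"))     # 1110101101001000
print(signed_pattern == reported)         # True
```

Only the signed interpretation reproduces the quoted output, which suggests the simulated design or its waveform display handled the operands as two's-complement numbers.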

Fig.5. RTL schematic for structural style of modeling
Fig.6. Technology schematic

VI. CONCLUSION
The 8 bit Systolic Array Multiplier design was optimized using the structural style, compared with the behavioral style. The circuit was implemented on an FPGA and simulated using the ISim simulator, version 14.3, with satisfactory results. The design was also synthesized effectively on a Spartan 3E board using Xilinx XST. Implementing such designs in Verilog makes their behavior and design aspects easier to understand. If this prototype is implemented in real time, it could offer a number of practical advantages.