DESIGN AND IMPLEMENTATION OF APPLICATION SPECIFIC 32-BITALU USING XILINX FPGA

Similar documents
An Efficient Carry Select Adder with Less Delay and Reduced Area Application

Design and Characterization of High Speed Carry Select Adder

A High Speed Design of 32 Bit Multiplier Using Modified CSLA

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

Design and Verification of Area Efficient High-Speed Carry Select Adder

Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Implimentation of A 16-bit RISC Processor for Convolution Application

Design of Delay Efficient Carry Save Adder

High Performance and Area Efficient DSP Architecture using Dadda Multiplier

Keyless Car Entry Authentication System Based on A Novel Face-Recognition Structure

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL

Area Delay Power Efficient Carry Select Adder

Implementation of CMOS Adder for Area & Energy Efficient Arithmetic Applications

32 bit Arithmetic Logical Unit (ALU) using VHDL

High Speed Han Carlson Adder Using Modified SQRT CSLA

Low-Power And Area-Efficient Of 128-Bit Carry Select Adder

Performance of Constant Addition Using Enhanced Flagged Binary Adder

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

Let s put together a Manual Processor

Area Delay Power Efficient Carry-Select Adder

International Journal of Innovative Research in Computer Science & Technology (IJIRCST) ISSN: , Volume-3, Issue-5, September-2015

Designing an Improved 64 Bit Arithmetic and Logical Unit for Digital Signaling Processing Purposes

DESIGN AND IMPLEMENTATION OF ADDER ARCHITECTURES AND ANALYSIS OF PERFORMANCE METRICS

Anisha Rani et al., International Journal of Computer Engineering In Research Trends Volume 2, Issue 11, November-2015, pp.

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

VHDL IMPLEMENTATION OF FLOATING POINT MULTIPLIER USING VEDIC MATHEMATICS

A New Architecture Designed for Implementing Area Efficient Carry-Select Adder

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

ASIC IMPLEMENTATION OF 16 BIT CARRY SELECT ADDER

Chapter 4. The Processor

FPGA based Simulation of Clock Gated ALU Architecture with Multiplexed Logic Enable for Low Power Applications

Keywords: Fast Fourier Transforms (FFT), Multipath Delay Commutator (MDC), Pipelined Architecture, Radix-2 k, VLSI.

Carry Select Adder with High Speed and Power Efficiency

Arithmetic Circuits. Nurul Hazlina Adder 2. Multiplier 3. Arithmetic Logic Unit (ALU) 4. HDL for Arithmetic Circuit

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

FPGA Implementation of ALU Based Address Generation for Memory

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Area Delay Power Efficient Carry-Select Adder

FPGA architecture and design technology

VLSI Implementation of Adders for High Speed ALU

A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA

Arithmetic Circuits. Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak.

An Efficient Hybrid Parallel Prefix Adders for Reverse Converters using QCA Technology

Implementation of CMOS Adder for Area & Energy Efficient Arithmetic Applications

Implementation of Regular Linear Carry Select Adder with Binary to Excess-1 Converter

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

A Novel Architecture of Parallel Multiplier Using Modified Booth s Recoding Unit and Adder for Signed and Unsigned Numbers

VLSI DESIGN OF REDUCED INSTRUCTION SET COMPUTER PROCESSOR CORE USING VHDL

Implementation of 64-Bit Kogge Stone Carry Select Adder with ZFC for Efficient Area

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Implementation of Double Precision Floating Point Multiplier on FPGA

Design of Double Precision Floating Point Multiplier Using Vedic Multiplication

Implementation of Low Power High Speed 32 bit ALU using FPGA

International Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Performance Analysis of 64-Bit Carry Look Ahead Adder

University, Patiala, Punjab, India 1 2

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique

[Sahu* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

ISSN Vol.05, Issue.02, February-2017, Pages:

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

A Modified Radix2, Radix4 Algorithms and Modified Adder for Parallel Multiplication

Chapter 4. The Processor

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

Run-Time Reconfigurable multi-precision floating point multiplier design based on pipelining technique using Karatsuba-Urdhva algorithms

Implementation of Optimized ALU for Digital System Applications using Partial Reconfiguration

Design & Analysis of 16 bit RISC Processor Using low Power Pipelining

Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai

Area-Delay-Power Efficient Carry-Select Adder

ISSN Vol.02, Issue.11, December-2014, Pages:

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Design of Digital Circuits

Design and Implementation of IEEE-754 Decimal Floating Point Adder, Subtractor and Multiplier

structure syntax different levels of abstraction

Here is a list of lecture objectives. They are provided for you to reflect on what you are supposed to learn, rather than an introduction to this

A High Performance Reconfigurable Data Path Architecture For Flexible Accelerator

Lecture 3: Modeling in VHDL. EE 3610 Digital Systems

Verilog for High Performance

ANADOLU UNIVERSITY DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING. EEM Digital Systems II

An Efficient Implementation of Floating Point Multiplier

VLSI Implementation of High Speed and Area Efficient Double-Precision Floating Point Multiplier

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit

VHDL Implementation of Arithmetic Logic Unit

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

Energy Aware Optimized Resource Allocation Using Buffer Based Data Flow In MPSOC Architecture

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010

Implementation of Floating Point Multiplier Using Dadda Algorithm

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

REALIZATION OF AN 8-BIT PROCESSOR USING XILINX

Arithmetic Logic Unit. Digital Computer Design

DESIGN AND ANALYSIS OF COMPETENT ARITHMETIC AND LOGIC UNIT FOR RISC PROCESSOR

International Journal of Research in Computer and Communication Technology, Vol 4, Issue 11, November- 2015

ARCHITECTURAL DESIGN OF 8 BIT FLOATING POINT MULTIPLICATION UNIT

High Speed Radix 8 CORDIC Processor

Hemraj Sharma 1, Abhilasha 2

Transcription:

DESIGN AND IMPLEMENTATION OF APPLICATION SPECIFIC 32-BITALU USING XILINX FPGA T.MALLIKARJUNA 1 *,K.SREENIVASA RAO 2 1 PG Scholar, Annamacharya Institute of Technology & Sciences, Rajampet, A.P, India. 2 Asso.prof, Annamacharya Institute of Technology & Sciences, Rajampet, A.P, India. Abstract-- ALU is a combinational device which performs arithmetic and logical operations. It plays an important role in most of the existing commercial processors. The power consumption and area of a processor settle on the structure of an ALU. There are two structures of an ALU one is tree and another one Chain. This paper aims to design and implementation of 32-bit ALU with chain structure. With this structure we are reducing the overall delay and area of an ALU by repositioning functional components based on the frequency of an operation within the specific application. The FPGA implementation and functionality test of the 32-bit ALU is done by using the Xilinx ISE Design suite 14.7 Tool. Keywords: 32-bit ALU, Verilog, ALU Functional Blocks, Behavioral, chain structure of 32-Bit ALU. Such as Negation, AND, OR, XOR as well As rotate and Shift operations. The power consumption and area of a processor resolve on the structure of an ALU. There are two structures of an ALU one is tree and another one Chain. The main aim of this paper is to design and implementation of a 32-bit ALU which performs addition, subtraction, xor, or, and negation operations. Allthe above module are connected in structure. 1.2 Tree Structure The Tree Structure of an ALU all functional components are connected in parallel to the multiplexor. The multiplexer selects one of the functional block output. The Tree Structure of the ALU contains five functional components for addition, and, xor, or, not. The tree structure design as below figure 1(a) shown, 1. ALU 1.1 Introduction The design of low power and high speed microprocessors requires that its components should consume less power. Arithmetic and Logic Unit (ALU) is one of the most power consuming components in a microprocessor. To reduce the power consumption of the entire ALU each of its components should consume less power. The ALU performs the arithmetic operations such as addition, subtraction multiplication etc. and logical operations. Figure 1(a): Tree Structure Design The above tree structure the multiplexer selects one result as the ALU output among results from all functional components. The tree structure often demands more area. With a modern ISSN: 2278 1323 All Rights Reserved 2014 IJARCET 4020

processor design, where the processor is pipelined into a number of stages, the speed of the processor is determined by the longest delay that is that is associated with the critical path. 1.3 Chain Structure The chain structure functional components are concatenated through a series of multiplexers, each multiplexer takes a subset of functional components and pass the result to the output. The Chain Structure of an ALU shown below figure 1(b). Figure 1(b): Chain Structure The chain structure takes more levels of multiplexor transmission to the output than Tree Structure. Even though the chain structure for the ALU save area. In this paper, we investigate the effect on path delay of functional component placement in the chain structure we proposed a functional component placement approach to reduce path delay for a given application. propagate a carry through the intermediate stages of an adder, the sum for each bit position in anelementary adder is generated sequentially only after the previous bit position has been summed and carry propagated into the next position. In this paper a 32-bit carry select adder(csla) was designed byusing Ripple Carry Adder (RCA), 5-bit and 7-bit BECmodules along with multiplexer. The CSLA is used in many computational systems to generating multiple carries and then select a carry to generate the sum. The CSLA is not area efficient because it uses multiple pairs of RCA, to generate partial sum and carry.in proposed CSLA architecture uses Binary to Excess-1 Converter (BEC) instead of RCA with C_in=1 in the regular CSLA to achieve lower area and power consumption.the main advantage of this Binary to Excess-1 Converter (BEC) logic comes from the lesser number of logic gates than the n-bit Full Adder (FA) structure. The structure and the function table of a 5-bit BEC and 7-bit BEC are shown in Figure 2(a) and 2(b), 2. ALU Functional Blocks The proposed 32-bit ALU consists of adder, and, or, xor & not. All functional components are below shown. 2.132-BIT ADDER In digital adders, the speed of addition is limited by the time required to Figure 2(a): 5-Bit BEC. The Boolean expressions of the 5- bit BEC is listed as X0 = ~B0 X1 = B0^B1 X2 = B2^ (B0 & B1) X3 = B3^ (B0 & B1 & B2) ISSN: 2278 1323 All Rights Reserved 2014 IJARCET 4021

X4=B0&B1&B2&B3 from a given application written in a high level programming language. TheTargetmachine architecture is selected and the program is compiled for the target machine. Based on the instruction set of the machine architecture model for the processor model is developed by either commercial or in-house development tool. Figure 2(b): 7-Bit BEC The Boolean expressions of the 7- bit BEC is listed as X0 = ~B0 X1 = B0^B1 X2 = B2^ (B0 & B1) X3 = B3^ (B0 & B1 & B2) X4= B4^ (B0&B1&B2&B3) X5 = B5^ (B0&B1&B2&B3&B4) X6 = B0&B1&B2&B3&B4&B5 The 32-bit adder RTL Diagram as shown in figure 2(c). Figure 3: Design Flow with Verilog The processor model is then updated with the customize ALU. Since the update does not functionally affect the processor and instruction set architecture, no modification is required to the instruction code. Next, the new processor model for the application code is simulated. The design is synthesized using a synthesis tool. Based on the synthesis, the design is evaluated. 4. Chain Structure of 32-bit ALU In the 32-bit Chain Structure of an ALU hasadder, and, xor, or, andnot modules,which are concatenated by 2- input 32-bit multiplexer with single selection line shown in figure 4.1. Figure 2 (c): 32-Bit adder 3. Verilog ALU Compilation The customization technique can be integrated into a processor design environment. A general design platform is given in Figure 3. The design flow starts Figure 4.1: 2 input 32-bit multiplexer ISSN: 2278 1323 All Rights Reserved 2014 IJARCET 4022

The complete structure of32-bit Chain Structured ALU as shown in below figure 4.2. Figure 4.2: 32-bit Chain Structure Design For the chain structure, there are a variety of options to place functional components. So far, the functional component placement in the chain is arbitrarily chosen in the processor design. To our best knowledge, no work related to this design issue has been reported. In fact, placing a functional component differently in the chain structure may cause different power consumption. For example, swapping the add component and the not component may favor in some applications and can save a considerable amount of ALU power. 5. Simulation results To verify our simulation results, we first developed a small stand-alone ALU for full design space exploration. The ALU contains five functional components each for addition, xor, and, or, not. The experiment is given below figure 5 shows. Figure 5: Simulation Results for 32-bit ALU Chain Structure The above figure shows that all functional components are present in the simulation result. To enter an input values from A, B and c_in(carry in) and selection. The ALU of five functional components was used for exploring designs of all possible placements in order to verify the effectiveness of this approach. Based on the Chain Structure program, the adder is longest component, therefore when it is positioned next to the output in the chain, the overall delay is reduced. The power always reaches a minimum level when the related functional component is placed closet to the output. We can see that the CPU clock time remains changed throughout all designs, which demonstrates that the critical path and differing functional components in ALU affect the processor clock speed. Finally, the processor with the customized ALU is evaluated using the Xilinx ISE Design Suite 14.7 Tool. 6.Conclusion In this paper we discussed the effect of component placement on the chain structure design. We found that the order of functional components in the chain effects is shown in below table 1. ISSN: 2278 1323 All Rights Reserved 2014 IJARCET 4023

Table 1: Different levels of 32-bit adder To reduce latency and area, the frequently operating component should be positioned close to the output of the chain. If the adder is placed at the higher stage (level 1), the path delay is 21.335ns and it takes 250 LUTs. The adder is changed to below XOR placed (level 2). The path delay is 20.289ns and its takes 211 LUTs. Again the adder is changed to OR place (level 3),the path delay is 20.113ns and it takes 216 LUTs. The adder is changed to NOT (level 4), the path delay is 19.961ns and its takes 181 LUTs. So the adder is placed higher stage the path delay is 21.335ns (level 1) and the adder is placed closest to the ALU output, the path delay is 19.961ns (level 4). The overall path delay is 1.37ns will be saved. Therefore the adder is placed closest to the ALU output will take less amount. We developed a Verilog customization approach for the ALU design. The customization is extremely simple. In details neither additional control logic for modification to the inter face of the ALU model in the processor design. 7. References [1] M. Borah, R. M. Owens, and M. j. Irwin. Transistor sizing for low power cmos circuits, IEEE Trans. On computer Aided Design of integrated circuits and systems, 15:665-671, 1996. [2] Y.-T Ho and T.-T. Hwang. Low Power design using dual threshold voltage. In Proceedings of the Asia and South pacific, Design Automation Conference, page 205-208, 2005. [3] C. Thimmannagari. CPU Design: Answers to Frequently Asked Questions, Springer, 2005. [4] L. Shang, L. Peh, and N, Jha. Dynamic Voltage Scaling with links for power optimization of interconnection networks. In proceeding of International High Performance Computer Architecture, page 91-102, 2003. ISSN: 2278 1323 All Rights Reserved 2014 IJARCET 4024

8. BIODATA International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) T. Mallikarjuna, born in Kadapa, A.P., India in 1990. He received his B.Tech Degree in Electronics and Communication Engineering from J.N.T University Anantapuramu, India. Presently pursuing M.Tech (EMBEDDED SYSTEM DESIGN) from Annamacharya Institute of Technology and Sciences, Rajampet, A.P., India. His research interests include VLSI, Digital Signal Processing and micro processor. Mr. K. Sreenivasa Raohas received his M. Tech degree in DSCE. Currently, he is working as Associate Professor in the Department of Electronics & Communication Engineering, Annamachrya Inst of Technology & Science, Rajampet, Kadapa, A.P, and India. He has published a number of research papers in various National and International Journals and Conferences. He is currently working towards Ph.D Degree in at RayalaseemaUniversity, Kurnool, A.P, and India. His areas of interests are VLSI, Micro processor, Embedded Systems and Signals and Systems. ISSN: 2278 1323 All Rights Reserved 2014 IJARCET 4025