Reconfiguration of Memory for High Speed Matrix Multiplication

Similar documents
EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

FPGA architecture and design technology

FPGA: What? Why? Marco D. Santambrogio

ANADOLU UNIVERSITY DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING. EEM Digital Systems II

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

Introduction to Microcontrollers

Dec Hex Bin ORG ; ZERO. Introduction To Computing

Register Transfer and Micro-operations

PROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES

FPGA Implementation of ALU Based Address Generation for Memory

FPGA Matrix Multiplier

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

Field Programmable Gate Array

Digital Logic Design Lab

475 Electronics for physicists Introduction to FPGA programming

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

HCTL Open Int. J. of Technology Innovations and Research HCTL Open IJTIR, Volume 4, July 2013 e-issn: ISBN (Print):

Outline. Field Programmable Gate Arrays. Programming Technologies Architectures. Programming Interfaces. Historical perspective

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

Matrix Manipulation Using High Computing Field Programmable Gate Arrays

Introduction to Field Programmable Gate Arrays

UNIT-III REGISTER TRANSFER LANGUAGE AND DESIGN OF CONTROL UNIT

New Logic Module for secured FPGA based system

32 bit Arithmetic Logical Unit (ALU) using VHDL

The x86 Microprocessors. Introduction. The 80x86 Microprocessors. 1.1 Assembly Language

Efficient Self-Reconfigurable Implementations Using On-Chip Memory

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

Run-Time Reconfigurable multi-precision floating point multiplier design based on pipelining technique using Karatsuba-Urdhva algorithms

THE DESIGN OF HIGH PERFORMANCE BARREL INTEGER ADDER S.VenuGopal* 1, J. Mahesh 2

For more notes of DAE

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

Implementation of FPGA for Decision Making for IC S Library

Module 5 - CPU Design

Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture

VLSI Based 16 Bit ALU with Interfacing Circuit

Implementation of Ethernet, Aurora and their Integrated module for High Speed Serial Data Transmission using Xilinx EDK on Virtex-5 FPGA

CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

Hardware Modeling using Verilog Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

AL8253 Core Application Note

PIPELINE AND VECTOR PROCESSING

e-pg Pathshala Subject : Computer Science Paper: Embedded System Module: Introduction to Computing Module No: CS/ES/1 Quadrant 1 e-text

Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai

ISSN Vol.03, Issue.08, October-2015, Pages:

Date Performed: Marks Obtained: /10. Group Members (ID):. Experiment # 09 MULTIPLEXERS

Novel Design of Dual Core RISC Architecture Implementation

International Journal of Applied Sciences, Engineering and Management ISSN , Vol. 05, No. 02, March 2016, pp

A Brief Introduction to Verilog Hardware Definition Language (HDL)

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Workshop on Digital Circuit Design in FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience

Lecture-9 Intel 8085 Microprocessor It is a 40-pin DIP(Dual in package) chip, base on NMOS technology, on a single chip of silicon.

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

address ALU the operation opcode ACC Acc memory address

Segment 1A. Introduction to Microcomputer and Microprocessor

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

Design and Simulation of Power Optimized 8 Bit Arithmetic Unit using Gating Techniques in Cadence 90nm Technology

Multiplier Generator V6.0. Features

4DM4 Lab. #1 A: Introduction to VHDL and FPGAs B: An Unbuffered Crossbar Switch (posted Thursday, Sept 19, 2013)

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE) UNIT-I

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

b. List different system buses of 8085 microprocessor and give function of each bus. (8) Answer:

Module 5 Introduction to Parallel Processing Systems

2. BLOCK DIAGRAM Figure 1 shows the block diagram of an Asynchronous FIFO and the signals associated with it.

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication

Simulation & Synthesis of FPGA Based & Resource Efficient Matrix Coprocessor Architecture

Microprocessors and Microcontrollers Prof. Santanu Chattopadhyay Department of E & EC Engineering Indian Institute of Technology, Kharagpur

Design and Implementation of Arbiter schemes for SDRAM on FPGA

Knowledge Organiser. Computing. Year 10 Term 1 Hardware

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination

EE577A FINAL PROJECT REPORT Design of a General Purpose CPU

DESIGN AND IMPLEMENTATION OF I2C SINGLE MASTER ON FPGA USING VERILOG

Novel Architecture for Designing Asynchronous First in First out (FIFO)

Three-box Model: These three boxes need interconnecting (usually done by wiring known as a bus. 1. Processor CPU e.g. Pentium 4 2.

CC411: Introduction To Microprocessors

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control,

Microprocessor (COM 9323)

Combinational Logic with MSI and LSI

In this lecture, we will look at how storage (or memory) works with processor in a computer system. This is in preparation for the next lecture, in

THE LOGIC OF COMPOUND STATEMENTS

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

CSE 141L Computer Architecture Lab Fall Lecture 3

Learning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing

DIGITAL DESIGN TECHNOLOGY & TECHNIQUES

3.1 Description of Microprocessor. 3.2 History of Microprocessor

CPLD Experiment 4. XOR and XNOR Gates with Applications

CprE 583 Reconfigurable Computing

CHAPTER 9 MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES

Contents. Chapter 9 Datapaths Page 1 of 28

DESIGN AND IMPLEMENTATION OF 32-BIT CONTROLLER FOR INTERACTIVE INTERFACING WITH RECONFIGURABLE COMPUTING SYSTEMS

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

MICROPROCESSOR AND MICROCONTROLLER BASED SYSTEMS

Field Programmable Gate Array (FPGA)

Microcomputer Architecture and Programming

Transcription:

Reconfiguration of Memory for High Speed Matrix Multiplication Ms. Shamshad Shirgeri 1,Ms. Pallavi Umesh Naik 2,Mrs. Rohini 3, Prof. Krishnananda Shet 4 1,2,3 Student, M.Tech in Industrial Electronics, 4 - Research Scholar, 1,2,3-KLS s VDRIT,Haliyal affiliated to VTU,Belgaum, 4- Kuvempu University, Shivamogga, Karnataka, India. Abstract The paper entitled RECONFIGURATION OF MEMORY FOR HIGH SPEED MATRIX MULTIPLICATION is basically an enhancement in speed of matrix multiplication. The project aims at reducing the time involved in the operation. The system presented here makes use of three RAMS designated as MAT- RAM-A, MAT-RAM-B and MAT-RAM-C and system also consists of a control circuitry. This system works in two modes i.e., mode-1 and mode-2. In mode-1 the RAM s act as extensions of existing memory of the processor and in mode-2 the two RAMs MAT-RAM-A and MAT-RAM-B are configured as source for hardware with MAT-RAM-C working as destination for storage or resultant matrix after multiplication. The RAMs are configured back in mode-1, so that data can be read from MAT-RAM-C. Keywords: RAM, MAT-RAM-A, MAT- RAM-B and MAT-RAM-C, SIMD. 1. INTRODUCTION In the present electronic era, speed plays a very crucial role. Speed size ad economics are the present buzzwords in the electronic industry. Any designer will have to take all the three outlined factors into account before starting any design. The trend now-adays is miniaturization coupled with fast access time. The technique presented here uses additional RAMs which will act as extension of the existing RAMs of computers. These RAM devices are switched in a special configuration to provide high speed matrix multiplication. In vector computers, special vector registers within the CPU are used for handling vector operations. The hardware and associated software will have to take care of transferring data approximately. This is quite an involved task and it involves substantial hardware and software overheads. In SIMD architecture the multiplication of two matrices involves a number of load and store instructions. Moreover this requires N processors for N x N matrix operation. Also in this case the data will have to be stored column wise in each machine and then load and store instructions have to be executed for N times. The technique presented here overcomes these problems and provides a very convenient and economical solution for high speed matrix multiplication. 2. PREAMBLE 2.1 Statement of problem Traditional method for matrix multiplication consumes lot of time, as it involves reading from memory, multiplying them and again storing the result back in memory and this process has to be repeated for each element of the matrix. This process 1904

obviously wastes processors time. Processors time can be judiciously utilized by the technique presented here. 2.2 Scope of Study This system provides a very convenient and economical solution for high speed matrix multiplication. The speed is quite high and is limited only by the access time of the RAM devices used for this purpose. 2.3 Methodology Splitting the design into functional modules. The complete design to be implemented to be broken into smaller modules catering to the different functions. Adding the designed modules to the library. The design modules are added to the library so that they can be accessed whenever necessary. This method reduces length of coding. Interconnecting all the modules to complete the circuit. The various modules required for are called from the library and then are interconnected as required. 2.4 Limitation of the system The speed of the system is limited only by the access time of the RAMS used in the circuit. External control unit RAM s This system operates in one of the two modes in mode-1 these can be used as RAM s of the main processor, connected just as an extension RAM s of the main processor. In this mode the necessary control signals of the RAM s are connected to the processors data bus. In mode-2, these are used for matrix multiplication process. In this mode the RAM s are isolated from the main processor MATRAM-A and MATRAM-B store the input data matrix and output of the operation is stored in MATRAM-C. The necessary control signals are obtained from the external circuit. 3.1 Switching unit The switching part consists of two multiplexers one for transferring data between RAM and external control unit. The other multiplexer is used for routing the control signals to RAM either from the processor or from an external control unit. Depending upon the select input switching unit connect the RAM s either to the processor or the external control unit. The system operates in two modes, the mode of operation is decided by the input applied to select pin of switching unit. 3. BLOCK DIAGRAM OF RECONFIGURATION OF MEMORY FOR HIGH SPEED MATRIX MULTIPLICATION Figure2. Switching Unit Figure1. Block Diagram of reconfiguration of memory for high speed matrix multiplication The system presented here consists of Processor Switching unit 1905

3.2 External control unit Figure3. External Control Unit It consists of A Multiplier An 8 bit parallel adder A register Three counters A clock 3.2.3 Register Used for storing the sum from the adder block temporarily. Register initially contains zero. Register gets reset once the resultant matrix element is generated. 3.2.4 Counters The counters used here are binary up counters that generate addresses of RAMs during Mode-2 operation along with the addresses, and bi-directional multiplexers used I the circuit that controls the mode of operation of the system. The counter has a register in it, which should be loaded with the number of elements of the matrix. The counter counts until the loaded value, during which the select signal generated by the counter is at logic high and hence the system operates I mode-2, as soon as the count equals the stored value the select signal becomes low, now the system is switched back to mode- 1. 3.2.1 Multiplier Used for the multiplication of the matrix elements. It takes the inputs from the two matrices, namely MATRAM-A and MATRAM-B. It generates the partial product and this becomes the input for the order block. 3.2.2 8 bit Parallel adder Used for the summation purpose. It has two inputs one is partial product from the multiplier block and the other input is the previous sum from the register block. Figure4. 8 bit parallel adder 3.3 RAM s Figure5. Counters MATRAM-A, MATRAM-B and MATRAM- C which are present in the system are just an extension of existing RAM s of the main processor during mode-1 and in this mode the address lines are generated by the processor and input data are given by the processor itself. MATRAM-A and MATRAM-B are configured as source for the hardware. MATRAM-C works as destination matrix to store the resultant. During mode-2 their address lines are generated by the external circuit. The RAM s are again configured back in mode-1, so that data can be read from MATRAM-C. 1906

4. MODES OF OPERATION 4.1 Mode-1 In mode-1, the input to the select pin of external circuit is at logic low. Under this condition the RAM s are connected to the processor. Under this mode, the RAM s are connected to the processor and are isolated from the external circuit. In this mode the address and control signals for the RAM s are obtained from the processor, and processor data bus is connected to the RAM s data bus. Hence reading from or writing into the RAM is under the control of the processor. 4.2 Mode-2 In the mode-2 the input to the select pin is high. Under this condition the RAM s are connected to the external circuit. The RAM s are completely isolated from main processor. In this mode, the address of the RAM s are obtained from the external control unit, the data buses of RAM-A and RAM-B are connected to the multiplier input and the data bus of RAM-C is connected to the output of the adder. Figure6. Implementation of Reconfiguration of memory for high speed matrix multiplication using VHDL. 6. GRAPHICAL COMPARISION 5. VHDL IMPLEMENTATION Figure7. Graphical comparison of Reconfiguration of memory for high speed matrix multiplication 1907

7. CONCLUSION It provides a much faster and an economic technique for matrix multiplication operation. The speed in this technique will be more than ten times faster compared to Pentium processor. Cost involved will be less compared to that of a vector machine or a SMID machine. The speed of the operation is only limited by the access time of the RAM chip used in the circuit. 8. REFERENCES P.H.W. Leong, M.P. Leong, O.Y.H. Cheung, T. Tung, C.M. Kwok, and K.H. Lee, Pilchard A Reconfigurable Computing Platform with Memory Interface. Xilinx, Virtex-E 1.8 V Field Programmable Gate Arrays. Randall Hyde, "The Art of Assembly Language Programming". http://webster.cs.ucr.edu/page_aoa Linux/HTML/AoATOC.html IEEE international conference on An FPGA based floating-point processor array supporting a high-precision dot product by Fritz Mayer. 1908