An Efficient Hardwired Realization of Embedded Neural Controller on System-On-Programmable-Chip (SOPC)

Size: px
Start display at page:

Download "An Efficient Hardwired Realization of Embedded Neural Controller on System-On-Programmable-Chip (SOPC)"

Transcription

1 An Efficient Hardwired Realization of Embedded Neural Controller on System-On-Programmable-Chip (SOPC) Karan Goel, Uday Arun, A. K. Sinha ABES Engineering College, Department of Computer Science & Engineering, Ghaziabad Membership, IEEE Circuit and System Society and ACM Embedded System Computing Society Membership, IEEE Computer Society Abstract In this paper, an efficient hardwired realization of Embedded Neural Controller (ENC) on Field Programmable Gate Array (FPGA/SOPC) is presented. This system architecture is described using Very High Speed Integrated Circuits Hardware Description Language (VHDL) and implemented on Xilinx FPGA/SOPC chip. A design is verified on a Xilinx FPGA/SOPC demo board. This paper discusses the issues involved in the implementation of a multi-input neuron with nonlinear activation function using FPGA/SOPC. The issues are parallel/sequential implementation of neurons, data representation and precision for the inputs, weights & activation function and the use of hybrid method (PLAN with RALUT approximation) for the approximation of a nonlinear, continuous, monotonically increasing, differentiable activation function. It is shown that fully parallel implementation of neurons is not feasible in hardware rather an alternative method called MAC circuit is used to overcome this problem. Moreover, fixed point (FXP) representation is proved to be better than floating point (FLP) representation. Finally, hardware realization of activation function is presented. The results are analysed in terms of low values of average error & maximum error, maximum working frequency, high speed of execution, low power consumption, less delay, less area, higher accuracy, low cost and usage percentage of chip resources such as number of slices (each slice is a logic unit which includes two 4-input look-up tables (LUT) & two flip-flops), number of Input/output blocks (IOB) & number of block RAMs. Here, the multiplier has a parallel, non-pipelined and combinational structure. Keywords: ANN, FPGA/SOPC, sigmoid activation function, tangent hyperbolic activation function, PlAN approximation, LUT approximation, RALUT approximation, hybrid approximation method. Introduction Artificial neural networks (ANNs) have been used in many fields such as solving pattern classification & recognition problems, signal processing, control system etc. Although, neural networks have been implemented mostly in software, hardware versions are gaining importance. Software versions have the advantage of being easy to implement, but with poor performance. Hardware versions are generally more difficult and time consuming to implement, but with better performance than software versions. Hardware version takes the advantage of neural network s inherent parallelism. Neural networks can be implemented using analog or digital systems. The analog ones are more precise but difficult to implement and have problem with weight storage. The digital implementation is more popular as it has the advantage of higher accuracy, better repeatability, lower noise sensitivity, better testability, higher flexibility, compatibility with other types of processors & has no problem with weight storage. The digital NN hardware implementation are further classified as DSP Embedded Development board, Microprocessor/Microcontroller based, ASICbased, VLSI-based, FPGA- based implementations. Digital Signal Processor (DSP) and microprocessor based implementation is sequential and hence does not preserve the parallel architecture of the neurons in a layer. Application Specific Integrated Circuit (ASIC) & Very Large Scale Integration (VLSI) based implementation do not offer re-configurability by the user, run only specific algorithms, expensive, time consuming to develop such chip and limitation on the size of the network. 276

2 FPGA is a suitable hardware for neural network implementation as it preserves the parallel architecture of the neurons in a layer and offer flexibility in reconfiguration and maintains high gate density which is needed to utilize the parallel computation in an ANN. Parallelism, modularity and dynamic adaptation are three computational characteristics typically associated with ANNs. Therefore, FPGA/SOPC based implementation is best suited to implement ANNs. FPGA/SOPC are a family of programmable device based on an array of configurable logic blocks (CLBs) which gives a great flexibility in prototyping, designing and development of complex hardware real time systems. When implementing ANN on FPGA/SOPC, certain measures are to be taken to minimize the hardware. These measures are the data representation and the number of bits (precision) of inputs, weights and the activation function. 1. Generic Architecture of Multilayer Neural Network (MLP) The Generic architecture of multilayer neural network shown in figure 1 consists of a layer of input nodes called input layer, one or more hidden layer(s) consisting of hidden nodes and one output layer consisting of output nodes NN Implementation with only One Input for the Weights Figure 2. NN implementation with only one input for the weights 2. Mathematical Model of a Single Artificial Neuron The common mathematical model of a single artificial neuron is shown as The neuron output can be written as: Input layer Hidden layer Output layer Where Figure 1. Generic architecture of MLP x1, x2 & x3 are the inputs. HN1 to HN6 are the hidden nodes that constitutes a hidden layer Y1, Y2, Y3 are the outputs of the network. Where p j is the input value and w j is the corresponding weight value, a is the output of the neuron and the f(.) is a nonlinear activation function. 3. Data Precision & Representation Before beginning a hardware implementation of an ANN, an arithmetic representation (floating point or fixed point) and the number of bits (precision) must be considered for the inputs, weights and activation function, which may range from 16 bit, 32 bit or 64 bit floating-point format (FLP) or fixed point format (FXP). Increasing the precision of the design elements significantly increases the resources used. 277

3 FLP format is area hungry with improved precision whereas FXP format needs minimal resources at the expense of precision. The minimum allowable precision and minimum allowable range is to be decided to conserve silicon area and power without affecting the performance of ANN. Precision of fixed point data is independent of the number but dependent on the position of the radix point. Antony W.Savich et al. [9] had shown that the resource requirement of networks implemented with FXP arithmetic is appreciably two times lesser than the FLP arithmetic with similar precision and range. Also FXP computation produces better convergence in lesser clock cycles. For a normalized input and output in the range of [0,1] to realize the log-sigmoid activation function, Holt and Bakes [10] showed that 16-bit FXP (1-3-12) representation was the minimum allowable precision. Ligon III et al. [11] exhibited the density advantage of 32-bit FXP computation over FLP computation for Xilinx FPGA/SOPC Analogy of Fixed Point Format The Fixed point format is defined as follows: [s] a.b Where the optional s denotes a sign bit with 0 for positive and 1 for negative numbers, a is the number of integer bits and b is the number of fractional bits. To represent negative numbers, two s complement notation is used. The examples in Table 1 clarify the notation. If the sign bit is used, the minimum x min and maximum x max numbers in a.b notation are x min = 2 -a x max = 2 a - 2 -b If the sign bit is not used, the minimum x min and maximum x max numbers in a.b notation are x min = 0 x max = 2 a - 2 -b 4. Implementation of Fully Parallel Neural Network A fully parallel network is fast but inflexible. In a fully parallel network, the number of multipliers per neuron must be equal to the number of connection to this neuron. Since all of the products must be summed, the number of full adders equals to the number of connections to the previous layer minus one. In architecture, the output neuron must have 5 multipliers and 4 Full-Adders, while each hidden neuron must have 3 multipliers and 2 adders. So, different neuron architectures have to be designed for each layer. A second drawback of a fully parallel network is gate resource usage because multipliers are costly hardware. Krips et. Al. [1] proposed such architecture. 278

4 5. Design and Modelling of Multiply and Accumulate (MAC) Circuit of a Neuron In this work, we chose multiply and accumulate structure for neurons. In this structure, there is one multiplier and one accumulator per neuron. The inputs from previous layer neurons enter the neuron serially and are multiplied with their corresponding weights. Every neuron has its own weight storage ROM. Multiplied values are summed in an accumulator. The processes are synchronized to clock (global) signal. The number of clock cycles for a neuron to finish its work is equals to the number of connections from the previous layer. The accumulator has a load signal, so that the bias values are loaded to all neurons at start-up. This neuron architecture is shown in figure 3. In this design, the neuron architecture is fixed throughout the network and is not dependent on the number of connection. coming from the previous layer is fed successively at each clock cycle. All of the neurons in the layer operate parallel. They take an input from their common input line, multiply it with the corresponding weight from their weight ROM and accumulate the product. If the previous layer has 3 neurons, present layer take and processes these inputs in 3 clock cycles. After these 3 clock cycles, every neuron in the layer has its net values ready. Then the layer starts to transfer these values to its output one by one for the next layer to take them successively by enabling corresponding neuron s three-state output. Since, only one neuron s output have to be present at the layer s output at a time, instead of implementing an activation function for each neuron, it is convenient to implement one activation function for each layer. The block diagram of a layer architecture including 3 neurons is shown in figure 4. Figure 3. Block diagram of a single neuron The precision of the weights and input values are both 8 bits. The multiplier is an 8-bit by 8-bit multiplier, which results in a 16-bit product, and the accumulator is 16-bits wide. The accumulator has also an enable signal, which enables the accumulating function and an out_enable signal for its three state outputs. Here the multiplier has a parallel, non-pipelined, combinational structure, generated by Xilinx Logicore Multiplier Generator V 5.0[4]. 6. Analysis & Synthesis of Layered Architecture of Multi-Neurons In our design, an ANN layer has one input which is connected to all neurons in this layer. But previous layer may have several outputs depending on the number of neurons it has. Each input to the layer Figure 4. Block diagram of a layer consisting of 3 neurons 7. Design, Modelling, Synthesis of Neural Network Architecture The control signals in the system are generated by a state machine. This state machine is responsible of controlling all of the operations in the network. First it activates the load signals of the neurons and 279

5 the neurons load their bias values. Then hidden layer is enabled for 3 clock cycles, and then the output layer consisting of a single neuron is enabled for 5 clock cycles. Out_enable signals are also activated by this state machine. The state machine generates weight ROM addresses in a priory determined sequence so that the same address lines can be used by all of the neurons in the system. Input RAM also uses this address line. Once the input RAM is loaded by input values using the switches on the board, the propagation phase starts and the output of the network is displayed. The block diagram of the network is shown in figure 5. The resource usage of the selected network architecture is shown in table 2. and is shown by figure 6. Figure 5. Block diagram of the complete neural network 8. Hardware Realization of Activation Function One of the important components in designing ANNs in hardware is the nonlinear activation function. The most common used function is sigmoid activation function and tangent hyperbolic activation function. These activation functions is a monotonically increasing, continuous, differential and nonlinear exponential function that limits the output of a neuron within the certain range i.e. [0,1] & [-1, 1] respectively. These functions are reported to be a better activation function for ANN. It is defined as: Figure 6. Log-sigmoid and hyperbolic tangent AF Straightforward implementation of these functions in hardware is not practical due to their exponential nature. Several different approaches exist for the hardware approximation of activation function, including piecewise linear approximation, look-up tables (LUT), Range addressable look-up tables and Hybrid Methods [2],[3],[4],[5],[6],[7],[8]. The efficiency criteria for a successful approximation are the achieved accuracy, speed and area resources Analysis of Piece-wise Linear Approximation of a Nonlinear Function (PLAN) Piecewise linear (PWL) approximations uses a series of linear segments to approximate the activation function [2]. Piecewise linear approximation is a method to obtain low values for both maximum and average error with low computational complexity. The number and locations of these segments are chosen such that error, processing time and area utilization are minimized. The number of pieces required in the PWL approximation of the activation function depends upon the complexity of the problem to be solved [12]. Using power of two co-efficient to represent the slope of the linear segments allows right shift operations instead of multiplication [12] to reduce the hardware as multipliers are expensive hardware components in terms of area and delay. Piecewise linear approximations are slow, but consume less area. For hardware implementation, it is convenient that the number of data points be a power of two, we will assume that the interval I is divided into 2 k intervals: 280

6 [L, L + (U-L)/ 2 k ), [L + (U-L)/ 2 k, L + 2(U-L)/2 k )..., [L + 2 k-1 (U-L)/2 k ) Where U and L are the upper and lower limit of inputs applied to the activation function respectively. Then, given an argument x, the interval into which it falls can readily be located by using, as an address, the k most significant bits of the binary representation of x. A piecewise linear approximation of the hyperbolic tangent function with five segments is shown in figure 7. Figure 7. Piecewise linear approximation of Tanh(x) with five segments The representation of five segments is shown as: Mod n : Subtraction : Constant adder Simulated results of PWL based approach is shown in table Analysis of LUT Approximation With this approach, the function is approximated by a limited number of uniformly distributed points [5], [7]. This methods results in high performance design, however it will require a large amount of memory to store the look up table (LUT) in hardware. The look-up table s input space is designed so that it covers a range between -8 and +8. In our data representation, this range corresponds to s4.5 format ie.. 10 bits. So, a 1024*8 ROM is needed to implement the activation function s look-up table. The LUT design does not optimize well under FLP format and hence fixed point format is used. But still on-chip realization of log-sigmoid function increases the size of hardware considerably. To optimize the area to some extent, the inbuilt RAM available in FPGA/SOPC is used to realize LUT based activation function. It reduces area and improves speed. If the precision is to improve, then PWL approximation approach of activation function is to be utilized along with the look-up table (Hybrid Method). The look-up table approximation of the sigmoid function with sixteen points is shown in figure 9. The computational modules of the activation function is as shown in figure 8. Figure 9. LUT approximation of sigmoid function with sixteen points Figure 8. Computational modules of the activation function Individual sub blocks are realized using FXP formats. 8-n : Inverter module /n : constant bit shifts The log sigmoid activation function is realized by representing input x in 10 bits and output f(x) in 8 bits. 10-bit input x has the following FXP format to represent the address of the memory location and output f(x) is represented as 8-bit fractional part only, to indicate the respective content of memory location. The data precision of the above two approaches (PWL and LUT approximation) of AF 281

7 is shown in table 3. Simulated results of LUT based approach is shown in table 6. Comparison of Synthesis report for LUT and PWL approximation using Xilinx is shown in table 5. situations where the output remains constant over a range. Examples are the hyperbolic tangent and the sigmoid functions, where the output changes only slightly outside the range of (-2, 2). A RALUT approximation of the hyperbolic tangent function with seven points is shown in figure 10. The smallest possible value that can be represented as output is 2 -n, where n is the number of bits in the stream. This value should be treated as zero, since f(x)=0 when x=minus infinity, which may never occur. Hence an error 2 -n is introduced. Similarly the largest possible value of f(x) is 1-2 -n to limit the output to be within (0, 1). Therefore, the minimum value of AF, f(x1) = 1 / (1+e -x1 ) = 2 -n which decides the lower limit of input, x1 = -ln (2 n 1) Similarly, the maximum vale of AF, f(x2) = 1 / (1 + e-x2) = 1-2 -n which decides the upper limit of input, x2 = ln (2n - 1) Hence, the step size or increment of input is found to be X = ln [( n ) / ( n )] The minimum number of LUT values is given by min number = (x2-x1) / X min number for the x1 and x2 respectively is found to be 710 for 8-bit precision. Hence 1K RAM with 10-bit address lines is chosen that gives 1024 locations with step size X = For a 10 bit input x, the minimum value x1 = and the maximum value is x2 = The output RAM has 1K memory in which 710 locations are used leaving 374 locations unused. Wastage of memory is a draw back and the second method (PWL) overcomes it RALUT approximation It shares many aspects with the classic LUT with a few notable differences. In LUTs, every data point stored in the table corresponds to a unique address. In RALUTs, every data point corresponds to a range of addresses. This alternate addressing allows for a large reduction in data points in Figure 10. RALUT approximation of hyperbolic tangent function 8.4. Analysis of Hybrid Method Approximation Hybrid methods usually combine two or more of the previously mentioned methods to achieve better performance. This method makes use of piecewise linear approximation along with the use of RALUT [8]. It initially makes use of a piecewise linear approximation with five segments to approximate the activation function. Afterwards, the difference between the actual function s value and the approximated value will be adjusted to achieve a better approximation. The adjustment values are calculated offline and stored in a lookup table. These adjustments consists of subtracting the values stored in the lookup table from the initial piecewise linear approximation. This can be achieved with a simple combinational subtracter circuit. This method has two main advantages. First, all the five segments were chosen to avoid the use of a multiplier in the design. Second, the lookup table used in our design stores the difference between the piecewise linear approximation and the actual function. This greatly minimizes the range of the output variance, which reduces the size of the lookup table. To further optimize the design, RALUT was used instead of classic LUT. The block diagram of the hybrid system is shown in figure 11. Complexity comparison of different implementation is shown in table

8 Table 5. Comparison of Synthesis report for LUT and PWL approximation using Xilinx Figure 11. System block for a hybrid method 9. Experimental results Table 2. Resource usage of the network architecture Table 5. Simulated results of LUT based approach Table 3. Data precision of LUT and PWL Activation function Table 7. Complexity comparison of different approximation method of activation function in terms of area and delay Table 4. Simulated results of PWL based approach 10. Conclusion Table 5. Simulated results of LUT based approach This work has presented the analysis, modeling, synthesis, verification, prototyping and implementation of embedded neural controller for high intelligent control systems on embedded SOPC & SPARTAN6 FPGA device. The proposed network architecture is modular, being possible to easily increase or decrease the number of neurons as well as layers. SOPC can be used for portable, modular, and reconfigurable hardwired solutions 283

9 for neural networks, which have been mostly used to be realized on computers until now. With the internal layer parallelism we used, it takes lesser clock cycles for the applied network to calculate its output as compared to microprocessor based implementation. Because neural networks are inherently parallel structures, parallel architectures always result faster than serial ones. Moreover, hardware implementation of an activation function that is used to realize an ANN is tried in this work. A two-fold approach to minimize the hardware is proposed here, aiming at the SOPC realization. Firstly, FXP arithmetic is followed against FLP arithmetic to reduce area, by compromising slightly on accuracy. Secondly, PLAN approximation with five linear segments along with RALUT is proved to be hardware friendly. 11. Future work Future research work includes the analysis, modeling, synthesis, verification, prototyping and implementation of embedded neural controller for high intelligent control systems on embedded SOPC & SPARTAN6 FPGA device, analysis & synthesis of hybrid approximation of activation function, pipelined based hybrid multiplier and memory elements for embedded neural controller. 12. References 1. M. Krips, T. Lammert, and Anton Kummert, FPGA Implementation of a Neural Network for a Real-Time Hand Tracking System, Proceedings of the first IEEE International Workshop on Electronic Design, Test and Applications, K. Basterretxea and J.M. Tarela and I. Del Campo, Approximation of sigmoid function and the derivative for hardware implementation of artificial neurons, IEE Proceedings - Circuits, Devices and Systems, vol. 151, no. 1, pp , February K. Basterretxea and J.M. Tarela, Approximation of sigmoid function and the derivative for artificial neurons, advances in neural networks and applications, WSES Press, Athens, pp , M.T. Tommiska, Efficient digital implementation of the sigmoid function for reprogrammable logic, IEE Proceedings on Computers and Digital Techniques, vol. 150, no. 6, pp , Nov K. Leboeuf, A.H. Namin, R. Muscedere, H. Wu, and M. Ahmadi Efficient Hardware Implementation of the Hyperbolic Tangent Sigmoid Function, Third International Conference on Convergence and hybrid Information Technology, accepted, November H.K. Kwan, Simple sigmoid-like activation function suitable for digital hardware implementation Electronic Letters,vol. 28, no. 15, pp , July F. Piazza, A. Uncini, and M. Zenobi, Neural networks with digital LUT activation functions, Proceedings of 1993 International Joint Conference on Neural Networks, IJCNN, pp ,October A.H. Namin, K. Leboeuf, R. Muscedere, H. Wu, and M. Ahmadi, HighSpeed VLSI Implementation of the Hyperbolic Tangent Sigmoid Function, IEEE International Symposium on Circuits and Systems (ISCAS), accepted, May Antony W. Savich, Medhat Moussa, and Shawki Areibi, The Impact of Arithmetic Representation on Implementing MLP-BP on FPGAs: A Study, IEEE Trans. Neural Networks, vol.18, no.1, Jan J. Holt and T. Baker, Back propagation simulations using limited precision calculations, in Proc. Int. Joint Conf. Neural Netw. (IJCNN-91), Seattle, WA, Jul. 1991, vol. 2, pp W. Ligon, III, S. McMillan, G. Monn, K. Schoonover, F. Stivers, andk. Underwood, A re-evaluation of the practicality of floating point operations on FPGAs, in Proc. IEEE Symp. FPGAs Custom Comput. Mach., K. L. Pocek and J. Arnold, Eds., Los Alamitos, CA, 1998, pp

A Library of Parameterized Floating-point Modules and Their Use

A Library of Parameterized Floating-point Modules and Their Use A Library of Parameterized Floating-point Modules and Their Use Pavle Belanović and Miriam Leeser Department of Electrical and Computer Engineering Northeastern University Boston, MA, 02115, USA {pbelanov,mel}@ece.neu.edu

More information

DIGITAL IMPLEMENTATION OF THE SIGMOID FUNCTION FOR FPGA CIRCUITS

DIGITAL IMPLEMENTATION OF THE SIGMOID FUNCTION FOR FPGA CIRCUITS DIGITAL IMPLEMENTATION OF THE SIGMOID FUNCTION FOR FPGA CIRCUITS Alin TISAN, Stefan ONIGA, Daniel MIC, Attila BUCHMAN Electronic and Computer Department, North University of Baia Mare, Baia Mare, Romania

More information

Journal of Engineering Technology Volume 6, Special Issue on Technology Innovations and Applications Oct. 2017, PP

Journal of Engineering Technology Volume 6, Special Issue on Technology Innovations and Applications Oct. 2017, PP Oct. 07, PP. 00-05 Implementation of a digital neuron using system verilog Azhar Syed and Vilas H Gaidhane Department of Electrical and Electronics Engineering, BITS Pilani Dubai Campus, DIAC Dubai-345055,

More information

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM Annals of the University of Petroşani, Economics, 12(4), 2012, 185-192 185 IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM MIRCEA PETRINI * ABSTACT: This paper presents some simple techniques to improve

More information

Implementation of FPGA-Based General Purpose Artificial Neural Network

Implementation of FPGA-Based General Purpose Artificial Neural Network Implementation of FPGA-Based General Purpose Artificial Neural Network Chandrashekhar Kalbande & Anil Bavaskar Dept of Electronics Engineering, Priyadarshini College of Nagpur, Maharashtra India E-mail

More information

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies VLSI IMPLEMENTATION OF HIGH PERFORMANCE DISTRIBUTED ARITHMETIC (DA) BASED ADAPTIVE FILTER WITH FAST CONVERGENCE FACTOR G. PARTHIBAN 1, P.SATHIYA 2 PG Student, VLSI Design, Department of ECE, Surya Group

More information

Fast Evaluation of the Square Root and Other Nonlinear Functions in FPGA

Fast Evaluation of the Square Root and Other Nonlinear Functions in FPGA Edith Cowan University Research Online ECU Publications Pre. 20 2008 Fast Evaluation of the Square Root and Other Nonlinear Functions in FPGA Stefan Lachowicz Edith Cowan University Hans-Joerg Pfleiderer

More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE 754-2008 Standard M. Shyamsi, M. I. Ibrahimy, S. M. A. Motakabber and M. R. Ahsan Dept. of Electrical and Computer Engineering

More information

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers International Journal of Research in Computer Science ISSN 2249-8257 Volume 1 Issue 1 (2011) pp. 1-7 White Globe Publications www.ijorcs.org IEEE-754 compliant Algorithms for Fast Multiplication of Double

More information

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Khumanthem Devjit Singh, K. Jyothi MTech student (VLSI & ES), GIET, Rajahmundry, AP, India Associate Professor, Dept. of ECE, GIET, Rajahmundry,

More information

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN Xiaoying Li 1 Fuming Sun 2 Enhua Wu 1, 3 1 University of Macau, Macao, China 2 University of Science and Technology Beijing, Beijing, China

More information

Fast evaluation of nonlinear functions using FPGAs

Fast evaluation of nonlinear functions using FPGAs Adv. Radio Sci., 6, 233 237, 2008 Author(s) 2008. This work is distributed under the Creative Commons Attribution 3.0 License. Advances in Radio Science Fast evaluation of nonlinear functions using FPGAs

More information

Implementation of CORDIC Algorithms in FPGA

Implementation of CORDIC Algorithms in FPGA Summer Project Report Implementation of CORDIC Algorithms in FPGA Sidharth Thomas Suyash Mahar under the guidance of Dr. Bishnu Prasad Das May 2017 Department of Electronics and Communication Engineering

More information

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE Anni Benitta.M #1 and Felcy Jeba Malar.M *2 1# Centre for excellence in VLSI Design, ECE, KCG College of Technology, Chennai, Tamilnadu

More information

Field Programmable Gate Array (FPGA)

Field Programmable Gate Array (FPGA) Field Programmable Gate Array (FPGA) Lecturer: Krébesz, Tamas 1 FPGA in general Reprogrammable Si chip Invented in 1985 by Ross Freeman (Xilinx inc.) Combines the advantages of ASIC and uc-based systems

More information

Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point

Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point International Journal of Electrical and Computer Engineering (IJECE) Vol. 3, No. 6, December 2013, pp. 805~814 ISSN: 2088-8708 805 Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating

More information

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography 118 IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.1, January 2013 An Optimized Montgomery Modular Multiplication Algorithm for Cryptography G.Narmadha 1 Asst.Prof /ECE,

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: Configuring Floating Point Multiplier on Spartan 2E Hardware

More information

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications , Vol 7(4S), 34 39, April 204 ISSN (Print): 0974-6846 ISSN (Online) : 0974-5645 Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications B. Vignesh *, K. P. Sridhar

More information

Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool

Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool Md. Abdul Latif Sarker, Moon Ho Lee Division of Electronics & Information Engineering Chonbuk National University 664-14 1GA Dekjin-Dong

More information

FPGA Matrix Multiplier

FPGA Matrix Multiplier FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri

More information

Implementation of 14 bits floating point numbers of calculating units for neural network hardware development

Implementation of 14 bits floating point numbers of calculating units for neural network hardware development International Conference on Recent Trends in Physics 206 (ICRTP206) Journal of Physics: Conference Series 755 (206) 000 doi:0.088/742-6596/755//000 Implementation of 4 bits floating numbers of calculating

More information

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Performance Analysis of CORDIC Architectures Targeted by FPGA Devices Guddeti Nagarjuna Reddy 1, R.Jayalakshmi 2, Dr.K.Umapathy

More information

VHDL IMPLEMENTATION OF FLOATING POINT MULTIPLIER USING VEDIC MATHEMATICS

VHDL IMPLEMENTATION OF FLOATING POINT MULTIPLIER USING VEDIC MATHEMATICS VHDL IMPLEMENTATION OF FLOATING POINT MULTIPLIER USING VEDIC MATHEMATICS I.V.VAIBHAV 1, K.V.SAICHARAN 1, B.SRAVANTHI 1, D.SRINIVASULU 2 1 Students of Department of ECE,SACET, Chirala, AP, India 2 Associate

More information

An Efficient Constant Multiplier Architecture Based On Vertical- Horizontal Binary Common Sub-Expression Elimination Algorithm

An Efficient Constant Multiplier Architecture Based On Vertical- Horizontal Binary Common Sub-Expression Elimination Algorithm Volume-6, Issue-6, November-December 2016 International Journal of Engineering and Management Research Page Number: 229-234 An Efficient Constant Multiplier Architecture Based On Vertical- Horizontal Binary

More information

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL Tharanidevi.B 1, Jayaprakash.R 2 Assistant Professor, Dept. of ECE, Bharathiyar Institute of Engineering for Woman, Salem, TamilNadu,

More information

MCM Based FIR Filter Architecture for High Performance

MCM Based FIR Filter Architecture for High Performance ISSN No: 2454-9614 MCM Based FIR Filter Architecture for High Performance R.Gopalana, A.Parameswari * Department Of Electronics and Communication Engineering, Velalar College of Engineering and Technology,

More information

An Efficient Implementation of Multi Layer Perceptron Neural Network for Signal Processing

An Efficient Implementation of Multi Layer Perceptron Neural Network for Signal Processing An Efficient Implementation of Multi Layer Perceptron Neural Network for Signal Processing A.THILAGAVATHY 1, M.E Vlsi Desgin(Pg Scholar), Srinivasan Engineering College, Perambalur-621 212, Tamilnadu,

More information

FPGA Implementation of a Nonlinear Two Dimensional Fuzzy Filter

FPGA Implementation of a Nonlinear Two Dimensional Fuzzy Filter Justin G. R. Delva, Ali M. Reza, and Robert D. Turney + + CORE Solutions Group, Xilinx San Jose, CA 9514-3450, USA Department of Electrical Engineering and Computer Science, UWM Milwaukee, Wisconsin 5301-0784,

More information

Digital Electronics 27. Digital System Design using PLDs

Digital Electronics 27. Digital System Design using PLDs 1 Module -27 Digital System Design 1. Introduction 2. Digital System Design 2.1 Standard (Fixed function) ICs based approach 2.2 Programmable ICs based approach 3. Comparison of Digital System Design approaches

More information

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER Bhuvaneswaran.M 1, Elamathi.K 2 Assistant Professor, Muthayammal Engineering college, Rasipuram, Tamil Nadu, India 1 Assistant Professor, Muthayammal

More information

University, Patiala, Punjab, India 1 2

University, Patiala, Punjab, India 1 2 1102 Design and Implementation of Efficient Adder based Floating Point Multiplier LOKESH BHARDWAJ 1, SAKSHI BAJAJ 2 1 Student, M.tech, VLSI, 2 Assistant Professor,Electronics and Communication Engineering

More information

Binary Adders. Ripple-Carry Adder

Binary Adders. Ripple-Carry Adder Ripple-Carry Adder Binary Adders x n y n x y x y c n FA c n - c 2 FA c FA c s n MSB position Longest delay (Critical-path delay): d c(n) = n d carry = 2n gate delays d s(n-) = (n-) d carry +d sum = 2n

More information

Design of Convolution Encoder and Reconfigurable Viterbi Decoder

Design of Convolution Encoder and Reconfigurable Viterbi Decoder RESEARCH INVENTY: International Journal of Engineering and Science ISSN: 2278-4721, Vol. 1, Issue 3 (Sept 2012), PP 15-21 www.researchinventy.com Design of Convolution Encoder and Reconfigurable Viterbi

More information

Developing a Data Driven System for Computational Neuroscience

Developing a Data Driven System for Computational Neuroscience Developing a Data Driven System for Computational Neuroscience Ross Snider and Yongming Zhu Montana State University, Bozeman MT 59717, USA Abstract. A data driven system implies the need to integrate

More information

FPGA architecture and design technology

FPGA architecture and design technology CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA

More information

TSEA44 - Design for FPGAs

TSEA44 - Design for FPGAs 2015-11-24 Now for something else... Adapting designs to FPGAs Why? Clock frequency Area Power Target FPGA architecture: Xilinx FPGAs with 4 input LUTs (such as Virtex-II) Determining the maximum frequency

More information

PROJECT REPORT IMPLEMENTATION OF LOGARITHM COMPUTATION DEVICE AS PART OF VLSI TOOLS COURSE

PROJECT REPORT IMPLEMENTATION OF LOGARITHM COMPUTATION DEVICE AS PART OF VLSI TOOLS COURSE PROJECT REPORT ON IMPLEMENTATION OF LOGARITHM COMPUTATION DEVICE AS PART OF VLSI TOOLS COURSE Project Guide Prof Ravindra Jayanti By Mukund UG3 (ECE) 200630022 Introduction The project was implemented

More information

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression Divakara.S.S, Research Scholar, J.S.S. Research Foundation, Mysore Cyril Prasanna Raj P Dean(R&D), MSEC, Bangalore Thejas

More information

Implementation and Analysis of an Error Detection and Correction System on FPGA

Implementation and Analysis of an Error Detection and Correction System on FPGA Implementation and Analysis of an Error Detection and Correction System on FPGA Constantin Anton, Laurenţiu Mihai Ionescu, Ion Tutănescu, Alin Mazăre, Gheorghe Şerban University of Piteşti, Romania Abstract

More information

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 2, Ver. I (Mar. -Apr. 2016), PP 58-65 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Fused Floating Point Arithmetic

More information

PERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS

PERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS American Journal of Applied Sciences 11 (4): 558-563, 2014 ISSN: 1546-9239 2014 Science Publication doi:10.3844/ajassp.2014.558.563 Published Online 11 (4) 2014 (http://www.thescipub.com/ajas.toc) PERFORMANCE

More information

LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS

LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Neural Networks Classifier Introduction INPUT: classification data, i.e. it contains an classification (class) attribute. WE also say that the class

More information

ON THE IMPLEMENTATION OF DIFFERENT HYPERBOLIC TANGENT SOLUTIONS IN FPGA. Darío Baptista and Fernando Morgado-Dias

ON THE IMPLEMENTATION OF DIFFERENT HYPERBOLIC TANGENT SOLUTIONS IN FPGA. Darío Baptista and Fernando Morgado-Dias ON THE IMPLEMENTATION OF DIFFERENT HYPERBOLIC TANGENT SOLUTIONS IN FPGA Darío Baptista and Fernando Morgado-Dias Madeira Interactive Technologies Institute and Centro de Competências de Ciências Exactas

More information

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm 1 A.Malashri, 2 C.Paramasivam 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,

More information

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS International Journal of Computing Academic Research (IJCAR) ISSN 2305-9184 Volume 2, Number 4 (August 2013), pp. 140-146 MEACSE Publications http://www.meacse.org/ijcar DESIGN AND IMPLEMENTATION OF VLSI

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: Implementation of Floating Point Multiplier on Reconfigurable

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information

IMPLEMENTATION OF FPGA-BASED ARTIFICIAL NEURAL NETWORK (ANN) FOR FULL ADDER. Research Scholar, IIT Kharagpur.

IMPLEMENTATION OF FPGA-BASED ARTIFICIAL NEURAL NETWORK (ANN) FOR FULL ADDER. Research Scholar, IIT Kharagpur. Journal of Analysis and Computation (JAC) (An International Peer Reviewed Journal), www.ijaconline.com, ISSN 0973-2861 Volume XI, Issue I, Jan- December 2018 IMPLEMENTATION OF FPGA-BASED ARTIFICIAL NEURAL

More information

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Design and Implementation of CVNS Based Low Power 64-Bit Adder Design and Implementation of CVNS Based Low Power 64-Bit Adder Ch.Vijay Kumar Department of ECE Embedded Systems & VLSI Design Vishakhapatnam, India Sri.Sagara Pandu Department of ECE Embedded Systems

More information

New Computational Modeling for Solving Higher Order ODE based on FPGA

New Computational Modeling for Solving Higher Order ODE based on FPGA New Computational Modeling for Solving Higher Order ODE based on FPGA Alireza Fasih 1, Tuan Do Trong 2, Jean Chamberlain Chedjou 3, Kyandoghere Kyamakya 4 1, 3, 4 Alpen-Adria University of Klagenfurt Austria

More information

Topics. Midterm Finish Chapter 7

Topics. Midterm Finish Chapter 7 Lecture 9 Topics Midterm Finish Chapter 7 ROM (review) Memory device in which permanent binary information is stored. Example: 32 x 8 ROM Five input lines (2 5 = 32) 32 outputs, each representing a memory

More information

Analysis of Different Multiplication Algorithms & FPGA Implementation

Analysis of Different Multiplication Algorithms & FPGA Implementation IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 4, Issue 2, Ver. I (Mar-Apr. 2014), PP 29-35 e-issn: 2319 4200, p-issn No. : 2319 4197 Analysis of Different Multiplication Algorithms & FPGA

More information

FPGAs: FAST TRACK TO DSP

FPGAs: FAST TRACK TO DSP FPGAs: FAST TRACK TO DSP Revised February 2009 ABSRACT: Given the prevalence of digital signal processing in a variety of industry segments, several implementation solutions are available depending on

More information

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely

More information

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders Vol. 3, Issue. 4, July-august. 2013 pp-2266-2270 ISSN: 2249-6645 Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders V.Krishna Kumari (1), Y.Sri Chakrapani

More information

Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design

Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design Manish Kumar *, Dr. R.Ramesh

More information

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code COPY RIGHT 2018IJIEMR.Personal use of this material is permitted. Permission from IJIEMR must be obtained for all other uses, in any current or future media, including reprinting/republishing this material

More information

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A.S. Sneka Priyaa PG Scholar Government College of Technology Coimbatore ABSTRACT The Least Mean Square Adaptive Filter is frequently

More information

Simulation & Synthesis of FPGA Based & Resource Efficient Matrix Coprocessor Architecture

Simulation & Synthesis of FPGA Based & Resource Efficient Matrix Coprocessor Architecture Simulation & Synthesis of FPGA Based & Resource Efficient Matrix Coprocessor Architecture Jai Prakash Mishra 1, Mukesh Maheshwari 2 1 M.Tech Scholar, Electronics & Communication Engineering, JNU Jaipur,

More information

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Fall 2002 EECS150 - Lec06-FPGA Page 1 Outline What are FPGAs? Why use FPGAs (a short history

More information

High Speed Special Function Unit for Graphics Processing Unit

High Speed Special Function Unit for Graphics Processing Unit High Speed Special Function Unit for Graphics Processing Unit Abd-Elrahman G. Qoutb 1, Abdullah M. El-Gunidy 1, Mohammed F. Tolba 1, and Magdy A. El-Moursy 2 1 Electrical Engineering Department, Fayoum

More information

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering A Review: Design of 16 bit Arithmetic and Logical unit using Vivado 14.7 and Implementation on Basys 3 FPGA Board Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor,

More information

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs? EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Outline What are FPGAs? Why use FPGAs (a short history lesson). FPGA variations Internal logic

More information

Efficient Conversion From Binary to Multi-Digit Multi-Dimensional Logarithmic Number Systems using Arrays of Range Addressable Look- Up Tables

Efficient Conversion From Binary to Multi-Digit Multi-Dimensional Logarithmic Number Systems using Arrays of Range Addressable Look- Up Tables Efficient Conversion From Binary to Multi-Digit Multi-Dimensional Logarithmic Number Systems using Arrays of Range Addressable Look- Up Tables R. Muscedere 1, V.S. Dimitrov 2, G.A. Jullien 2, and W.C.

More information

Design and Optimized Implementation of Six-Operand Single- Precision Floating-Point Addition

Design and Optimized Implementation of Six-Operand Single- Precision Floating-Point Addition 2011 International Conference on Advancements in Information Technology With workshop of ICBMG 2011 IPCSIT vol.20 (2011) (2011) IACSIT Press, Singapore Design and Optimized Implementation of Six-Operand

More information

Compact Clock Skew Scheme for FPGA based Wave- Pipelined Circuits

Compact Clock Skew Scheme for FPGA based Wave- Pipelined Circuits International Journal of Communication Engineering and Technology. ISSN 2277-3150 Volume 3, Number 1 (2013), pp. 13-22 Research India Publications http://www.ripublication.com Compact Clock Skew Scheme

More information

ECE 341 Midterm Exam

ECE 341 Midterm Exam ECE 341 Midterm Exam Time allowed: 90 minutes Total Points: 75 Points Scored: Name: Problem No. 1 (10 points) For each of the following statements, indicate whether the statement is TRUE or FALSE: (a)

More information

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function. FPGA Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different

More information

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

FPGA Implementation and Validation of the Asynchronous Array of simple Processors FPGA Implementation and Validation of the Asynchronous Array of simple Processors Jeremy W. Webb VLSI Computation Laboratory Department of ECE University of California, Davis One Shields Avenue Davis,

More information

High Level Abstractions for Implementation of Software Radios

High Level Abstractions for Implementation of Software Radios High Level Abstractions for Implementation of Software Radios J. B. Evans, Ed Komp, S. G. Mathen, and G. Minden Information and Telecommunication Technology Center University of Kansas, Lawrence, KS 66044-7541

More information

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,

More information

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,

More information

PINE TRAINING ACADEMY

PINE TRAINING ACADEMY PINE TRAINING ACADEMY Course Module A d d r e s s D - 5 5 7, G o v i n d p u r a m, G h a z i a b a d, U. P., 2 0 1 0 1 3, I n d i a Digital Logic System Design using Gates/Verilog or VHDL and Implementation

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER. OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER. A.Anusha 1 R.Basavaraju 2 anusha201093@gmail.com 1 basava430@gmail.com 2 1 PG Scholar, VLSI, Bharath Institute of Engineering

More information

Implementation and Impact of LNS MAC Units in Digital Filter Application

Implementation and Impact of LNS MAC Units in Digital Filter Application Implementation and Impact of LNS MAC Units in Digital Filter Application Hari Krishna Raja.V.S *, Christina Jesintha.R * and Harish.I * * Department of Electronics and Communication Engineering, Sri Shakthi

More information

Parallelized Radix-4 Scalable Montgomery Multipliers

Parallelized Radix-4 Scalable Montgomery Multipliers Parallelized Radix-4 Scalable Montgomery Multipliers Nathaniel Pinckney and David Money Harris 1 1 Harvey Mudd College, 301 Platt. Blvd., Claremont, CA, USA e-mail: npinckney@hmc.edu ABSTRACT This paper

More information

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017 VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier 1 Katakam Hemalatha,(M.Tech),Email Id: hema.spark2011@gmail.com 2 Kundurthi Ravi Kumar, M.Tech,Email Id: kundurthi.ravikumar@gmail.com

More information

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011 FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level

More information

Programmable Logic Devices FPGA Architectures II CMPE 415. Overview This set of notes introduces many of the features available in the FPGAs of today.

Programmable Logic Devices FPGA Architectures II CMPE 415. Overview This set of notes introduces many of the features available in the FPGAs of today. Overview This set of notes introduces many of the features available in the FPGAs of today. The majority use SRAM based configuration cells, which allows fast reconfiguation. Allows new design ideas to

More information

FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for Large Operand Word Lengths

FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for Large Operand Word Lengths International Journal of Computer Science and Telecommunications [Volume 3, Issue 5, May 2012] 105 ISSN 2047-3338 FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for

More information

DESIGN TRADEOFF ANALYSIS OF FLOATING-POINT. ADDER IN FPGAs

DESIGN TRADEOFF ANALYSIS OF FLOATING-POINT. ADDER IN FPGAs DESIGN TRADEOFF ANALYSIS OF FLOATING-POINT ADDER IN FPGAs A Thesis Submitted to the College of Graduate Studies and Research In Partial Fulfillment of the Requirements For the Degree of Master of Science

More information

ARCHITECTURAL DESIGN OF 8 BIT FLOATING POINT MULTIPLICATION UNIT

ARCHITECTURAL DESIGN OF 8 BIT FLOATING POINT MULTIPLICATION UNIT ARCHITECTURAL DESIGN OF 8 BIT FLOATING POINT MULTIPLICATION UNIT Usha S. 1 and Vijaya Kumar V. 2 1 VLSI Design, Sathyabama University, Chennai, India 2 Department of Electronics and Communication Engineering,

More information

An FPGA based Implementation of Floating-point Multiplier

An FPGA based Implementation of Floating-point Multiplier An FPGA based Implementation of Floating-point Multiplier L. Rajesh, Prashant.V. Joshi and Dr.S.S. Manvi Abstract In this paper we describe the parameterization, implementation and evaluation of floating-point

More information

An Efficient Implementation of Floating Point Multiplier

An Efficient Implementation of Floating Point Multiplier An Efficient Implementation of Floating Point Multiplier Mohamed Al-Ashrafy Mentor Graphics Mohamed_Samy@Mentor.com Ashraf Salem Mentor Graphics Ashraf_Salem@Mentor.com Wagdy Anis Communications and Electronics

More information

arxiv: v1 [cs.ne] 25 Sep 2016

arxiv: v1 [cs.ne] 25 Sep 2016 Accurate and Efficient Hyperbolic Tangent Activation Function on FPGA using the DCT Interpolation Filter Ahmed M. Abdelsalam, J.M. Pierre Langlois and F. Cheriet Computer and Software Engineering Department

More information

WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS

WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS 1 B.SARGUNAM, 2 Dr.R.DHANASEKARAN 1 Assistant Professor, Department of ECE, Avinashilingam University, Coimbatore 2 Professor & Director-Research,

More information

Implementation of Floating Point Multiplier Using Dadda Algorithm

Implementation of Floating Point Multiplier Using Dadda Algorithm Implementation of Floating Point Multiplier Using Dadda Algorithm Abstract: Floating point multiplication is the most usefull in all the computation application like in Arithematic operation, DSP application.

More information

Design and Implementation of Floating Point Multiplier for Better Timing Performance

Design and Implementation of Floating Point Multiplier for Better Timing Performance Design and Implementation of Floating Point Multiplier for Better Timing Performance B.Sreenivasa Ganesh 1,J.E.N.Abhilash 2, G. Rajesh Kumar 3 SwarnandhraCollege of Engineering& Technology 1,2, Vishnu

More information

A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA

A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA RESEARCH ARTICLE OPEN ACCESS A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Nishi Pandey, Virendra Singh Sagar Institute of Research & Technology Bhopal Abstract Due to

More information

REALIZATION OF MULTIPLE- OPERAND ADDER-SUBTRACTOR BASED ON VEDIC MATHEMATICS

REALIZATION OF MULTIPLE- OPERAND ADDER-SUBTRACTOR BASED ON VEDIC MATHEMATICS REALIZATION OF MULTIPLE- OPERAND ADDER-SUBTRACTOR BASED ON VEDIC MATHEMATICS NEETA PANDEY 1, RAJESHWARI PANDEY 2, SAMIKSHA AGARWAL 3, PRINCE KUMAR 4 Department of Electronics and Communication Engineering

More information

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010 Digital Logic & Computer Design CS 434 Professor Dan Moldovan Spring 2 Copyright 27 Elsevier 5- Chapter 5 :: Digital Building Blocks Digital Design and Computer Architecture David Money Harris and Sarah

More information

An Efficient Hardware Architecture for a Neural Network Activation Function Generator

An Efficient Hardware Architecture for a Neural Network Activation Function Generator An Efficient Hardware Architecture for a Neural Network Activation Function Generator Daniel Larkin, Andrew Kinane, Valentin Muresan, and Noel O Connor Centre for Digital Video Processing, Dublin City

More information

NEURON IMPLEMENTATION USING SYSTEM GENERATOR

NEURON IMPLEMENTATION USING SYSTEM GENERATOR NEURON IMPLEMENTATION USING SYSTEM GENERATOR Luis Aguiar,2, Leonardo Reis,2, Fernando Morgado Dias,2 Centro de Competências de Ciências Exactas e da Engenharia, Universidade da Madeira Campus da Penteada,

More information

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier U.V.N.S.Suhitha Student Department of ECE, BVC College of Engineering, AP, India. Abstract: The ever growing need for improved

More information

High Speed Pipelined Architecture for Adaptive Median Filter

High Speed Pipelined Architecture for Adaptive Median Filter Abstract High Speed Pipelined Architecture for Adaptive Median Filter D.Dhanasekaran, and **Dr.K.Boopathy Bagan *Assistant Professor, SVCE, Pennalur,Sriperumbudur-602105. **Professor, Madras Institute

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

Numeric Encodings Prof. James L. Frankel Harvard University

Numeric Encodings Prof. James L. Frankel Harvard University Numeric Encodings Prof. James L. Frankel Harvard University Version of 10:19 PM 12-Sep-2017 Copyright 2017, 2016 James L. Frankel. All rights reserved. Representation of Positive & Negative Integral and

More information