Mapping Algorithms to Hardware By Prawat Nagvajara

Electrical and Computer Engineering

Synopsis
This note covers the theory, design and implementation of the bit-vector multiplication algorithm. It also presents a general method for mapping iterative-loop algorithms to hardware.

Introduction
Numerical calculations involving nested-loop algorithms are ubiquitous in signal processing and telecommunication applications. Examples of two-nested-loop algorithms are the convolution filters (Finite Impulse Response, FIR, filters) and the Infinite Impulse Response (IIR) filters. Arithmetic calculations such as bit-vector addition (positional numbers), multiplication, division and sorting are loop algorithms; multiplication, division and sorting are nested loops. In fact, signal processing algorithms have their foundations in arithmetic algorithms. For instance, the convolution and the bit-vector multiplication are the same algorithm.

Algorithms
The arithmetic algorithms and hardware considered in the earlier notes were one-dimensional loops, for instance, the bit-vector addition, the two's complement, the bit-vector compare and finding the maximum of a set of numbers. Their data dependency graphs are one-dimensional arrays whose node indices are the loop indices. For a doubly-nested loop, the data dependency graph consists of nodes arranged in a two-dimensional array whose indices are the loop indices. The edges, drawn as arrows, represent the data dependencies.

The positional bit-vector multiplication is

$x \cdot y = x \cdot (y_{n-1} 2^{n-1} + y_{n-2} 2^{n-2} + \cdots + y_0 2^0)$     (1)

where $x$ is an $m$-bit number and $y$ is an $n$-bit number with bits $y_i \in \{0, 1\}$, $0 \le i \le n-1$. A recursive description of (1) is

    partial_sum(0) := 0;
    for i in 0 to n-1 loop
      if y(i) = '1' then
        partial_sum(i+1) := partial_sum(i) + x*2**i;
      else
        partial_sum(i+1) := partial_sum(i);
      end if;
    end loop;

With the initial partial sum $ps_0 = 0$ (ps abbreviates partial_sum), the ith iteration accumulates $x \cdot 2^i$ into $ps_{i+1}$ whenever $y_i = 1$. The final answer, the product $x \cdot y$, is $ps_n$.
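As a quick cross-check of the recursion, the following minimal Python sketch accumulates $x \cdot 2^i$ into the partial sum whenever bit $y_i$ is 1. It is a software reference model, not part of the original VHDL; the function name `shift_add_multiply` is hypothetical, and Python integers stand in for unsigned bit vectors.

```python
def shift_add_multiply(x, y, n):
    """Word-level model of (1): partial_sum(i+1) = partial_sum(i) + y_i * x * 2**i."""
    partial_sum = 0                       # partial_sum(0) := 0
    for i in range(n):                    # for i in 0 to n-1
        if (y >> i) & 1:                  # y_i = 1: accumulate x shifted left by i
            partial_sum += x << i
        # y_i = 0: partial_sum(i+1) = partial_sum(i), nothing is added
    return partial_sum                    # partial_sum(n) = x * y

print(shift_add_multiply(13, 11, 4))      # 143
```

Shifting left by i is the software counterpart of multiplying by $2^i$, which is why no multiplier is needed inside the loop.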
Consider the calculation at the bit level. The addition $ps_{i+1} := ps_i + x \cdot 2^i$ is a bit-vector addition that iterates over the bit positions, passing carry bits to the next positions. $x$ is an $m$-bit vector, whereas $x \cdot 2^i$ and $ps_i$ are $(m+n-1)$-bit vectors (unsigned type). The vector $x$ multiplied by $2^i$ is the aggregate (i+m-1 downto i => x, others => '0'); in other words, $x \cdot 2^i$ is an $(m+n-1)$-bit vector with $x$ at positions $i+m-1$ down to $i$ and zeros elsewhere. When $i = n-1$ (the most significant position of $y$), $x \cdot 2^i$ holds $x$ at positions $n+m-2$ down to $n-1$ and zeros elsewhere.
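The placement of $x$ inside the $(m+n-1)$-bit vector can be made concrete with a small Python sketch. The function `shifted_vector` is hypothetical; it returns the bits least significant first, so list index k is bit position k, mirroring the VHDL aggregate above.

```python
def shifted_vector(x, i, m, n):
    """x * 2**i as an (m+n-1)-bit vector, returned LSB first (index = bit position).

    Mirrors the aggregate (i+m-1 downto i => x, others => '0').
    """
    bits = [0] * (m + n - 1)              # others => '0'
    for k in range(m):                    # x occupies positions i .. i+m-1
        bits[i + k] = (x >> k) & 1
    return bits

print(shifted_vector(0b1011, 3, 4, 4))    # [0, 0, 0, 1, 1, 0, 1]  (x at positions 6 .. 3)
```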

The addition $ps_{i+1} := ps_i + x \cdot 2^i$ involves only the $m$ consecutive positions $i+m-1$ down to $i$, plus the carry out into position $i+m$. A description in a hardware description language (VHDL) begins with the following declarations:

    function "*" (x, y : std_logic_vector) return std_logic_vector is
      -- assumes x is (m-1 downto 0), y is (n-1 downto 0), m >= 2
      constant m : natural := x'length;
      constant n : natural := y'length;
      type two_d_array is array (natural range <>, natural range <>) of std_logic;
      variable ps, c : two_d_array(n-1 downto 0, m+n-2 downto 0);
      variable temp  : std_logic;
      variable z     : std_logic_vector(n+m-1 downto 0);

In the declaration part of the function, the partial sums ps and the carries c are declared as two-dimensional arrays of bits, where row i is the ith iteration, from 0 to n-1, and column j is the bit position, from 0 to m+n-2. The elements of the two-dimensional arrays are of std_logic type; e.g., ps(i, j) is bit j of the vector $ps_i$ in the ith iteration. Since $x \cdot 2^i$ is added to the partial sum only when y(i) = '1', a temporary variable temp is assigned x(j-i) if y(i) = '1' and '0' if y(i) = '0'; in other words, temp := y(i) and x(j-i). The variable temp is added to bit j of the ith partial sum in calculating the (i+1)th partial sum.
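One pass of the i loop can be sketched in Python as a ripple-carry addition restricted to positions i to i+m-1. This is an illustrative model only: `add_row` is a hypothetical name, partial sums are lists of 0/1 bits with the least significant bit at index 0, and the row must have room for position i+m.

```python
def add_row(ps_row, x, y_i, i, m):
    """One iteration i of the bit-level loop: ps(i+1) = ps(i) + y(i) * x * 2**i.

    ps_row: bits of ps(i), LSB at index 0, with room for position i+m.
    x: the m-bit multiplicand as an int; y_i: multiplier bit y(i) (0 or 1).
    """
    new_row = list(ps_row)                 # positions below i are unchanged
    c = 0                                  # c(i, i) := '0'
    for j in range(i, i + m):              # only positions i .. i+m-1 are added
        temp = y_i & ((x >> (j - i)) & 1)  # temp := y(i) and x(j-i)
        p = ps_row[j]
        new_row[j] = temp ^ p ^ c          # sum bit
        c = (temp & p) | (p & c) | (c & temp)  # carry into position j+1
    # ps(i) < 2**(i+m), so bit i+m of ps(i) is 0 and the carry can simply overwrite it
    new_row[i + m] = c
    return new_row

# Usage: 13 * 11 with m = n = 4; the row needs m+n = 8 bit positions.
x, y, m, n = 13, 11, 4, 4
row = [0] * (m + n)                        # ps(0) = 0
for i in range(n):
    row = add_row(row, x, (y >> i) & 1, i, m)
print(sum(bit << j for j, bit in enumerate(row)))  # 143
```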
The algorithm body is as follows:

    begin
      -- partial sum initial values
      for j in 0 to m-1 loop
        ps(0, j) := '0';
      end loop;
      for i in 0 to n-1 loop
        for j in i to m-1+i loop
          temp := y(i) and x(j-i);
          if i < n-1 then                     -- not the last iteration
            if j = i then                     -- rightmost bit position
              c(i, j) := '0';
              z(j) := temp xor ps(i, j) xor c(i, j);
              c(i, j+1) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
            elsif j < i+m-1 then              -- neither rightmost nor leftmost
              ps(i+1, j) := temp xor ps(i, j) xor c(i, j);
              c(i, j+1) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
            else                              -- leftmost position, j = i+m-1
              ps(i+1, j) := temp xor ps(i, j) xor c(i, j);
              ps(i+1, j+1) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
            end if;
          else                                -- last iteration, i = n-1
            if j = i then                     -- carry initial value
              c(i, j) := '0';
            end if;
            if j < i+m-1 then                 -- not the leftmost position
              z(j) := temp xor ps(i, j) xor c(i, j);
              c(i, j+1) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
            else                              -- leftmost position
              z(j) := temp xor ps(i, j) xor c(i, j);
              z(j+1) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
            end if;
          end if;
        end loop;
      end loop;
      return z;

    end "*";
    end arith_pack;

The inner loop with index j is the bitwise addition, where the sum and carry assignments are logic expressions. The carry at the beginning of the inner loop, c(i, i), is initialized to '0'. During the ith iteration, the inner loop performs an m-bit addition over positions j = i to i+m-1, where the least significant position is j = i and the most significant position is j = i+m-1. The carry out of position j = i+m-1 goes to position i+m, which is the most significant position of the addition in the (i+1)th iteration; hence c(i, i+m) becomes the most significant bit of the next partial sum, ps(i+1, i+m) := c(i, i+m).

The outputs (returned values) are as follows. If i < n-1, the sum bit at position j = i takes no part in the addition of the (i+1)th iteration, so it is the answer bit z(i). When i = n-1, the sum bits at positions j = n-1, ..., m+n-2 are z(j), and the carry out of position m+n-2 is z(m+n-1), the most significant bit of the answer. The data dependency graph of the algorithm for n = 4 (and m = 4) is shown in Fig. 1.

[Fig. 1 Data Dependency Graph (outputs z(7) ... z(0))]

The graph consists of nodes and edges on a two-dimensional grid, where the index i = 0, ..., n-1 enumerates the iterations as rows and j = 0, ..., m+n-2 enumerates the bit positions as columns. The top right corner is the coordinate (i, j) = (0, 0); i increases downward and j increases toward the left. The column vector $[1\ 0]^T$ denotes the i-direction and $[0\ 1]^T$ the j-direction, where T denotes transposition. The nodes form a parallelogram in which the vector x traverses the $[1\ 1]^T$ direction. At i = 0, the vector x enters the calculation at positions j = 0 to j = m-1.
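The output assignments of the parallelogram-indexed function can be checked against ordinary integer multiplication with a bit-level Python model that keeps the same ps(i, j) array and running carry. This is a sketch, not the author's code: `to_bits`, `from_bits` and `multiply_parallelogram` are hypothetical helpers, and bit lists are least significant bit first.

```python
def to_bits(value, width):
    """Integer -> list of bits, index k = bit k (LSB first)."""
    return [(value >> k) & 1 for k in range(width)]

def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(bit << k for k, bit in enumerate(bits))

def multiply_parallelogram(x_bits, y_bits):
    """Bit-level model of the first "*" function (parallelogram indexing, Fig. 1).

    x_bits[k] plays x(k) and y_bits[i] plays y(i); returns z(0) .. z(m+n-1).
    ps[i][j] mirrors ps(i, j); the running carry c mirrors c(i, j).
    """
    m, n = len(x_bits), len(y_bits)
    ps = [[0] * (m + n - 1) for _ in range(n)]   # ps(i, j), j = 0 .. m+n-2; row 0 stays '0'
    z = [0] * (m + n)
    for i in range(n):
        c = 0                                    # c(i, i) := '0'
        for j in range(i, i + m):
            temp = y_bits[i] & x_bits[j - i]     # temp := y(i) and x(j-i)
            s = temp ^ ps[i][j] ^ c              # sum bit
            carry = (temp & ps[i][j]) | (ps[i][j] & c) | (c & temp)
            if i < n - 1:
                if j == i:
                    z[j] = s                     # rightmost position: final answer bit z(i)
                else:
                    ps[i + 1][j] = s             # passes down to iteration i+1
                if j == i + m - 1:
                    ps[i + 1][j + 1] = carry     # carry out becomes the MSB of ps(i+1)
            else:                                # last iteration: all sum bits are outputs
                z[j] = s
                if j == i + m - 1:
                    z[j + 1] = carry             # z(m+n-1)
            c = carry                            # c(i, j+1)
    return z

print(from_bits(multiply_parallelogram(to_bits(13, 4), to_bits(11, 4))))  # 143
```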
In general, at row i the vector x enters the nodes at positions j = i to j = i+m-1. This follows from the fact that in the ith iteration x is multiplied by $2^i$, which is equivalent to shifting x left by i positions. The bits of y travel along edges in the j-direction, $[0\ 1]^T$; bit y(i) enters the calculation at column j = i, 0 ≤ i ≤ n-1. The partial-sum edges ps(i, j) traverse the i-direction $[1\ 0]^T$, and the carry edges c(i, j) traverse the $[0\ 1]^T$ direction. The initial values ps(0, j) = 0 and c(i, i) = 0 are the edges on the top and right borders of the parallelogram. The carry c(i, i+m) is ps(i+1, i+m). The results z(j), j = 0, ..., m+n-1 (0 to 2n-1 when m = n), appear at the right and bottom borders.

The data dependency graph can be re-indexed more efficiently by transforming the parallelogram into a rectangle. In general, if x is an m-bit vector and y is an n-bit vector, the product is an (m+n)-bit vector and the parallelogram has width m and height n. The transformation matrix is $T = [1\ 0; -1\ 1]$, where the semicolon separates the rows of the matrix. The direction vectors map as follows: $[1\ 0]^T \to [1\ {-1}]^T$, $[0\ 1]^T \to [0\ 1]^T$ and $[1\ 1]^T \to [1\ 0]^T$. The coordinate (i, j) maps to the new coordinate (i, j') = (i, j-i).

[Fig. 2 Transformed Data Dependency Graph (outputs z(0) ... z(7))]

Figure 2 shows the transformed data dependency graph for the multiplication of 4-bit vectors. Node (i, j) in Fig. 1 is mapped to node (i, j-i) in Fig. 2. The vector x now traverses downward in the $[1\ 0]^T$ direction, the y and c vectors traverse leftward in the $[0\ 1]^T$ direction, and the partial-sum vector ps traverses the $[1\ {-1}]^T$ direction. The vectors x and y are simply broadcasts of their values, whereas the partial sum ps and the carry c are functions of x, y and ps. The carry signals out of the leftmost column nodes (j = m-1) are the partial sums ps(i+1, m-1).

A description of a bit-vector multiplication algorithm based on the transformed data dependency graph (Fig. 2) is as follows.

    library ieee;
    use ieee.std_logic_1164.all;

    package arith_pack is
      function "*" (x, y : std_logic_vector) return std_logic_vector;
    end arith_pack;

    package body arith_pack is
      function "*" (x, y : std_logic_vector) return std_logic_vector is
        -- assumes x is (m-1 downto 0), y is (n-1 downto 0), m >= 2
        constant m : natural := x'length;
        constant n : natural := y'length;
        type two_d_array is array (natural range <>, natural range <>) of std_logic;
        variable ps, c : two_d_array(n-1 downto 0, m-1 downto 0);
        variable temp  : std_logic;
        variable z     : std_logic_vector(n+m-1 downto 0);
      begin
        for j in 0 to m-1 loop
          ps(0, j) := '0';
        end loop;
        for i in 0 to n-1 loop
          c(i, 0) := '0';
        end loop;
        for i in 0 to n-1 loop
          for j in 0 to m-1 loop
            temp := y(i) and x(j);
            if i < n-1 then                     -- not the last row
              if j > 0 and j < m-1 then         -- neither first nor last column
                ps(i+1, j-1) := temp xor ps(i, j) xor c(i, j);
                c(i, j+1) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
              elsif j = 0 then                  -- first column
                z(i) := temp xor ps(i, j) xor c(i, j);
                c(i, j+1) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
              else                              -- last column, j = m-1
                ps(i+1, j-1) := temp xor ps(i, j) xor c(i, j);
                ps(i+1, j) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
              end if;
            else                                -- last row, i = n-1
              if j < m-1 then                   -- not the last column
                z(i+j) := temp xor ps(i, j) xor c(i, j);
                c(i, j+1) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
              else                              -- last column
                z(i+j) := temp xor ps(i, j) xor c(i, j);
                z(i+j+1) := (temp and ps(i, j)) or (ps(i, j) and c(i, j)) or (c(i, j) and temp);
              end if;
            end if;
          end loop;
        end loop;
        return z;
      end "*";
    end arith_pack;

    library ieee;
    use ieee.std_logic_1164.all, work.arith_pack.all;

    entity test_mult_arith_pack is
      port (x, y : in  std_logic_vector(3 downto 0);
            z    : out std_logic_vector(7 downto 0));
    end test_mult_arith_pack;

    architecture beh of test_mult_arith_pack is
    begin
      z <= x * y;
    end beh;

Figure 3 shows a simulation waveform of test_mult_arith_pack.

[Fig. 3 Verification]

Mapping Data Dependency Graph to Combinational Circuit
Mapping the transformed dependency graph to an array of processing elements as a combinational circuit is straightforward. A hardware description can use for generate statements to construct the interconnection of the processing elements. Figure 4 shows the processing element block diagram, where xi, yi, psi and ci denote the x, y, partial-sum and carry inputs. The outputs xo and yo are copies of the inputs xi and yi. The signal w corresponds to the temp variable in the "*" function. A hardware description of the array comprises the processing elements, sets of interconnecting wires, and the input values, initial values and output assignments.

[Fig. 4 Processing Element Block Diagram (inputs xi, yi, psi, ci; outputs xo, yo, pso, co; internal signal w)]

A description of the processing element is as follows:

    library ieee;
    use ieee.std_logic_1164.all;

    entity pe is
      port (xi, yi, psi, ci : in  std_logic;
            xo, yo, pso, co : out std_logic);
    end pe;

    architecture dataflow of pe is
      signal w : std_logic;
    begin
      xo  <= xi;
      yo  <= yi;
      w   <= xi and yi;                                  -- partial-product bit (temp)
      pso <= w xor ci xor psi;                           -- full-adder sum
      co  <= (psi and w) or (w and ci) or (ci and psi);  -- full-adder carry
    end dataflow;
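The wiring of the processing-element array can be checked without a simulator using a structural Python model that instantiates the same cell function at every grid point. The names `pe` and `array_multiply` are hypothetical Python stand-ins, the Python loop merely evaluates the combinational array in dependency order, and bit lists are least significant bit first.

```python
def to_bits(value, width):
    return [(value >> k) & 1 for k in range(width)]

def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))

def pe(xi, yi, psi, ci):
    """Processing element of Fig. 4; returns (xo, yo, pso, co)."""
    w = xi & yi                                  # w <= xi and yi
    pso = w ^ ci ^ psi                           # full-adder sum
    co = (psi & w) | (w & ci) | (ci & psi)       # full-adder carry
    return xi, yi, pso, co

def array_multiply(x_bits, y_bits):
    """n-by-m array of pe cells wired as the transformed graph (Fig. 2).

    Cell (i, j): x(j) comes down column j, y(i) and the carry pass leftward
    (increasing j), and pso of cell (i, j) drives psi of cell (i+1, j-1).
    """
    m, n = len(x_bits), len(y_bits)
    ps = [0] * m                                 # ps(0, j) = '0'
    z = [0] * (m + n)
    for i in range(n):
        yi, carry = y_bits[i], 0                 # c(i, 0) = '0'
        sums = [0] * m
        for j in range(m):
            _, yi, sums[j], carry = pe(x_bits[j], yi, ps[j], carry)  # yo feeds the next cell
        if i < n - 1:
            z[i] = sums[0]                       # first column exits as z(i)
            ps = sums[1:] + [carry]              # ps(i+1, j-1) = pso(i, j); ps(i+1, m-1) = carry
        else:
            z[i:i + m] = sums                    # last row: z(i+j)
            z[i + m] = carry                     # z(n+m-1)
    return z

print(from_bits(array_multiply(to_bits(9, 4), to_bits(14, 4))))  # 126
```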

Mapping Data Dependency Graph to Synchronous Circuit
This section covers a method for mapping a data dependency graph to a synchronous circuit comprising combinational logic and storage elements synchronized to the rising clock edges. A serial (single-thread) code is essentially a mapping of the data flow (edges) and processing (nodes) of the data dependency graph onto a single processor, where the loops in the code describe a schedule of when the nodes are calculated. For example, the "*" function above describes a row-wise schedule of the data dependency graph in which the nodes of each row are calculated column by column, in order of increasing j (right to left in the figures).

The data dependency graph is useful for designing parallel computation on multiple processors, and linear algebra provides a design tool for mapping the computations (nodes) and data flows (arcs). Consider a linear projection P that maps a k-dimensional data dependency graph G (a k-nested-loop algorithm) to a (k-1)-dimensional graph G'. The nodes of G are vectors in the k-dimensional space over the integers, $V^k$, and $P: G \to G'$, where G' lies in a (k-1)-dimensional subspace. The projected graph has vertices $\{v' : v' \in G'\}$ that represent the processors calculating the vertices $\{v : v \in G\}$ of the data dependency graph. A processor v' calculates the nodes $\{v \in G : Pv = v'\}$ during different clock cycles. The projection P maps the arcs $a \in G$, which are vectors describing the data flow directions, to the data flow directions between the processors.

Let e be the vector orthogonal to a subspace V: $u \in V$ if and only if $u^T e = 0$, where $u, e \in V^k$. The affine subspaces $V_t$, t = 0, 1, ..., where t represents the discrete clock cycles, are defined by $u \in V_t$ if and only if $u^T e = t$. The calculations at time t are the nodes in the affine subspace $V_t$, which are distributed among the processors by the projection $P: G \to G'$. The affine subspaces $V_t$, t = 0, 1, ..., are thus the schedule for the processors. Registers and memory provide temporary storage for the processors; they are placed on the arcs of G' as delay buffers that synchronize the calculations. Define a directed arc as an ordered pair of nodes, a = (q, r). The number of time steps d between $q \in V_t$ and $r \in V_{t+d}$, where q and r are projected to adjacent processors, is the number of delay buffers required on the projected arc in G'; equivalently, $d = e^T (r - q)$.

Serial Multiplier with Combinational Adder

[Fig. 5 Mapping Data Dependency Graph to Multiple Processing Units]

Figure 5 shows an example of mapping the multiplication data dependency graph to an array of processing units (processors). The vector $[i\ j]^T$ denotes row i and column j, where i increases downward and j increases leftward. The projection matrix P = [0 0; 0 1] maps the nodes $[i\ j]^T \to P[i\ j]^T = [0\ j]^T$; that is, the nodes in column j are calculated by processor j. The nodes (processors) of the projected graph G' are at the bottom of Fig. 5. The schedule is the set of affine subspaces orthogonal to $e = [1\ 0]^T$, shown as red lines. A node u in the affine subspace $V_t$ is calculated at time $t = u^T e = [i\ j][1\ 0]^T = i$. The data flow arcs of the multiplicand bits a(j), j = 0, ..., 4 (downward arcs), map to $P[1\ 0]^T = [0\ 0]^T$, i.e., self-loop arcs with a delay buffer on the nodes $[0\ j]^T$, j = 0, ..., 4, of G'. These arcs are not shown in Fig. 5; they are delays that store a(j) in node j of G'. The arcs in the $[0\ 1]^T$ direction, carrying the multiplier bits and the carries, map to $P[0\ 1]^T = [0\ 1]^T$, the arcs traversing leftward in G'.
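The arc-to-delay rule can be applied mechanically: the projected direction of an arc d is Pd and its delay count is $e^T d$. The Python sketch below tabulates both for the three arc families of the transformed graph and lists the nodes scheduled at a given time. The names `matvec`, `dot`, `ARCS`, `map_arcs` and `nodes_at_time` are hypothetical helpers, and vectors are written as tuples in $[i\ j]^T$ order.

```python
def matvec(A, v):
    """Integer matrix-vector product A v (A given as a tuple of rows)."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in A)

def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

# Arc directions of the transformed multiplication graph (Fig. 2), [i j]^T form.
ARCS = {
    "x": (1, 0),     # multiplicand bits, downward
    "y_c": (0, 1),   # multiplier bits and carries, leftward
    "ps": (1, -1),   # partial sums
}

def map_arcs(P, e):
    """For each arc direction d: (projected direction P d, number of delays e^T d)."""
    return {name: (matvec(P, d), dot(e, d)) for name, d in ARCS.items()}

def nodes_at_time(P, e, n, m, t):
    """Nodes u = (i, j) of an n-by-m graph with e^T u = t, paired with their processor P u."""
    return [((i, j), matvec(P, (i, j)))
            for i in range(n) for j in range(m) if dot(e, (i, j)) == t]

P_cols = ((0, 0), (0, 1))              # processor j computes column j
print(map_arcs(P_cols, (1, 0)))        # {'x': ((0, 0), 1), 'y_c': ((0, 1), 0), 'ps': ((0, -1), 1)}
print(nodes_at_time(P_cols, (2, 1), 5, 5, 4))  # [((0, 4), (0, 4)), ((1, 2), (0, 2)), ((2, 0), (0, 0))]
```

With $e = [1\ 0]^T$ the table reproduces one delay on the multiplicand self-loops, none on the leftward arcs and one on the rightward partial-sum arcs; changing e or P re-derives the delays for the other mappings in this note.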
The leftward arcs in G connect nodes belonging to the same affine subspace, which implies that the data are available to the nodes during clock cycle i; thus, no delays are required on the arcs traversing leftward in G'. The schedule implies a combinational bit-vector adder in which the carry signals propagate from the least significant position (j = 0) to the most significant position (j = m-1) within the clock cycle, and the partial sums are valid before the next clock cycle. The partial-sum arcs, in the $[1\ {-1}]^T$ direction, map to $P[1\ {-1}]^T = [0\ {-1}]^T$, the arcs traversing rightward in G'. The number of time steps (distance) between q and r of an arc (q, r) in the direction $e_a$ is $e^T e_a = [1\ 0][1\ {-1}]^T = 1$ delay, so the projected graph G' has one delay D on each arc traversing rightward. In G', the nodes store the multiplicand bits, which are multiplied by multiplier bit i at time t = i. These products are added to the partial-sum bits, which are updated into the storage on the rightward arcs (the projected partial-sum arcs). The carry out of the most significant position loops back as the most significant bit of the partial sum. The product bits p(t), t = 0, 1, ..., 9, are the output of G' at time t.
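The serial multiplier just described can be sketched as a cycle-level Python model: one loop iteration per clock cycle, a combinational ripple inside the cycle, and a register update at the clock edge. This is an illustrative model under the stated schedule, not a transcription of Fig. 6; `serial_multiplier`, `to_bits` and `from_bits` are hypothetical helpers, and bit lists are least significant bit first.

```python
def to_bits(value, width):
    return [(value >> k) & 1 for k in range(width)]

def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))

def serial_multiplier(a_bits, b_bits, cycles):
    """Cycle-level model of the serial multiplier (schedule e = [1 0]^T).

    Processor j keeps multiplicand bit a(j) in its self-loop register. In cycle t
    the multiplier bit b(t) (0 after the n bits) is broadcast, the carry ripples
    combinationally from j = 0 to j = m-1, and at the clock edge the partial-sum
    registers on the rightward arcs load the new sums. p(t) leaves processor 0.
    """
    m = len(a_bits)
    ps_reg = [0] * m                              # one delay D per rightward arc
    p = []
    for t in range(cycles):
        b = b_bits[t] if t < len(b_bits) else 0  # b0 .. b(n-1), then zeros
        carry, sums = 0, [0] * m
        for j in range(m):                        # combinational adder within the cycle
            w = a_bits[j] & b
            sums[j] = w ^ ps_reg[j] ^ carry
            carry = (w & ps_reg[j]) | (ps_reg[j] & carry) | (carry & w)
        p.append(sums[0])                         # product bit p(t)
        ps_reg = sums[1:] + [carry]               # clock edge: shift right, carry loops back as MSB
    return p

print(from_bits(serial_multiplier(to_bits(19, 5), to_bits(27, 5), 10)))  # 513
```

With m = 5 and n = 5, ten cycles are enough because the 10-bit product leaves one bit per cycle.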

Figure 6 shows the projected graph G' as serial multiplier hardware with m = 5 multiplicand bits x4, ..., x0. The multiplier bits b0, ..., b4, 0, 0, ... are applied at times t = 0, 1, ..., and the product bits p(t) appear at the output at t = 0, 1, ..., 9.

[Fig. 6 Serial Multiplier Based on $[1\ 0]^T$ Schedule]

Pipeline Multiplier
Consider the same projected graph G' with the projection matrix P = [0 0; 0 1]. A different schedule, $e = [2\ 1]^T$, eliminates the carry propagation latency, which grows linearly with the number of multiplicand bits. The affine subspaces are the sets of nodes such that $e^T u = t$; for instance, the nodes $[0\ 4]^T$, $[1\ 2]^T$ and $[2\ 0]^T$ lie on the affine subspace $e^T u = [2\ 1]u = t = 4$ (see Fig. 7). The partial-sum arcs in G span one time step, $e^T[1\ {-1}]^T = [2\ 1][1\ {-1}]^T = 1$, which places a delay on the projected partial-sum arcs in G'. The carry and multiplier arcs in the $[0\ 1]^T$ direction in G also span one time step, $e^T[0\ 1]^T = [2\ 1][0\ 1]^T = 1$, which places a delay on the projected carry and multiplier arcs in G' (see Fig. 7). The arcs in the $[1\ 0]^T$ direction span 2 time steps, $[2\ 1][1\ 0]^T = 2$. The pipeline multiplier G' has a data rate of 1/2; that is, input data are applied on every second rising clock edge (cycle). The optimum pipelining rate is one, with input data applied on every cycle.

In Fig. 7, the multiplier bits are applied at t = 0, 2, 4, ..., 8, and zeros are applied afterward. The output product bits also appear at the rate 1/2. The processor utilization is 1/2 because at any instant t only half of the processors are computing; for example, at t = 4 the processors $[0\ j]^T$, j = 0, 2, 4, are calculating the nodes $[2\ 0]^T$, $[1\ 2]^T$ and $[0\ 4]^T$.

[Fig. 7 Pipeline Multiplier Based on $[2\ 1]^T$ Schedule]

[Fig. 8 Pipeline Multiplier Processing Unit]

Figure 8 shows the processing unit of the projected graph G' (see Fig. 7). The delay buffers (Delay Flip-Flops, DFF) are placed on the arcs as output buffers, which gives stable drive of the signals leaving the unit. An array of the processing units and an additional delay flip-flop at the most significant position, at the PS_in port, form the pipeline multiplier G'. A series of snapshots of the pipeline multiplier with the 3-bit multiplicand a = 111 and the 3-bit multiplier b = 111 calculating 7 × 7 = 49 = 110001 (binary) is shown in Fig. 9. The multiplier bits are applied at t = 0, 2, 4, followed by zeros until t = 11; the inputs are also zero at t = 1, 3. The product bits appear at the output at t = 1, 3, 5, 7, 9 and 11, starting from the least significant bit. The snapshots begin at t = 0 and continue to t = 11, showing the data flow in the pipeline. The inputs and outputs are highlighted (blue and red), and the calculating units are highlighted in red.

[Fig. 9 Pipeline Multiplier Snapshots]

Optimum Rate Pipeline Multiplier
Consider a different projection, P = [1 0; 0 0], of the data dependency graph G in the $[0\ 1]^T$ direction. Figure 10 shows an example of G and the projected graph G'.

[Fig. 10 Projected Graph in $[0\ 1]^T$ Direction]

The projected graph G' has 2 delays on the multiplicand arcs in the $[1\ 0]^T$ direction, since $e^T[1\ 0]^T = [2\ 1][1\ 0]^T = 2$. The projected partial-sum arcs have one delay, $[2\ 1][1\ {-1}]^T = 1$. The pipeline rate is 1, as the multiplicand bits a(t), t = 0, 1, ..., are applied to the pipeline every clock cycle. The latency is 2n + m, where m is the number of multiplicand bits and n is the number of multiplier bits. The least significant bit of the product appears at t = n, and the product consists of n+m bits. The processing unit consists of a serial adder with internal storage for the carry signal; the carry signal is assigned to the partial-sum output every m clock cycles. In a sense, the unit is a processor (computer) consisting of a state machine and memory storage.

Convolution Filter
When the data in graph G are integers (or real numbers in floating point), the multiplication of the multiplicand and multiplier bits (implemented as AND logic) becomes integer multiplication, and the bit-vector addition becomes integer addition; there are no carry bits. The graph is extended to the left indefinitely, as the input signals are applied to the convolution algorithm indefinitely, and the addition of the partial sum becomes an accumulation (integration) of numbers. The projected graph G' (Fig. 10) is then a convolution filter, also called a Finite Impulse Response (FIR) filter or moving-average filter, which calculates a weighted sum (the filter coefficients are the weights) of the past n inputs, where n is the number of nodes in G'. As a version of the convolution filter, G' comprises processing units connected as shown in Fig. 10, where the delays are registers storing numbers.
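The weighted sum computed by the filter can be captured in a short Python reference model built around a delay line and a multiply-accumulate loop. This is a behavioral sketch of the filter output only, not a cycle-accurate model of the pipelined G' in Fig. 10 (whose latency differs); the function name `fir` is hypothetical, and samples before t = 0 are assumed to be zero.

```python
def fir(b, a):
    """Reference model of an n-tap FIR filter: p(t) = sum over k of b(k) * a(t-k).

    b: coefficients b(0) .. b(n-1); a: input samples a(0), a(1), ...
    Samples before t = 0 are taken as 0, so p(t) reaches steady state for t >= n-1.
    """
    taps = [0] * len(b)                  # delay line: a(t), a(t-1), ..., a(t-n+1)
    out = []
    for sample in a:
        taps = [sample] + taps[:-1]      # shift the newest sample in
        out.append(sum(bk * ak for bk, ak in zip(b, taps)))  # multiply-accumulate
    return out

print(fir([1, 2, 3], [1, 0, 0, 0]))              # impulse response: [1, 2, 3, 0]
print(fir([1, 1, 1, 1, 1], [1, 2, 3, 4, 5, 6]))  # 5-tap moving sum: [1, 3, 6, 10, 15, 20]
```

Feeding a unit impulse returns the coefficients themselves, which is a convenient first test for any hardware version of the filter.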
The processing unit consists of a multiply-accumulate (MAC) unit that multiplies the filter coefficients {b(i), i = 0, ..., n-1} (n is called the number of taps) with the past input signals a(t) traversing the filter. The products of the past inputs and the coefficients are added into the partial sum recursively, so the output of the filter is the weighted sum of the past n inputs. In Fig. 10, the dependency graph shows that the filter output p(t) reaches the steady state when t ≥ 4, and the weighted sum is

$p(t) = b(0)a(t) + b(1)a(t-1) + b(2)a(t-2) + b(3)a(t-3) + b(4)a(t-4), \quad t \ge 4.$

Based on the method for mapping algorithms to hardware, the projected graph G' in Fig. 10 is a 5-tap convolution filter with a pipeline rate of one and a latency of 5 cycles.

Conclusions
Methods for mapping the data dependency graphs of algorithms to array processing hardware are relevant to today's (2017) signal processing. Further studies and designs can include the infinite impulse response filter, matrix multiplication and matrix decompositions (lower-upper, orthogonal and singular value decompositions). The study of bit-vector multiplication hardware presented here provides the fundamentals for such work.


More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2018 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

1 ST SUMMER SCHOOL: VHDL BOOTCAMP PISA, JULY 2013

1 ST SUMMER SCHOOL: VHDL BOOTCAMP PISA, JULY 2013 MARIE CURIE IAPP: FAST TRACKER FOR HADRON COLLIDER EXPERIMENTS 1 ST SUMMER SCHOOL: VHDL BOOTCAMP PISA, JULY 2013 Introduction to VHDL Calliope-Louisa Sotiropoulou PhD Candidate/Researcher Aristotle University

More information

HIGH PERFORMANCE FUSED ADD MULTIPLY OPERATOR

HIGH PERFORMANCE FUSED ADD MULTIPLY OPERATOR HIGH PERFORMANCE FUSED ADD MULTIPLY OPERATOR R. Alwin [1] S. Anbu Vallal [2] I. Angel [3] B. Benhar Silvan [4] V. Jai Ganesh [5] 1 Assistant Professor, 2,3,4,5 Student Members Department of Electronics

More information

University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences. Spring 2010 May 10, 2010

University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences. Spring 2010 May 10, 2010 University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences EECS150 J. Wawrzynek Spring 2010 May 10, 2010 Final Exam Name: ID number: This is

More information

Inthis lecture we will cover the following material:

Inthis lecture we will cover the following material: Lecture #8 Inthis lecture we will cover the following material: The standard package, The std_logic_1164 Concordia Objects & data Types (Signals, Variables, Constants, Literals, Character) Types and Subtypes

More information

Digital Systems Design

Digital Systems Design Digital Systems Design Review of Combinatorial Circuit Building Blocks: VHDL for Combinational Circuits Dr. D. J. Jackson Lecture 2-1 Introduction to VHDL Designer writes a logic circuit description in

More information

Concurrent Signal Assignment Statements (CSAs)

Concurrent Signal Assignment Statements (CSAs) Concurrent Signal Assignment Statements (CSAs) Digital systems operate with concurrent signals Signals are assigned values at a specific point in time. VHDL uses signal assignment statements Specify value

More information

Multiplication Simple Gradeschool Algorithm for 16 Bits (32 Bit Result)

Multiplication Simple Gradeschool Algorithm for 16 Bits (32 Bit Result) Multiplication Simple Gradeschool Algorithm for 16 Bits (32 Bit Result) Input Input Multiplier Multiplicand AND gates 16 Bit Adder 32 Bit Product Register Multiplication Simple Gradeschool Algorithm for

More information

Addition and multiplication

Addition and multiplication Addition and multiplication Arithmetic is the most basic thing you can do with a computer, but it s not as easy as you might expect! These next few lectures focus on addition, subtraction, multiplication

More information

Parallel Computing: Parallel Algorithm Design Examples Jin, Hai

Parallel Computing: Parallel Algorithm Design Examples Jin, Hai Parallel Computing: Parallel Algorithm Design Examples Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology ! Given associative operator!! a 0! a 1! a 2!! a

More information

Lattice VHDL Training

Lattice VHDL Training Lattice Part I February 2000 1 VHDL Basic Modeling Structure February 2000 2 VHDL Design Description VHDL language describes a digital system as a set of modular blocks. Each modular block is described

More information

Hardware Modeling. VHDL Basics. ECS Group, TU Wien

Hardware Modeling. VHDL Basics. ECS Group, TU Wien Hardware Modeling VHDL Basics ECS Group, TU Wien VHDL Basics 2 Parts of a Design Unit Entity Architecture Configuration Package Package Package Body Library How to create a Design Unit? Interface to environment

More information

VHDL/Verilog Simulation. Testbench Design

VHDL/Verilog Simulation. Testbench Design VHDL/Verilog Simulation Testbench Design The Test Bench Concept Elements of a VHDL/Verilog testbench Unit Under Test (UUT) or Device Under Test (DUT) instantiate one or more UUT s Stimulus of UUT inputs

More information

Review of Digital Design with VHDL

Review of Digital Design with VHDL Review of Digital Design with VHDL Digital World Digital world is a world of 0 and 1 Each binary digit is called a bit Eight consecutive bits are called a byte Hexadecimal (base 16) representation for

More information

COVER SHEET: Total: Regrade Info: 5 (14 points) 7 (15 points) Midterm 1 Spring 2012 VERSION 1 UFID:

COVER SHEET: Total: Regrade Info: 5 (14 points) 7 (15 points) Midterm 1 Spring 2012 VERSION 1 UFID: EEL 4712 Midterm 1 Spring 2012 VERSION 1 Name: UFID: IMPORTANT: Please be neat and write (or draw) carefully. If we cannot read it with a reasonable effort, it is assumed wrong. As always, the best answer

More information

Binary Adders. Ripple-Carry Adder

Binary Adders. Ripple-Carry Adder Ripple-Carry Adder Binary Adders x n y n x y x y c n FA c n - c 2 FA c FA c s n MSB position Longest delay (Critical-path delay): d c(n) = n d carry = 2n gate delays d s(n-) = (n-) d carry +d sum = 2n

More information

Two-Level CLA for 4-bit Adder. Two-Level CLA for 4-bit Adder. Two-Level CLA for 16-bit Adder. A Closer Look at CLA Delay

Two-Level CLA for 4-bit Adder. Two-Level CLA for 4-bit Adder. Two-Level CLA for 16-bit Adder. A Closer Look at CLA Delay Two-Level CLA for 4-bit Adder Individual carry equations C 1 = g 0 +p 0, C 2 = g 1 +p 1 C 1,C 3 = g 2 +p 2 C 2, = g 3 +p 3 C 3 Fully expanded (infinite hardware) CLA equations C 1 = g 0 +p 0 C 2 = g 1

More information

Contents. Appendix D Verilog Summary Page 1 of 16

Contents. Appendix D Verilog Summary Page 1 of 16 Appix D Verilog Summary Page 1 of 16 Contents Appix D Verilog Summary... 2 D.1 Basic Language Elements... 2 D.1.1 Keywords... 2 D.1.2 Comments... 2 D.1.3 Identifiers... 2 D.1.4 Numbers and Strings... 3

More information

Review. LIBRARY list of library names; USE library.package.object; ENTITY entity_name IS generic declarations PORT ( signal_name(s): mode signal_type;

Review. LIBRARY list of library names; USE library.package.object; ENTITY entity_name IS generic declarations PORT ( signal_name(s): mode signal_type; LIBRARY list of library names; USE library.package.object; Review ENTITY entity_name IS generic declarations PORT ( signal_name(s): mode signal_type; signal_name(s) : mode signal_type); END ENTITY entity_name;

More information

MCM Based FIR Filter Architecture for High Performance

MCM Based FIR Filter Architecture for High Performance ISSN No: 2454-9614 MCM Based FIR Filter Architecture for High Performance R.Gopalana, A.Parameswari * Department Of Electronics and Communication Engineering, Velalar College of Engineering and Technology,

More information

DIGITAL LOGIC DESIGN VHDL Coding for FPGAs Unit 6

DIGITAL LOGIC DESIGN VHDL Coding for FPGAs Unit 6 DIGITAL LOGIC DESIGN VHDL Coding for FPGAs Unit 6 FINITE STATE MACHINES (FSMs) Moore Machines Mealy Machines Algorithmic State Machine (ASM) charts FINITE STATE MACHINES (FSMs) Classification: Moore Machine:

More information

Digital Signal Processing with Field Programmable Gate Arrays

Digital Signal Processing with Field Programmable Gate Arrays Uwe Meyer-Baese Digital Signal Processing with Field Programmable Gate Arrays Third Edition With 359 Figures and 98 Tables Book with CD-ROM ei Springer Contents Preface Preface to Second Edition Preface

More information

ELCT 501: Digital System Design

ELCT 501: Digital System Design ELCT 501: Digital System Lecture 4: CAD tools (Continued) Dr. Mohamed Abd El Ghany, Basic VHDL Concept Via an Example Problem: write VHDL code for 1-bit adder 4-bit adder 2 1-bit adder Inputs: A (1 bit)

More information

Control Unit: Binary Multiplier. Arturo Díaz-Pérez Departamento de Computación Laboratorio de Tecnologías de Información CINVESTAV-IPN

Control Unit: Binary Multiplier. Arturo Díaz-Pérez Departamento de Computación Laboratorio de Tecnologías de Información CINVESTAV-IPN Control Unit: Binary Multiplier Arturo Díaz-Pérez Departamento de Computación Laboratorio de Tecnologías de Información CINVESTAV-IPN Example: Binary Multiplier Two versions Hardwired control Microprogrammed

More information

EITF35: Introduction to Structured VLSI Design

EITF35: Introduction to Structured VLSI Design EITF35: Introduction to Structured VLSI Design Part 1.2.2: VHDL-1 Liang Liu liang.liu@eit.lth.se 1 Outline VHDL Background Basic VHDL Component An example FSM Design with VHDL Simulation & TestBench 2

More information

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 05, 2015 ISSN (online): 2321-0613 VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila

More information

Numerical Algorithms

Numerical Algorithms Chapter 10 Slide 464 Numerical Algorithms Slide 465 Numerical Algorithms In textbook do: Matrix multiplication Solving a system of linear equations Slide 466 Matrices A Review An n m matrix Column a 0,0

More information

VHDL Testbench Design. Textbook chapters 2.19, , 9.5

VHDL Testbench Design. Textbook chapters 2.19, , 9.5 VHDL Testbench Design Textbook chapters 2.19, 4.10-4.12, 9.5 The Test Bench Concept Elements of a VHDL/Verilog testbench Unit Under Test (UUT) or Device Under Test (DUT) instantiate one or more UUT s Stimulus

More information

Design Guidelines for Using DSP Blocks

Design Guidelines for Using DSP Blocks Design Guidelines for Using DSP Blocks in the LeonardoSpectrum Software April 2002, ver. 1.0 Application Note 194 Introduction Altera R Stratix TM devices have dedicated digital signal processing (DSP)

More information

Structure of Computer Systems

Structure of Computer Systems 288 between this new matrix and the initial collision matrix M A, because the original forbidden latencies for functional unit A still have to be considered in later initiations. Figure 5.37. State diagram

More information

Hardware Modeling. VHDL Syntax. Vienna University of Technology Department of Computer Engineering ECS Group

Hardware Modeling. VHDL Syntax. Vienna University of Technology Department of Computer Engineering ECS Group Hardware Modeling VHDL Syntax Vienna University of Technology Department of Computer Engineering ECS Group Contents Identifiers Types & Attributes Operators Sequential Statements Subroutines 2 Identifiers

More information

ECE 448 Lecture 3. Combinational-Circuit Building Blocks. Data Flow Modeling of Combinational Logic

ECE 448 Lecture 3. Combinational-Circuit Building Blocks. Data Flow Modeling of Combinational Logic ECE 448 Lecture 3 Combinational-Circuit Building Blocks Data Flow Modeling of Combinational Logic George Mason University Reading Required P. Chu, FPGA Prototyping by VHDL Examples Chapter 3, RT-level

More information

Modeling Complex Behavior

Modeling Complex Behavior Modeling Complex Behavior Sudhakar Yalamanchili, Georgia Institute of Technology, 2006 (1) Outline Abstraction and the Process Statement Concurrent processes and CSAs Process event behavior and signals

More information

ECE 448 Lecture 3. Combinational-Circuit Building Blocks. Data Flow Modeling of Combinational Logic

ECE 448 Lecture 3. Combinational-Circuit Building Blocks. Data Flow Modeling of Combinational Logic ECE 448 Lecture 3 Combinational-Circuit Building Blocks Data Flow Modeling of Combinational Logic George Mason University Reading Required P. Chu, FPGA Prototyping by VHDL Examples Chapter 3, RT-level

More information

EEL 4783: Hardware/Software Co-design with FPGAs

EEL 4783: Hardware/Software Co-design with FPGAs EEL 4783: Hardware/Software Co-design with FPGAs Lecture 9: Short Introduction to VHDL* Prof. Mingjie Lin * Beased on notes of Turfts lecture 1 What does HDL stand for? HDL is short for Hardware Description

More information

EE577A FINAL PROJECT REPORT Design of a General Purpose CPU

EE577A FINAL PROJECT REPORT Design of a General Purpose CPU EE577A FINAL PROJECT REPORT Design of a General Purpose CPU Submitted By Youngseok Lee - 4930239194 Narayana Reddy Lekkala - 9623274062 Chirag Ahuja - 5920609598 Phase 2 Part 1 A. Introduction The core

More information

COE 405, Term 062. Design & Modeling of Digital Systems. HW# 1 Solution. Due date: Wednesday, March. 14

COE 405, Term 062. Design & Modeling of Digital Systems. HW# 1 Solution. Due date: Wednesday, March. 14 COE 405, Term 062 Design & Modeling of Digital Systems HW# 1 Solution Due date: Wednesday, March. 14 Q.1. Consider the 4-bit carry-look-ahead adder (CLA) block shown below: A 3 -A 0 B 3 -B 0 C 3 4-bit

More information

VHDL. ELEC 418 Advanced Digital Systems Dr. Ron Hayne. Images Courtesy of Cengage Learning

VHDL. ELEC 418 Advanced Digital Systems Dr. Ron Hayne. Images Courtesy of Cengage Learning VHDL ELEC 418 Advanced Digital Systems Dr. Ron Hayne Images Courtesy of Cengage Learning Design Flow 418_02 2 VHDL Modules 418_02 3 VHDL Libraries library IEEE; use IEEE.std_logic_1164.all; std_logic Single-bit

More information

EEL 4712 Digital Design Test 1 Spring Semester 2007

EEL 4712 Digital Design Test 1 Spring Semester 2007 IMPORTANT: Please be neat and write (or draw) carefully. If we cannot read it with a reasonable effort, it is assumed wrong. COVER SHEET: Problem: Points: 1 (15 pts) 2 (20 pts) Total 3 (15 pts) 4 (18 pts)

More information

Part 4: VHDL for sequential circuits. Introduction to Modeling and Verification of Digital Systems. Memory elements. Sequential circuits

Part 4: VHDL for sequential circuits. Introduction to Modeling and Verification of Digital Systems. Memory elements. Sequential circuits M1 Informatique / MOSIG Introduction to Modeling and erification of Digital Systems Part 4: HDL for sequential circuits Laurence PIERRE http://users-tima.imag.fr/amfors/lpierre/m1arc 2017/2018 81 Sequential

More information

VHDL is a hardware description language. The code describes the behavior or structure of an electronic circuit.

VHDL is a hardware description language. The code describes the behavior or structure of an electronic circuit. VHDL is a hardware description language. The code describes the behavior or structure of an electronic circuit. Its main applications include synthesis of digital circuits onto CPLD/FPGA (Complex Programmable

More information

CprE 583 Reconfigurable Computing

CprE 583 Reconfigurable Computing Recap Moore FSM Example CprE / ComS 583 Reconfigurable Computing Moore FSM that recognizes sequence 10 0 1 0 1 S0 / 0 S1 / 0 1 S2 / 1 Prof. Joseph Zambreno Department of Electrical and Computer Engineering

More information

VHDL: A Crash Course

VHDL: A Crash Course VHDL: A Crash Course Dr. Manuel Jiménez With contributions by: Irvin Ortiz Flores Electrical and Computer Engineering Department University of Puerto Rico - Mayaguez Outline Background Program Structure

More information

EE/CSCI 451 Midterm 1

EE/CSCI 451 Midterm 1 EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 9: Binary Addition & Multiplication Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Pop Quiz! Using 4 bits signed integer notation:

More information

Array Multipliers. Figure 6.9 The partial products generated in a 5 x 5 multiplication. Sec. 6.5

Array Multipliers. Figure 6.9 The partial products generated in a 5 x 5 multiplication. Sec. 6.5 Sec. 6.5 Array Multipliers I'r) 1'8 P7 p6 PS f'4 1'3 1'2 1' 1 "0 Figure 6.9 The partial products generated in a 5 x 5 multiplication. called itemrive arrc.ly multipliers or simply cirruy m~illil>liers.

More information

Lecture 1: VHDL Quick Start. Digital Systems Design. Fall 10, Dec 17 Lecture 1 1

Lecture 1: VHDL Quick Start. Digital Systems Design. Fall 10, Dec 17 Lecture 1 1 Lecture 1: VHDL Quick Start Digital Systems Design Fall 10, Dec 17 Lecture 1 1 Objective Quick introduction to VHDL basic language concepts basic design methodology Use The Student s Guide to VHDL or The

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Arithmetic Unit 10122011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Fixed Point Arithmetic Addition/Subtraction

More information

Parallelizing The Matrix Multiplication. 6/10/2013 LONI Parallel Programming Workshop

Parallelizing The Matrix Multiplication. 6/10/2013 LONI Parallel Programming Workshop Parallelizing The Matrix Multiplication 6/10/2013 LONI Parallel Programming Workshop 2013 1 Serial version 6/10/2013 LONI Parallel Programming Workshop 2013 2 X = A md x B dn = C mn d c i,j = a i,k b k,j

More information

COE 405 Design Methodology Based on VHDL

COE 405 Design Methodology Based on VHDL COE 405 Design Methodology Based on VHDL Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Elements of VHDL Top-Down Design Top-Down Design with

More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

Very High Speed Integrated Circuit Har dware Description Language

Very High Speed Integrated Circuit Har dware Description Language Very High Speed Integrated Circuit Har dware Description Language Industry standard language to describe hardware Originated from work in 70 s & 80 s by the U.S. Departm ent of Defence Root : ADA Language

More information

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 6c High-Speed Multiplication - III Spring 2017 Koren Part.6c.1 Array Multipliers The two basic operations - generation

More information

1. Prove that if you have tri-state buffers and inverters, you can build any combinational logic circuit. [4]

1. Prove that if you have tri-state buffers and inverters, you can build any combinational logic circuit. [4] HW 3 Answer Key 1. Prove that if you have tri-state buffers and inverters, you can build any combinational logic circuit. [4] You can build a NAND gate from tri-state buffers and inverters and thus you

More information

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com Area And Power Efficient LMS Adaptive

More information

CSE 260 Introduction to Digital Logic and Computer Design. Exam 1 Solutions

CSE 260 Introduction to Digital Logic and Computer Design. Exam 1 Solutions CSE 6 Introduction to igital Logic and Computer esign Exam Solutions Jonathan Turner /3/4. ( points) raw a logic diagram that implements the expression (B+C)(C +)(B+ ) directly (do not simplify first),

More information

Department of Electronics & Communication Engineering Lab Manual E-CAD Lab

Department of Electronics & Communication Engineering Lab Manual E-CAD Lab Department of Electronics & Communication Engineering Lab Manual E-CAD Lab Prasad V. Potluri Siddhartha Institute of Technology (Sponsored by: Siddhartha Academy of General & Technical Education) Affiliated

More information

Using Library Modules in VHDL Designs

Using Library Modules in VHDL Designs Using Library Modules in VHDL Designs This tutorial explains how Altera s library modules can be included in VHDL-based designs, which are implemented by using the Quartus R II software. Contents: Example

More information

3 Designing Digital Systems with Algorithmic State Machine Charts

3 Designing Digital Systems with Algorithmic State Machine Charts 3 Designing with Algorithmic State Machine Charts An ASM chart is a method of describing the sequential operations of a digital system which has to implement an algorithm. An algorithm is a well defined

More information

CSCI Lab 3. VHDL Syntax. Due: Tuesday, week6 Submit to: \\fs2\csci250\lab-3\

CSCI Lab 3. VHDL Syntax. Due: Tuesday, week6 Submit to: \\fs2\csci250\lab-3\ CSCI 250 - Lab 3 VHDL Syntax Due: Tuesday, week6 Submit to: \\fs2\csci250\lab-3\ Objectives 1. Learn VHDL Valid Names 2. Learn the presentation of Assignment and Comments 3. Learn Modes, Types, Array,

More information

ECE 545 Lecture 12. Datapath vs. Controller. Structure of a Typical Digital System Data Inputs. Required reading. Design of Controllers

ECE 545 Lecture 12. Datapath vs. Controller. Structure of a Typical Digital System Data Inputs. Required reading. Design of Controllers ECE 545 Lecture 12 Design of Controllers Finite State Machines and Algorithmic State Machine (ASM) Charts Required reading P. Chu, using VHDL Chapter 1, Finite State Machine: Principle & Practice Chapter

More information

isplever Parallel FIR Filter User s Guide October 2005 ipug06_02.0

isplever Parallel FIR Filter User s Guide October 2005 ipug06_02.0 isplever TM CORE Parallel FIR Filter User s Guide October 2005 ipug06_02.0 Introduction This document serves as a guide containing technical information about the Lattice Parallel FIR Filter core. Overview

More information

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,

More information

Lecture 19: Arithmetic Modules 14-1

Lecture 19: Arithmetic Modules 14-1 Lecture 19: Arithmetic Modules 14-1 Syllabus Objectives Addition and subtraction Multiplication Division Arithmetic and logic unit 14-2 Objectives After completing this chapter, you will be able to: Describe

More information

CHAPTER 4: Register Transfer Language and Microoperations

CHAPTER 4: Register Transfer Language and Microoperations CS 224: Computer Organization S.KHABET CHAPTER 4: Register Transfer Language and Microoperations Outline Register Transfer Language Register Transfer Bus and Memory Transfers Arithmetic Microoperations

More information

Clocked Sequential System Design. Multiply Example

Clocked Sequential System Design. Multiply Example Clocked Sequential System Design Example 1 Multipliers (Gradeschool, Modified Gradeschool) Multiply Example (185) (215) 00000000 00000000 ------ 1001101101011111 (39775) 1 0000000000000000

More information

Using Library Modules in VHDL Designs

Using Library Modules in VHDL Designs Using Library Modules in VHDL Designs This tutorial explains how Altera s library modules can be included in VHDL-based designs, which are implemented by using the Quartus R II software. Contents: Example

More information