Sungmin Bae, Hyung-Ock Kim, Jungyun Choi, and Jaehong Park. Design Technology Infrastructure Design Center System-LSI Business Division
2
1. Motivation
2. Design flow
3. Parallel multiplier
4. Coarse-grained structural placement methodology
5. Experimental results
6. Future work
3 Data-flow (design structure) awareness is crucial to enhancing physical design quality: timing, area, congestion, power, and so on. Structured datapath placement is mostly done manually. In general, placement tools are thought not to perform well on datapath designs. Design effort: days to weeks.
[Figure: control granularity for Sum = A + B, from coarser to finer: floorplan, memory macro placement, structured datapath placement.]
4 We have added another methodology to data-flow aware physical design: automated extraction and mapping for a synthesized parallel multiplier.
[Figure: control granularity for Sum = A * B, from coarser to finer: floorplan, memory macro placement, coarse-grained structured datapath placement (logic synthesis with a datapath template, then automated datapath extraction and mapping), structured datapath placement.]
5 The design flow performs the following steps:
- Identify the cells of a synthesized parallel multiplier to be structurally placed
- Extract the inherent structural locations of the cells
- Analyze the data-flow of the multiplier
- Structurally map the cells onto a logical 2-D array
- Physically bit-slice align the cells
- Generate structural relative placement directives
- Guide structural placement during global placement
[Figure: design flow. Logic synthesis: RTL code, parsing/elaboration, arithmetic operation extraction, high-level arithmetic optimizations (and high-level optimizations of the non-arithmetic logic), datapath generator with structural templates (multiplier), technology-independent and technology-dependent optimizations, optimized gate-level netlist; the technology library and timing/area constraints are inputs. Structure extraction and mapping: dataflow analysis, structural location inference/cell mapping, physical-aware bit-slice alignment, structural relative placement directives. Global placement: coarse-grained structural placement, followed by a "Result satisfactory?" check with Yes/No branches involving the user.]
6 A parallel multiplier is one of the most abundant arithmetic circuits in today's multimedia-feature-intensive SoCs. A parallel multiplier largely consists of three parts:
- Partial product generation
- Partial product reduction
- Carry-propagating adder (final adder)
[Figure: multiplication in dot notation for a 4-bit multiplicand Y3..Y0 and multiplier X3..X0. Partial product row i holds XiY3 XiY2 XiY1 XiY0 for i = 0..3; the rows pass through partial product reduction and the final adder to give the final product S7..S0.]
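The dot-notation picture can be reproduced with a few lines of Python. This is only an illustrative sketch, not the authors' tooling: the function names are made up for this example, and it assumes unsigned operands with x as the multiplier (row index i) and y as the multiplicand (index j), following the XiYj labels on the slide.

```python
def partial_products(x, y, n):
    """Row i holds x_i AND y_j for j = 0..n-1 (the dot at weight 2**(i+j))."""
    return [[((x >> i) & 1) & ((y >> j) & 1) for j in range(n)]
            for i in range(n)]


def column_heights(n):
    """Number of dots in each of the 2n-1 columns of the dot diagram."""
    return [min(k, n - 1) - max(0, k - n + 1) + 1 for k in range(2 * n - 1)]


def multiply(x, y, n=4):
    """Sum every partial product bit at its weight 2**(i+j)."""
    rows = partial_products(x, y, n)
    return sum(bit << (i + j)
               for i, row in enumerate(rows)
               for j, bit in enumerate(row))
```

For n = 4 the column heights are 1, 2, 3, 4, 3, 2, 1, which is the diamond shape that the reduction stage has to compress to two rows.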
7 Partial product generation
- Non-Booth: generates the logical product (AND) of the multiplicand and multiplier bits.
- Booth (radix-4): reduces the number of partial products by half.
Partial product reduction
- Carry-save addition: reduces every column to two output rows using compressor cells.
Carry-propagate adder (final adder)
- Carry-lookahead adder: adds the two output rows.
[Figure: multiplication in dot notation; a partial product cell PPij formed from Xi and Yj (non-Booth and Booth variants); 3:2 compressors reducing PPi+2,j-2, PPi+1,j-1, PPi,j, and PPi-1,j+1; a carry-propagate adder built from full adders (A2..A0, B2..B0, S2..S0) with a carry-lookahead unit (P2..P0, G2..G0, C0..C3).]
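Two of the building blocks named above are small enough to sketch directly. The code below is a hedged illustration using the textbook radix-4 Booth recoding and a bitwise 3:2 compressor; the function names are hypothetical and nothing here is taken from the authors' datapath generator.

```python
def booth_radix4_digits(x, n):
    """Recode an n-bit two's-complement pattern x (n even) into n/2 digits.

    Digit k is x[2k-1] + x[2k] - 2*x[2k+1], with x[-1] = 0, so every digit
    lies in {-2, -1, 0, 1, 2} and each digit selects one partial product.
    """
    if n % 2:
        raise ValueError("n must be even for radix-4 recoding")
    bits = [(x >> k) & 1 for k in range(n)]
    digits, prev = [], 0
    for k in range(0, n, 2):
        digits.append(bits[k] + prev - 2 * bits[k + 1])
        prev = bits[k + 1]
    return digits


def compress_3_2(a, b, c):
    """Carry-save step: three rows in, a sum row and a carry row out."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry  # a + b + c == s + 2 * carry
```

An n-bit multiplier yields n/2 Booth digits instead of n AND rows, which is the halving mentioned on the slide. Applying `compress_3_2` repeatedly leaves two rows for the final carry-propagate adder.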
8 The methodology performs the following steps:
1. Identify the cells of a synthesized parallel multiplier to be structurally placed
- The PI cells from the partial product generation
- The PO cells from the final adder
2. Extract the inherent structural locations of the cells
- Tag structural locations for the PI and PO cells
3. Analyze the data-flow of the multiplier
4. Structurally map the cells onto a logical 2-D array
5. Physically bit-slice align the cells
6. Generate structural relative placement directives
7. Guide structural placement during global placement
[Figure: design flow, as on slide 5.]
9 The PI cells come from the partial product generation. They are retrieved as the immediate fan-out cone cells of the input nets. The set of nets used to collect the PI cells depends on the type of partial product generation:
- Non-Booth: multiplicand and multiplier input nets
- Booth: multiplicand input nets
[Figure: 4 x 4 multiplication in dot notation (Y3..Y0, X3..X0, partial products XiYj, final product S7..S0) with the partial product generation cells highlighted; a partial product cell PPij formed from Xi and Yj, non-Booth and Booth.]
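A minimal sketch of the PI cell retrieval, assuming a toy netlist given as a dictionary from cell name to the nets on its input pins. The data model, the cell and net names, and the `pp_type` strings are assumptions made for this example, not the format used by the authors' Tcl implementation.

```python
def collect_pi_cells(cell_inputs, multiplicand_nets, multiplier_nets, pp_type):
    """Return the cells in the immediate fan-out of the selected input nets.

    cell_inputs maps a cell name to the list of nets driving its input pins.
    """
    if pp_type == "non-booth":
        seed = set(multiplicand_nets) | set(multiplier_nets)
    elif pp_type == "booth":
        seed = set(multiplicand_nets)  # multiplier bits feed the Booth encoders
    else:
        raise ValueError(f"unknown partial product type: {pp_type}")
    return sorted(cell for cell, nets in cell_inputs.items()
                  if seed & set(nets))


# Toy non-Booth fragment: two AND cells plus one compressor cell.
NON_BOOTH = {
    "and_0_0": ["X0", "Y0"],
    "and_0_1": ["X0", "Y1"],
    "fa_0": ["n1", "n2", "n3"],
}
# Toy Booth fragment: an encoder on the multiplier bits, a selector on Y0.
BOOTH = {
    "enc_0": ["X0", "X1"],
    "sel_0_0": ["Y0", "e0"],
    "fa_0": ["n1", "n2", "n3"],
}
```

With the Booth setting only `sel_0_0` is collected, because the encoder hangs off the multiplier nets, which are not part of the seed set in that mode.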
10 After extraction, the PI cells are tagged with 2-D locations: a partial product row and column. The row of a PI cell can be inferred from its topologically closest multiplier inputs; i indicates the i-th row of the partial product generator.
- PIrow(Ck): the row number of the PI cell Ck
- PIcol(Ck): the column number of the PI cell Ck
- Bmd(Ck): the closest multiplicand bit of Ck
- Bmr(Ck): the closest multiplier bit of Ck
- PPtype: the partial product type (non-Booth or Booth)
[Figure: row inference and column inference for a partial product cell PPij formed from Xi and Yj, non-Booth and Booth.]
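The "topologically closest multiplier bit" Bmr(Ck) can be found with a breadth-first search backwards through the fan-in. The sketch below is an assumption-laden illustration: the fan-in dictionary, the names, and the rule "row = Bmr" for the non-Booth case are choices made for this example. The slide's actual row formula, including the Booth case, did not survive the transcription and is not reconstructed here.

```python
from collections import deque


def closest_input_bit(cell, fanin, input_bit):
    """Breadth-first search through the fan-in of `cell`.

    fanin maps a cell to the names (cells or input nets) driving it;
    input_bit maps an input net name to its bit index.
    Returns the bit index of the nearest input, or None if none is reachable.
    """
    seen = {cell}
    queue = deque([cell])
    while queue:
        node = queue.popleft()
        for src in fanin.get(node, []):
            if src in input_bit:
                return input_bit[src]
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return None


def non_booth_row(cell, fanin, multiplier_bit):
    """Assumed rule for non-Booth arrays only: the row is the closest bit."""
    return closest_input_bit(cell, fanin, multiplier_bit)
```

For a plain AND cell the multiplier bit is one hop away; the search also copes with buffers inserted between the input net and the AND cell by synthesis.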
11 The column of a PI cell can be inferred from its topologically closest, bit-slice-aligned multiplier output bit. Topological order propagation is restricted to follow only the same-weight bit-slice along the CSA tree.
- Ignoring carry-out pins of the compressor cells.
[Figure: column inference. A 4 x 4 dot diagram (Y3..Y0, X3..X0, partial products XiYj) reduced by 3:2 compressors, with Column[i] and Column[i+1] traced down to the topologically closest, bit-slice-aligned result among S7..S0.]
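The restriction to a single bit-slice can be sketched as a forward breadth-first search that refuses to cross carry-out pins. The fan-out representation as (output pin, sink) pairs and the pin name "CO" are assumptions for this example; real library cells use their own pin names.

```python
from collections import deque


def infer_column(cell, fanout, output_bit, carry_pins=("CO",)):
    """Follow sum outputs only, so the search stays in one weighted column.

    fanout maps a cell to (output_pin, sink) pairs; output_bit maps a
    primary output name to its bit index. Returns the index of the first
    output reached, or None.
    """
    seen = {cell}
    queue = deque([cell])
    while queue:
        node = queue.popleft()
        for pin, sink in fanout.get(node, []):
            if pin in carry_pins:
                continue  # a carry-out moves to the next heavier column
            if sink in output_bit:
                return output_bit[sink]
            if sink not in seen:
                seen.add(sink)
                queue.append(sink)
    return None


# Toy slice: the carry path reaches S4 in fewer hops than the sum path to S3.
FANOUT = {
    "pp_1_2": [("Z", "fa_a")],
    "fa_a": [("CO", "fa_c"), ("S", "fa_b")],
    "fa_c": [("S", "S4")],
    "fa_b": [("S", "fa_d")],
    "fa_d": [("S", "S3")],
}
OUTPUTS = {"S3": 3, "S4": 4}
```

In the toy data the partial product X1Y2 has weight 2^3. Skipping carry-outs yields column 3, whereas an unrestricted search would stop at the nearer output S4.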
12 The PO cells are part of the final carry-propagating adder. They are retrieved as the immediate fan-in cone cells of the output nets, and each PO cell is tagged with its corresponding multiplier output bit.
[Figure: carry-propagate adder built from full adders (inputs A2..A0 and B2..B0, sums S2..S0) with a carry-lookahead unit (P2..P0, G2..G0, carries C0..C3), shown as the final adder stage of the multiplier.]
14 Step 4 of the flow: structurally map the cells onto a logical 2-D array, using the inferred row and column numbers.
15 The PI cells are mapped onto a logical 2-D array according to their tagged row and column numbers. However, the number of cells inferred to the same location can be uneven because of the local nature of logic synthesis optimizations. If enough slots were allocated for all the cells, the 2-D array could have an uncontrollable aspect ratio, which may degrade placement quality.
- The maximum number of columns is constrained to control the array dimension. The number of rows is fixed.
- Some mis-mappings are allowed: slots are shared between adjacent columns.
- There is spacing between the rows of the 2-D array, so that non-guided cells can be placed close to their inherent structural locations.
16 Cell mapping is based on min-cost max-flow: it maximizes the number of mapped PI cells with minimum mis-mapping cost for a given 2-D array.
- An initial 2-D slot array may not fully contain all the PI cells.
- Empty slots can be shared between adjacent bit-slice columns.
- Dummy (empty) column slots are added iteratively, during the mapping, at the columns with the worst mis-mapping costs.
The slots of each column are divided into three types with different mapping cost weights:
- Non-shared: mapping weight $\gamma_{own}$
- Shared: mapping weight $\gamma_{shared}$
- Dummy: mapping weight $\gamma_{dummy}$
Mis-mapping cost: $\gamma_x \cdot |row_{cell} - row_{slot}|$
[Figure: flow network from the PI cells of columns i-1, i, and i+1 to the slots of Column[i-1], Column[i], and Column[i+1], with shared slots between adjacent columns and dummy slots; edge costs Cost, CostSH, and CostDS; slot capacities j, m, and k.]
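The mapping can be sketched as a small min-cost max-flow problem. The code below is an illustration under stated assumptions, not the authors' formulation: it uses a plain successive-shortest-path solver, gives every slot capacity 1, models only non-shared and shared slots (dummy slots and the iterative column insertion are left out), and takes the mis-mapping cost as the weight times the row distance. All names and default weights are hypothetical.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths (SPFA); fine for small illustrative graphs."""

    def __init__(self, n):
        self.g = [[] for _ in range(n)]  # edge = [to, cap, cost, rev_index]

    def add_edge(self, u, v, cap, cost):
        self.g[u].append([v, cap, cost, len(self.g[v])])
        self.g[v].append([u, 0, -cost, len(self.g[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            dist = [float("inf")] * len(self.g)
            prev = [None] * len(self.g)
            dist[s] = 0
            queue, queued = deque([s]), {s}
            while queue:
                u = queue.popleft()
                queued.discard(u)
                for i, (v, cap, c, _) in enumerate(self.g[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], prev[v] = dist[u] + c, (u, i)
                        if v not in queued:
                            queued.add(v)
                            queue.append(v)
            if prev[t] is None:
                return flow, cost
            v = t
            while v != s:  # push one unit; residual capacities are >= 1
                u, i = prev[v]
                self.g[u][i][1] -= 1
                self.g[v][self.g[u][i][3]][1] += 1
                v = u
            flow, cost = flow + 1, cost + dist[t]


def map_cells(cells, n_rows, n_cols, g_own=1, g_shared=2):
    """cells: inferred (row, col) tags. Returns (mapped_count, total_cost)."""
    src, sink, cell0 = 0, 1, 2
    own0 = cell0 + len(cells)
    shared0 = own0 + n_rows * n_cols
    net = MinCostFlow(shared0 + n_rows * (n_cols - 1))
    for r in range(n_rows):
        for c in range(n_cols):
            net.add_edge(own0 + r * n_cols + c, sink, 1, 0)
        for c in range(n_cols - 1):  # shared slot between columns c and c+1
            net.add_edge(shared0 + r * (n_cols - 1) + c, sink, 1, 0)
    for k, (row, col) in enumerate(cells):
        net.add_edge(src, cell0 + k, 1, 0)
        for r in range(n_rows):
            net.add_edge(cell0 + k, own0 + r * n_cols + col, 1,
                         g_own * abs(row - r))
            for c in (col - 1, col):
                if 0 <= c < n_cols - 1:
                    net.add_edge(cell0 + k, shared0 + r * (n_cols - 1) + c, 1,
                                 g_shared * abs(row - r))
    return net.solve(src, sink)
```

For example, three cells that all infer to location (0, 0) in a 2 x 2 array are all mapped: one takes the own slot, one takes the shared slot in the same row, and the third is pushed one row away at cost `g_own`.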
17 HPWL is considered as a tiebreaker, to compensate for the net-connection blindness of the mapping. The problem is formulated as a linear program that minimizes the weighted sum of the min-cost max-flow mis-mapping cost CostMA(ci) and the wirelength cost CostHPWL(ni):
- CostMA(ci): weighted sum of the mis-mapping cost of cell ci
- CostHPWL(ni): half-perimeter wirelength of net ni
Dummy column slots are added gradually at the columns with the worst mis-mapping cost, and the linear program is solved iteratively.
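The role of HPWL as a tiebreaker can be shown without an LP solver. The sketch below only evaluates the combined objective for given candidate mappings; it does not reproduce the authors' linear program, and the weights `w_map` and `w_hpwl` are illustrative assumptions chosen so that wirelength matters only when mis-mapping costs are equal.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net from its (x, y) pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def combined_cost(mis_mapping_costs, nets, w_map=1000, w_hpwl=1):
    """Weighted sum of per-cell mis-mapping costs and per-net HPWL."""
    return (w_map * sum(mis_mapping_costs)
            + w_hpwl * sum(hpwl(pins) for pins in nets))


# Two candidate mappings with the same mis-mapping cost but different nets.
candidate_a = combined_cost([1, 0], [[(0, 0), (2, 0)]])
candidate_b = combined_cost([1, 0], [[(0, 0), (5, 0)]])
best = min(candidate_a, candidate_b)
```

Here both candidates have the same mis-mapping cost, so the one with the shorter net wins; a candidate with a lower mis-mapping cost would win regardless of its wirelength as long as the HPWL term stays below `w_map`.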
19 The logically mapped PI and PO cells are then bit-slice aligned with respect to their physical dimensions.
- Strict bit-slice alignment: each column's width is set by the widest cell in the column. The alignment size is uncontrollable.
- Compression alignment: generates a compact cell cluster, but cannot ensure vertical bit-slice alignment.
[Figure: cells Ci,j-1 .. Ci,j+3 in rows i, i-1, and i-2, placed under strict bit-slice alignment and under compression alignment.]
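The two baseline alignments can be compared on a small grid of cell widths. This is a simplified sketch: it assumes one cell per (row, column) slot, measures only x positions, and uses made-up function names and widths.

```python
def strict_alignment(widths):
    """Every column is as wide as its widest cell; columns line up exactly.

    widths[r][c] is the width of the cell in row r, column c.
    Returns (x positions per row, total width).
    """
    col_width = [max(column) for column in zip(*widths)]
    starts = [sum(col_width[:c]) for c in range(len(col_width))]
    return [list(starts) for _ in widths], sum(col_width)


def compression_alignment(widths):
    """Each row is packed left to right with no gaps; columns may drift."""
    positions = [[sum(row[:c]) for c in range(len(row))] for row in widths]
    return positions, max(sum(row) for row in widths)


WIDTHS = [[2, 1, 1],
          [1, 3, 1]]
strict_pos, strict_total = strict_alignment(WIDTHS)
packed_pos, packed_total = compression_alignment(WIDTHS)
```

For these widths the strict layout is 6 units wide with identical column starts in both rows, while the compressed layout is 5 units wide but the second and third columns start at different x positions in each row.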
20 Our method combines the advantages of the two methods above: it aligns the columns within a maximum width constraint, minimizing bit-slice misalignment while ensuring a maximum alignment width.
[Figure: cells Ci,j-1 .. Ci,j+3 in rows i, i-1, and i-2, showing the misalignment at each column under the maximum width constraint.]
21 Step 6 of the flow: generate structural relative placement directives, which describe:
- The relative row and column locations of the cells
- The column spaces between the cells
22 After the bit-slice alignment, the structural locations and the cell spacings are transformed into structural relative placement directives:
- Relative row and column locations of the cells
- Spaces between the cells
To accommodate the cell spaces, the number of array columns is set to twice that of the logical 2-D array. The compression-based alignment is used to align the cells. An estimated dataflow direction is used to set the initial orientations of the arrays for global placement.
[Figure: array with alternating cell slots and space slots for cells Ci,j-1 .. Ci,j+3 in rows i, i-1, and i-2, with the cell spacing marked.]
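The column doubling can be sketched as follows: logical column c becomes cell slot 2c, and the gap after it, if any, goes into space slot 2c + 1. The tuple format is purely hypothetical; actual relative placement directives are specific to the P&R tool, and their syntax is not given in the slides.

```python
def build_directives(cells, spacing):
    """Turn logical slots into interleaved cell-slot and space-slot entries.

    cells maps a cell name to its logical (row, col); spacing maps a logical
    (row, col) to the gap that follows that cell. Returns tuples of the form
    ("cell", name, row, array_col) or ("space", gap, row, array_col).
    """
    directives = []
    for name, (row, col) in sorted(cells.items(), key=lambda item: item[1]):
        directives.append(("cell", name, row, 2 * col))
        gap = spacing.get((row, col), 0)
        if gap > 0:
            directives.append(("space", gap, row, 2 * col + 1))
    return directives


CELLS = {"u1": (0, 0), "u2": (0, 1), "u3": (1, 0)}
SPACING = {(0, 0): 2}
DIRECTIVES = build_directives(CELLS, SPACING)
```

Cells always land on even array columns and spaces on odd ones, so a logical array with N columns needs 2N array columns, matching the doubling described on the slide.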
24 Structural relative placement directives hold the locations of the PI and PO cells. Non-guided cells are attracted to the PI and PO cells.
[Figure: placement results for a 13*12 non-Booth multiplier and a 32*16 Booth multiplier.]
25 We implemented the proposed methodology in Tcl, with CLP as the linear program solver. Commercial logic synthesis and P&R tools were used with industrial designs. The approach gave improvements of about 2%, 42%, and 2% in critical path delay, total negative slack, and total wirelength, respectively. D11 degraded in physical implementation quality: about 25% of its inputs were pruned by constant propagation, which made it unsuitable for the approach.
[Table: per-design results and average, with columns Design, # Mults, Area ratio, CPD, TNS, Wirelength; the values were not captured in the transcription.]
26 [Figure: a snapshot of D10.]
27 Future work will focus on:
- Extending the methodology to other synthesized datapath circuits.
- Developing regularity-measuring methods to avoid structurally mapping insufficiently regular multipliers.
- Adding more awareness of the surroundings to further automate the methodology.
More informationClock Tree Resynthesis for Multi-corner Multi-mode Timing Closure
Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure Subhendu Roy 1, Pavlos M. Mattheakis 2, Laurent Masse-Navette 2 and David Z. Pan 1 1 ECE Department, The University of Texas at Austin
More informationCS 5803 Introduction to High Performance Computer Architecture: Arithmetic Logic Unit. A.R. Hurson 323 CS Building, Missouri S&T
CS 5803 Introduction to High Performance Computer Architecture: Arithmetic Logic Unit A.R. Hurson 323 CS Building, Missouri S&T hurson@mst.edu 1 Outline Motivation Design of a simple ALU How to design
More informationLaboratory 6. - Using Encounter for Automatic Place and Route. By Mulong Li, 2013
CME 342 (VLSI Circuit Design) Laboratory 6 - Using Encounter for Automatic Place and Route By Mulong Li, 2013 Reference: Digital VLSI Chip Design with Cadence and Synopsys CAD Tools, Erik Brunvand Background
More informationComputer Arithmetic Multiplication & Shift Chapter 3.4 EEC170 FQ 2005
Computer Arithmetic Multiplication & Shift Chapter 3.4 EEC170 FQ 200 Multiply We will start with unsigned multiply and contrast how humans and computers multiply Layout 8-bit 8 Pipelined Multiplier 1 2
More informationLearning Outcomes. Spiral 2-2. Digital System Design DATAPATH COMPONENTS
2-2. 2-2.2 Learning Outcomes piral 2-2 Arithmetic Components and Their Efficient Implementations I understand the control inputs to counters I can design logic to control the inputs of counters to create
More informationEliminating Routing Congestion Issues with Logic Synthesis
Eliminating Routing Congestion Issues with Logic Synthesis By Mike Clarke, Diego Hammerschlag, Matt Rardon, and Ankush Sood Routing congestion, which results when too many routes need to go through an
More informationPaper ID # IC In the last decade many research have been carried
A New VLSI Architecture of Efficient Radix based Modified Booth Multiplier with Reduced Complexity In the last decade many research have been carried KARTHICK.Kout 1, MR. to reduce S. BHARATH the computation
More informationVARUN AGGARWAL
ECE 645 PROJECT SPECIFICATION -------------- Design A Microprocessor Functional Unit Able To Perform Multiplication & Division Professor: Students: KRIS GAJ LUU PHAM VARUN AGGARWAL GMU Mar. 2002 CONTENTS
More informationTo design a 4-bit ALU To experimentally check the operation of the ALU
1 Experiment # 11 Design and Implementation of a 4 - bit ALU Objectives: The objectives of this lab are: To design a 4-bit ALU To experimentally check the operation of the ALU Overview An Arithmetic Logic
More informationInternational Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018
RESEARCH ARTICLE DESIGN AND ANALYSIS OF RADIX-16 BOOTH PARTIAL PRODUCT GENERATOR FOR 64-BIT BINARY MULTIPLIERS K.Deepthi 1, Dr.T.Lalith Kumar 2 OPEN ACCESS 1 PG Scholar,Dept. Of ECE,Annamacharya Institute
More informationMULTIPLICATION TECHNIQUES
Learning Objectives EE 357 Unit 2a Multiplication Techniques Perform by hand the different methods for unsigned and signed multiplication Understand the various digital implementations of a multiplier
More informationAPPLICATION NOTE. Constant Coefficient Multipliers for the XC4000E. Introduction. High Performance = Constant Coefficient
APPLICATION NOTE Constant Coefficient Multipliers for the XC000E XAPP 05 December 11, 1996 (Version 1.1) Application Note by Ken Chapman Summary This paper identifies two points at which constant coefficient
More informationFILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas
FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given
More informationCS Computer Architecture. 1. Explain Carry Look Ahead adders in detail
1. Explain Carry Look Ahead adders in detail A carry-look ahead adder (CLA) is a type of adder used in digital logic. A carry-look ahead adder improves speed by reducing the amount of time required to
More informationOutline. Introduction to Structured VLSI Design. Signed and Unsigned Integers. 8 bit Signed/Unsigned Integers
Outline Introduction to Structured VLSI Design Integer Arithmetic and Pipelining Multiplication in the digital domain HW mapping Pipelining optimization Joachim Rodrigues Signed and Unsigned Integers n-1
More informationAnnouncements. Midterm 2 next Thursday, 6-7:30pm, 277 Cory Review session on Tuesday, 6-7:30pm, 277 Cory Homework 8 due next Tuesday Labs: project
- Fall 2002 Lecture 20 Synthesis Sequential Logic Announcements Midterm 2 next Thursday, 6-7:30pm, 277 Cory Review session on Tuesday, 6-7:30pm, 277 Cory Homework 8 due next Tuesday Labs: project» Teams
More informationDigital VLSI Design. Lecture 7: Placement
Digital VLSI Design Lecture 7: Placement Semester A, 2016-17 Lecturer: Dr. Adam Teman 29 December 2016 Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from
More informationPlanning for Local Net Congestion in Global Routing
Planning for Local Net Congestion in Global Routing Hamid Shojaei, Azadeh Davoodi, and Jeffrey Linderoth* Department of Electrical and Computer Engineering *Department of Industrial and Systems Engineering
More informationFPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011
FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level
More informationWeek 7: Assignment Solutions
Week 7: Assignment Solutions 1. In 6-bit 2 s complement representation, when we subtract the decimal number +6 from +3, the result (in binary) will be: a. 111101 b. 000011 c. 100011 d. 111110 Correct answer
More informationISSN (Online)
Proposed FAM Unit with S-MB Techniques and Kogge Stone Adder using VHDL [1] Dhumal Ashwini Kashinath, [2] Asst. Prof. Shirgan Siddharudha Shivputra [1] [2] Department of Electronics and Telecommunication
More informationFPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard
FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE 754-2008 Standard M. Shyamsi, M. I. Ibrahimy, S. M. A. Motakabber and M. R. Ahsan Dept. of Electrical and Computer Engineering
More informationTOPIC : Verilog Synthesis examples. Module 4.3 : Verilog synthesis
TOPIC : Verilog Synthesis examples Module 4.3 : Verilog synthesis Example : 4-bit magnitude comptarator Discuss synthesis of a 4-bit magnitude comparator to understand each step in the synthesis flow.
More informationIntroduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation
Introduction to Electronic Design Automation Model of Computation Jie-Hong Roland Jiang 江介宏 Department of Electrical Engineering National Taiwan University Spring 03 Model of Computation In system design,
More informationHigh-Level Synthesis
High-Level Synthesis 1 High-Level Synthesis 1. Basic definition 2. A typical HLS process 3. Scheduling techniques 4. Allocation and binding techniques 5. Advanced issues High-Level Synthesis 2 Introduction
More informationEmbedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai
Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai Abstract: ARM is one of the most licensed and thus widespread processor
More informationBinary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.
Binary Arithmetic Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. MIT 6.004 Fall 2018 Reminder: Encoding Positive Integers Bit i in a binary representation (in right-to-left order)
More informationCAD Flow for FPGAs Introduction
CAD Flow for FPGAs Introduction What is EDA? o EDA Electronic Design Automation or (CAD) o Methodologies, algorithms and tools, which assist and automatethe design, verification, and testing of electronic
More informationFloorplan Management: Incremental Placement for Gate Sizing and Buffer Insertion
Floorplan Management: Incremental Placement for Gate Sizing and Buffer Insertion Chen Li, Cheng-Kok Koh School of ECE, Purdue University West Lafayette, IN 47907, USA {li35, chengkok}@ecn.purdue.edu Patrick
More informationAn instruction set processor consist of two important units: Data Processing Unit (DataPath) Program Control Unit
DataPath Design An instruction set processor consist of two important units: Data Processing Unit (DataPath) Program Control Unit Add & subtract instructions for fixed binary numbers are found in the
More informationAn Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator
An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator M.Chitra Evangelin Christina Associate Professor Department of Electronics and Communication Engineering Francis Xavier
More informationMAPLE: Multilevel Adaptive PLacEment for Mixed Size Designs
MAPLE: Multilevel Adaptive PLacEment for Mixed Size Designs Myung Chul Kim, Natarajan Viswanathan, Charles J. Alpert, Igor L. Markov, Shyam Ramji Dept. of EECS, University of Michigan IBM Corporation 1
More informationFloorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence
Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Chen-Wei Liu 12 and Yao-Wen Chang 2 1 Synopsys Taiwan Limited 2 Department of Electrical Engineering National Taiwan University,
More informationChapter 3 Arithmetic for Computers
Chapter 3 Arithmetic for Computers 1 Arithmetic Where we've been: Abstractions: Instruction Set Architecture Assembly Language and Machine Language What's up ahead: Implementing the Architecture operation
More informationMULTIPLE OPERAND ADDITION. Multioperand Addition
MULTIPLE OPERAND ADDITION Chapter 3 Multioperand Addition Add up a bunch of numbers Used in several algorithms Multiplication, recurrences, transforms, and filters Signed (two s comp) and unsigned Don
More informationTRILOBYTE SYSTEMS. Consistent Timing Constraints with PrimeTime. Steve Golson Trilobyte Systems.
TRILOBYTE SYSTEMS Consistent Timing Constraints with PrimeTime Steve Golson Trilobyte Systems http://www.trilobyte.com 2 Physical implementation Rule #1 Do not change the functionality Rule #2 Meet the
More informationIEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers
International Journal of Research in Computer Science ISSN 2249-8257 Volume 1 Issue 1 (2011) pp. 1-7 White Globe Publications www.ijorcs.org IEEE-754 compliant Algorithms for Fast Multiplication of Double
More informationVerilog for Combinational Circuits
Verilog for Combinational Circuits Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall, 2014 ldvan@cs.nctu.edu.tw http://www.cs.nctu.edu.tw/~ldvan/
More informationFastPlace 2.0: An Efficient Analytical Placer for Mixed- Mode Designs
FastPlace.0: An Efficient Analytical Placer for Mixed- Mode Designs Natarajan Viswanathan Min Pan Chris Chu Iowa State University ASP-DAC 006 Work supported by SRC under Task ID: 106.001 Mixed-Mode Placement
More informationECE 341. Lecture # 6
ECE 34 Lecture # 6 Instructor: Zeshan Chishti zeshan@pdx.edu October 5, 24 Portland State University Lecture Topics Design of Fast Adders Carry Looakahead Adders (CLA) Blocked Carry-Lookahead Adders Multiplication
More informationCHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier
CHAPTER 3 METHODOLOGY 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier The design analysis starts with the analysis of the elementary algorithm for multiplication by
More information