ECE 5775 (Fall 17) High-Level Digital Design Automation. More Pipelining

Size: px
Start display at page:

Download "ECE 5775 (Fall 17) High-Level Digital Design Automation. More Pipelining"

Transcription

1 ECE 5775 (Fall 17) High-Level Digital Design Automation More Pipelining

2 Announcements HW 2 due Monday 10/16 (no late submission) Second round paper 5pm tomorrow on Piazza Talk by Prof. Margaret Martonosi 4:30pm today PHL 203 on End of Moore s Law Challenges and Opportunities: Computer Architecture Perspectives 5pm instructor office hour cancelled Midterm logistics Exam time: Thursday 10/19 in class (75 mins) Open book, open notes, closed Internet In-class recitation on Tuesday 10/17 Instructor office hour next Thursday moved to Wednesday 10/18, 4:30-5:30pm (Rhodes 320) 1

3 Agenda Midterm coverage Modulo scheduling Case studies on pipelining 2

4 Midterm (20%) Topics covered Specialized computing FPGAs C-based synthesis Control flow graph and SSA Scheduling Resource sharing Pipelining 3

5 Key Topics (1) Algorithm basics Time complexity, esp. big-o notation Graphs Trees, DAGs, topological sort BDDs, timing analysis FPGAs LUTs and LUT mapping C-based synthesis Arbitrary precision and fixed-point types Basic C-to-RTL mapping Key HLS optimizations to improve design performance 4

6 Key Topics (2) Compiler analysis Control data flow graph Dominance relation SSA Scheduling TCS & RCS algorithms: ILP, list scheduling, SDC Operation chaining, frequency/latency/resource constraints Ability to devise a simple scheduling algorithm to optimize certain design metric 5

7 Key Topics (3) Resource sharing Conflict and compatibility graphs Left edge algorithm Ability to determine minimum resource usage in # of functional units and/or registers, given a fixed schedule Pipelining Dependence types Ability to determine minimum II given a code snippet Modulo scheduling concepts: MII, RecMII, ResMII 6

8 Review: Dependence Graph Data dependences of a loop often represented by a dependence graph Forward edges: Intra-iteration (loopindependent) dependences Back edges: Inter-iteration (loop-carried) dependences Edges are annotated with distance values: number of iterations separating the two dependent operations involved [1] [0] v 1 [0] v 2 v 3 [0] [0] [2] Recurrence manifests itself as a circuit in the dependence graph v 4 Edges annotated with distance values 7

9 Pipelining through Modulo Scheduling Dependence graph: LD + Schedule: 0 1 LD + II = ST II = 2 Time (cycle) Iteration 0 LD + ST LD + ST ST LD + ST LD + ST LD ST + Initiation Interval (II) slot 0 slot 1 Steady state (II cycles) Steady state determines both performance resource usage 8

10 Modulo Scheduling A regular form of software pipelining technique Also applies to loop (/ function) pipelining for hardware synthesis Loop iterations (/ function) use the same schedule, which are initiated at a constant rate Advantages of modulo scheduling Easy to analyze Steady state determines performance and resource usage Cost efficient No code or hardware replication Optimization objective: (1) Minimize II under resource constraints or (2) minimize resource usage under II constraint NP-hard in general Optimal polynomial time solution exists without recurrences or resource constraints 9

11 Algorithmic Scheme for Modulo Scheduling Common scheme of heuristic algorithms Find MII (a lower bound on II) = max { ResMII, RecMII) Look for a schedule with given II If a feasible schedule not found, increase II and try again Find MII and set II = MII Look for a schedule Found it? No Increase II Yes 10

12 Calculating Lower Bound of Initiation Interval Minimum possible II (MII) MII = max(resmii, RecMII) A lower bound, not necessary achievable Resource constrained MII (ResMII) ResMII = max i éops(r i ) / Limit(r i )ù OPs(r): number of operations that use resource of type r Limit(r): number of available resources of type r Recurrence constrained MII (RecMII) RecMII = max i élatency(c i ) / Distance(c i )ù Latency(c i ): total latency in dependence circuit c i Distance(c i ): total distance in dependence circuit c i 11

13 Minimum II due to Resource Limits (ResMII) Dependence Reservation tables adders a0 a1 a2 a3 time i0 i1 i2 i3 i0 i1 i2 i0 i1 i0 4 5 i4 i5 i3 i4 i2 i3 i1 i2 0, 1, 2, 3, : time (clock cycles) a0, a1, a2, a3 : available adders i0, i1, i2, : loop iterations adders a0 a1 i0 i0 i1 i1 i0 i0 i2 i2 i3 i3 i1 i1 i2 i2 due to limited resources, cannot initiate iterations less than 2 cycles apart Compute ResMII: Max among all types of resources ResMII = max i éops(r i ) / Limit(r i )ù 12

14 Minimum II due to Recurrences (RecMII) Dependence Schedule (II=2) Dependence Schedule (II=1) [0] a b [1] [1] dependence distance = 1 a b a [0] a b [3] [3] dependence distance = 3 b Assume no operation chaining a b a b a b a b Compute Recurrence Minimum II (RecMII): Max among all circuits of: RecMII = max i élatency(c i ) / Distance(c i )ù Latency(c) : sum of operation latencies along circuit c Distance(c) : sum of dependence distances along circuit c 13

15 SDC-Based Modulo Scheduling SDC-based modulo scheduling Unifies intra-iteration and interiteration scheduling constraints in a single SDC Iterative algorithm with efficient incremental SDC update Loop Find Minimum II Model intra-iteration scheduling constraints Model inter-iteration scheduling constraints SDC feasible? Yes Incremental scheduling No Fail Increase II Schedule 14

16 Example: Modeling Loop-Carried Dependence The dependence between two operations from different iterations is termed inter-iteration (loop-carried) dependence Loop-carried dependence u à v with Dist(u, v) = K s u + Lat u s v + K* II A[i] C[i] ld ld v 1 v 2 II = 2 s 5 s 1 + 2*II for (i = 0; i < N-2; i++) { B[i] = A[i] * C[i]; A[i+2] = B[i] + C[i]; } v 3 + st v 4 v 5 ld v 1 v 3 ld v 2 Dist(v 5, v 1 ) = 2 A[i+2] + v 4 ld ld v 1 v 2 st v 5 v 3 15

17 SDC Constraint Graph The constraint inequalities form a system of difference constraints SDC(X, C) X denotes the variable set C denotes the constraint set The constraint graph G c (V c, E c ) of SDC(X, C) is a weighted directed graph x i Î X : v i Î V c x i x j b k Î C : e(v i, v j ) ÎE c and weight(e) = b k 16

18 Feasiblity Checking SDC Constraint Graph G c X: {x 1, x 2, x 3, x 4, x 5 } v 1 0 x 1 x 2 0 x 1 x 5-1 x 2 x 5 1 x 3 x 1-4 x 1 x 4 3 x 4 x 3-1 v 5-1 v v 2 x 3 x 1-4 x 1 x 4 3 x 4 x 3-1 à 0-2 Contradiction! v 3 An SDC(X, C) is feasible if and only if its constraint graph G c does not contain any negative cycles Feasibility check can be done by solving a single-source shortest path problem 17

19 Case Study: Prefix Sum Prefix sum computes a cumulative sum of a sequence of numbers commonly used in many applications such as radix sort, histogram, etc. void prefixsum ( int in[n], int out[n] ) out[0] = in[0]; for ( int i = 1; i < N; i++ ) { #pragma HLS pipeline II=? out[i] = out[i-1]+ in[i]; } } out[0] = in[0]; out[1] = in[0] + in[1]; out[1] = in[0] + in[1] + in[2]; out[1] = in[0] + in[1] + in[2] + in[3]; 18

20 Prefix Sum: RecMII Loop-carried dependence exists between to reads on out[] Assume chaining is not possible on memory reads (i.e., ld) and writes (i.e., st) due to target cycle time RecMII = 3 in[i] ld 2 out[i-1] ld 1 out[0] = in[0]; for ( int i = 1; i < N; i++ ) out[i] = out[i-1]+ in[i]; out[i] + st ld Load st Store cycle 1 cycle 2 cycle 3 cycle 4 i = 0 ld 1 ld 2 + st i = 1 ld 1 II = 1 + st ld 2 Assume chaining is not possible on memory reads (i.e., ld) and writes (i.e., st) due to cycle time constraint 19

21 Prefix Sum: Code Optimization Introduce an intermediate variable tmp to hold the running sum from the previous in[] values Shorter dependence circuit leads to RecMII = 1 in[i] ld + tmp int tmp = in[0]; for ( int i = 1; i < N; i++ ) { tmp += in[i]; out[i] = tmp; } out[i] st ld Load st Store cycle 1 cycle 2 cycle 3 cycle 4 i = 0 ld + st i = 1 ld + st II = 1 20

22 Case Study: Convolution for Image Processing A common computation of image/video processing is performed over overlapping stencils, termed as convolution Input image frame x3 convolution Output image frame 21

23 Achieving High Throughput with Pipelining for (r = 1; r < R; r++) for (c = 1; c < C; c++) { #pragma HLS pipeline II=? for (i = 0; i < 3; i++) for (j = 0; j < 3; j++) out[r][c] += img[r+i-1][c+j-1] * f[i][j]; } Inner loops (i & j) are automatically unrolled With a 3x3 convolution kernel, 9 pixels are required for calculating the value of one output pixel If the entire input image is stored in an on-chip buffer with two read ports ResMII =? What about RecMII? 22

24 Achieving II=1 for 3x3 Convolution Pixels in line buffer (2 lines stored) Push three pixels into shift register window: one new pixel plus two pixels from line buffer New pixel fetched from input stream or frame buffer in off-chip memory Output pixel produced by one convolution operation 23

25 Resulting Specialized Memory Hierarchy Memory architecture customized for convolution Input pixel stream Flip- Flops Convolve Output pixel stream Processing window Line buffers Frame buffers On-chip SRAMs Frame n-2 Frame n-1 Frame n Off-chip DDR 24

26 HLS Code Snippet 1 LineBuffer<2,C,pixel_t> linebuf; 2 Window<3,3,pixel_t> window; 3 for (int r = 1; r < R+1; r++) { 4 for (int c = 1; c < C+1; c++) { 5 #pragma HLS pipeline II=1 6 pixel_t new_pixel = img[r][c]; 7 // Update shift window 8 window.shift_left(); 9 if (r < R && c < C) { 10 for (int i = 0; i < 2; i++ ) 11 window.insert(buf[i][c]); 12 } 13 else { // zero padding 14 for (int i = 0; i < 2; i++) 15 window.insert(0); 16 } 17 window.insert(new_pixel); 18 // Update line buffer 19 linebuf.shift_up(c); 20 if (r < R && c < C) 21 linebuf[1].insert(c, new_pixel); 22 else // Zero padding 23 linebuf[1].insert(c, 0); 24 // Perform 3x3 convolution 25 out[r-1][c-1] = convolve(window, weights); 26 } 27 } 25

27 Summary Pipelining is one of the most commonly used techniques in HLS to boost performance of the synthesized hardware Recurrences and resource restrictions limit the pipeline throughput Modulo scheduling A regular form of software pipeline technique Also applies to loop pipelining for hardware synthesis NP-hard problem in general 26

28 Acknowledgements These slides contain/adapt materials developed by Prof. Ryan Kastner (UCSD) Prof. Scott Mahlke (UMich) Dr. Stephen Neuendorffer (Xilinx) 27

Topic 12: Cyclic Instruction Scheduling

Topic 12: Cyclic Instruction Scheduling Topic 12: Cyclic Instruction Scheduling COS 320 Compiling Techniques Princeton University Spring 2015 Prof. David August 1 Overlap Iterations Using Pipelining time Iteration 1 2 3 n n 1 2 3 With hardware

More information

ECE 5775 (Fall 17) High-Level Digital Design Automation. More Binding Pipelining

ECE 5775 (Fall 17) High-Level Digital Design Automation. More Binding Pipelining ECE 5775 (Fall 17) High-Level Digital Design Automation More Binding Pipelining Logistics Lab 3 due Friday 10/6 No late penalty for this assignment (up to 3 days late) HW 2 will be posted tomorrow 1 Agenda

More information

EECS 583 Class 13 Software Pipelining

EECS 583 Class 13 Software Pipelining EECS 583 Class 13 Software Pipelining University of Michigan October 29, 2012 Announcements + Reading Material Project proposals» Due Friday, Nov 2, 5pm» 1 paragraph summary of what you plan to work on

More information

ECE 5775 High-Level Digital Design Automation Fall More CFG Static Single Assignment

ECE 5775 High-Level Digital Design Automation Fall More CFG Static Single Assignment ECE 5775 High-Level Digital Design Automation Fall 2018 More CFG Static Single Assignment Announcements HW 1 due next Monday (9/17) Lab 2 will be released tonight (due 9/24) Instructor OH cancelled this

More information

ECE 5775 High-Level Digital Design Automation Fall Fixed-Point Types Analysis of Algorithms

ECE 5775 High-Level Digital Design Automation Fall Fixed-Point Types Analysis of Algorithms ECE 5775 High-Level Digital Design Automation Fall 2018 Fixed-Point Types Analysis of Algorithms Announcements Lab 1 on CORDIC is released Due Monday 9/10 @11:59am Part-time PhD TA: Hanchen Jin (hj424)

More information

ECE 5775 Student-Led Discussions (10/16)

ECE 5775 Student-Led Discussions (10/16) ECE 5775 Student-Led Discussions (10/16) Talks: 18-min talk + 2-min Q&A Adam Macioszek, Julia Currie, Nick Sarkis Sparse Matrix Vector Multiplication Nick Comly, Felipe Fortuna, Mark Li, Serena Krech Matrix

More information

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs Mapping-Aware Constrained Scheduling for LUT-Based FPGAs Mingxing Tan, Steve Dai, Udit Gupta, Zhiru Zhang School of Electrical and Computer Engineering Cornell University High-Level Synthesis (HLS) for

More information

CE431 Parallel Computer Architecture Spring Compile-time ILP extraction Modulo Scheduling

CE431 Parallel Computer Architecture Spring Compile-time ILP extraction Modulo Scheduling CE431 Parallel Computer Architecture Spring 2018 Compile-time ILP extraction Modulo Scheduling Nikos Bellas Electrical and Computer Engineering University of Thessaly Parallel Computer Architecture 1 Readings

More information

LegUp HLS Tutorial for Microsemi PolarFire Sobel Filtering for Image Edge Detection

LegUp HLS Tutorial for Microsemi PolarFire Sobel Filtering for Image Edge Detection LegUp HLS Tutorial for Microsemi PolarFire Sobel Filtering for Image Edge Detection This tutorial will introduce you to high-level synthesis (HLS) concepts using LegUp. You will apply HLS to a real problem:

More information

Lecture 19. Software Pipelining. I. Example of DoAll Loops. I. Introduction. II. Problem Formulation. III. Algorithm.

Lecture 19. Software Pipelining. I. Example of DoAll Loops. I. Introduction. II. Problem Formulation. III. Algorithm. Lecture 19 Software Pipelining I. Introduction II. Problem Formulation III. Algorithm I. Example of DoAll Loops Machine: Per clock: 1 read, 1 write, 1 (2-stage) arithmetic op, with hardware loop op and

More information

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

High-Level Synthesis

High-Level Synthesis High-Level Synthesis 1 High-Level Synthesis 1. Basic definition 2. A typical HLS process 3. Scheduling techniques 4. Allocation and binding techniques 5. Advanced issues High-Level Synthesis 2 Introduction

More information

High-Level Synthesis (HLS)

High-Level Synthesis (HLS) Course contents Unit 11: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 11 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

Introduction to High level. Synthesis

Introduction to High level. Synthesis Introduction to High level Synthesis LISHA/UFSC Prof. Dr. Antônio Augusto Fröhlich Tiago Rogério Mück http://www.lisha.ufsc.br/~guto June 2007 http://www.lisha.ufsc.br/ 1 What is HLS? Example: High level

More information

Lecture 21. Software Pipelining & Prefetching. I. Software Pipelining II. Software Prefetching (of Arrays) III. Prefetching via Software Pipelining

Lecture 21. Software Pipelining & Prefetching. I. Software Pipelining II. Software Prefetching (of Arrays) III. Prefetching via Software Pipelining Lecture 21 Software Pipelining & Prefetching I. Software Pipelining II. Software Prefetching (of Arrays) III. Prefetching via Software Pipelining [ALSU 10.5, 11.11.4] Phillip B. Gibbons 15-745: Software

More information

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment ECE 5775 (Fall 17) High-Level Digital Design Automation Static Single Assignment Announcements HW 1 released (due Friday) Student-led discussions on Tuesday 9/26 Sign up on Piazza: 3 students / group Meet

More information

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S Lecture 6 MIPS R4000 and Instruction Level Parallelism Computer Architectures 521480S Case Study: MIPS R4000 (200 MHz, 64-bit instructions, MIPS-3 instruction set) 8 Stage Pipeline: first half of fetching

More information

ECE 2300 Digital Logic & Computer Organization. More Verilog Finite State Machines

ECE 2300 Digital Logic & Computer Organization. More Verilog Finite State Machines ECE 2300 Digital Logic & Computer Organization Spring 2018 More Verilog Finite Machines Lecture 8: 1 Prelim 1, Thursday 3/1, 1:25pm, 75 mins Arrive early by 1:20pm Review sessions Announcements Monday

More information

Resolve: Generation of High Performance Sorting Architectures from High Level Synthesis

Resolve: Generation of High Performance Sorting Architectures from High Level Synthesis Resolve: Generation of High Performance Sorting Architectures from High Level Synthesis Janarbek Matai, Dustin Richmond, Dajung Lee, Zac Blair, Qiongzhi Wu, Amin Abazari, Ryan Kastner Department of Computer

More information

Multiple Instruction Issue. Superscalars

Multiple Instruction Issue. Superscalars Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths

More information

Instruction Scheduling. Software Pipelining - 3

Instruction Scheduling. Software Pipelining - 3 Instruction Scheduling and Software Pipelining - 3 Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler Design Instruction

More information

Many ways to build logic out of MOSFETs

Many ways to build logic out of MOSFETs Many ways to build logic out of MOSFETs pass transistor logic (most similar to the first switch logic we saw) static CMOS logic (what we saw last time) dynamic CMOS logic Clock=0 precharges X through the

More information

ECE 5775 (Fall 17) High-Level Digital Design Automation. Fixed-Point Types Analysis of Algorithms

ECE 5775 (Fall 17) High-Level Digital Design Automation. Fixed-Point Types Analysis of Algorithms ECE 5775 (Fall 17) High-Level Digital Design Automation Fixed-Point Types Analysis of Algorithms Announcements Lab 1 on CORDIC is released Due Friday 9/8 @11:59am MEng TA: Jeffrey Witz (jmw483) Office

More information

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation Introduction to Electronic Design Automation Model of Computation Jie-Hong Roland Jiang 江介宏 Department of Electrical Engineering National Taiwan University Spring 03 Model of Computation In system design,

More information

HIGH-LEVEL SYNTHESIS

HIGH-LEVEL SYNTHESIS HIGH-LEVEL SYNTHESIS Page 1 HIGH-LEVEL SYNTHESIS High-level synthesis: the automatic addition of structural information to a design described by an algorithm. BEHAVIORAL D. STRUCTURAL D. Systems Algorithms

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

Course Overview Revisited

Course Overview Revisited Course Overview Revisited void blur_filter_3x3( Image &in, Image &blur) { // allocate blur array Image blur(in.width(), in.height()); // blur in the x dimension for (int y = ; y < in.height(); y++) for

More information

ESE532: System-on-a-Chip Architecture. Today. Message. Real Time. Real-Time Tasks. Real-Time Guarantees. Real Time Demands Challenges

ESE532: System-on-a-Chip Architecture. Today. Message. Real Time. Real-Time Tasks. Real-Time Guarantees. Real Time Demands Challenges ESE532: System-on-a-Chip Architecture Day 9: October 1, 2018 Real Time Real Time Demands Challenges Today Algorithms Architecture Disciplines to achieve Penn ESE532 Fall 2018 -- DeHon 1 Penn ESE532 Fall

More information

Lab 3 Sequential Logic for Synthesis. FPGA Design Flow.

Lab 3 Sequential Logic for Synthesis. FPGA Design Flow. Lab 3 Sequential Logic for Synthesis. FPGA Design Flow. Task 1 Part 1 Develop a VHDL description of a Debouncer specified below. The following diagram shows the interface of the Debouncer. The following

More information

Parallel Programming for FPGAs. Ryan Kastner and Stephen Neuendorffer

Parallel Programming for FPGAs. Ryan Kastner and Stephen Neuendorffer Parallel Programming for FPGAs Ryan Kastner and Stephen Neuendorffer Sunday 31 st July, 2016 When someone says, I want a programming language in which I need only say what I wish done, give him a lollipop.

More information

EE/CSCI 451 Midterm 1

EE/CSCI 451 Midterm 1 EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming

More information

ECE 498AL. Lecture 12: Application Lessons. When the tires hit the road

ECE 498AL. Lecture 12: Application Lessons. When the tires hit the road ECE 498AL Lecture 12: Application Lessons When the tires hit the road 1 Objective Putting the CUDA performance knowledge to work Plausible strategies may or may not lead to performance enhancement Different

More information

Scan Algorithm Effects on Parallelism and Memory Conflicts

Scan Algorithm Effects on Parallelism and Memory Conflicts Scan Algorithm Effects on Parallelism and Memory Conflicts 11 Parallel Prefix Sum (Scan) Definition: The all-prefix-sums operation takes a binary associative operator with identity I, and an array of n

More information

Lab 1: CORDIC Design Due Friday, September 8, 2017, 11:59pm

Lab 1: CORDIC Design Due Friday, September 8, 2017, 11:59pm ECE5775 High-Level Digital Design Automation, Fall 2017 School of Electrical Computer Engineering, Cornell University Lab 1: CORDIC Design Due Friday, September 8, 2017, 11:59pm 1 Introduction COordinate

More information

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall

More information

Hello I am Zheng, Hongbin today I will introduce our LLVM based HLS framework, which is built upon the Target Independent Code Generator.

Hello I am Zheng, Hongbin today I will introduce our LLVM based HLS framework, which is built upon the Target Independent Code Generator. Hello I am Zheng, Hongbin today I will introduce our LLVM based HLS framework, which is built upon the Target Independent Code Generator. 1 In this talk I will first briefly introduce what HLS is, and

More information

ESE532: System-on-a-Chip Architecture. Today. Message. Graph Cycles. Preclass 1. Reminder

ESE532: System-on-a-Chip Architecture. Today. Message. Graph Cycles. Preclass 1. Reminder ESE532: System-on-a-Chip Architecture Day 8: September 26, 2018 Spatial Computations Today Graph Cycles (from Day 7) Accelerator Pipelines FPGAs Zynq Computational Capacity 1 2 Message Custom accelerators

More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

Binary Adders. Ripple-Carry Adder

Binary Adders. Ripple-Carry Adder Ripple-Carry Adder Binary Adders x n y n x y x y c n FA c n - c 2 FA c FA c s n MSB position Longest delay (Critical-path delay): d c(n) = n d carry = 2n gate delays d s(n-) = (n-) d carry +d sum = 2n

More information

An Approach for Integrating Basic Retiming and Software Pipelining

An Approach for Integrating Basic Retiming and Software Pipelining An Approach for Integrating Basic Retiming and Software Pipelining Noureddine Chabini Department of Electrical and Computer Engineering Royal Military College of Canada PB 7000 Station Forces Kingston

More information

Intel HLS Compiler: Fast Design, Coding, and Hardware

Intel HLS Compiler: Fast Design, Coding, and Hardware white paper Intel HLS Compiler Intel HLS Compiler: Fast Design, Coding, and Hardware The Modern FPGA Workflow Authors Melissa Sussmann HLS Product Manager Intel Corporation Tom Hill OpenCL Product Manager

More information

EECS 583 Class 15 Register Allocation

EECS 583 Class 15 Register Allocation EECS 583 Class 15 Register Allocation University of Michigan November 2, 2011 Announcements + Reading Material Midterm exam: Monday, Nov 14?» Could also do Wednes Nov 9 (next week!) or Wednes Nov 16 (2

More information

CS 61C: Great Ideas in Computer Architecture. Multilevel Caches, Cache Questions

CS 61C: Great Ideas in Computer Architecture. Multilevel Caches, Cache Questions CS 61C: Great Ideas in Computer Architecture Multilevel Caches, Cache Questions Instructor: Alan Christopher 7/14/2014 Summer 2014 -- Lecture #12 1 Great Idea #3: Principle of Locality/ Memory Hierarchy

More information

Efficient Hardware Acceleration on SoC- FPGA using OpenCL

Efficient Hardware Acceleration on SoC- FPGA using OpenCL Efficient Hardware Acceleration on SoC- FPGA using OpenCL Advisor : Dr. Benjamin Carrion Schafer Susmitha Gogineni 30 th August 17 Presentation Overview 1.Objective & Motivation 2.Configurable SoC -FPGA

More information

FPGA for Software Engineers

FPGA for Software Engineers FPGA for Software Engineers Course Description This course closes the gap between hardware and software engineers by providing the software engineer all the necessary FPGA concepts and terms. The course

More information

Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors

Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors Francisco Barat, Murali Jayapala, Pieter Op de Beeck and Geert Deconinck K.U.Leuven, Belgium. {f-barat, j4murali}@ieee.org,

More information

Software Pipelining by Modulo Scheduling. Philip Sweany University of North Texas

Software Pipelining by Modulo Scheduling. Philip Sweany University of North Texas Software Pipelining by Modulo Scheduling Philip Sweany University of North Texas Overview Instruction-Level Parallelism Instruction Scheduling Opportunities for Loop Optimization Software Pipelining Modulo

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more

More information

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, Sarma Vrudhula School of Electrical, Computer and Energy Engineering School

More information

Tutorial Outline. 9:00 am 10:00 am Pre-RTL Simulation Framework: Aladdin. 8:30 am 9:00 am! Introduction! 10:00 am 10:30 am! Break!

Tutorial Outline. 9:00 am 10:00 am Pre-RTL Simulation Framework: Aladdin. 8:30 am 9:00 am! Introduction! 10:00 am 10:30 am! Break! Tutorial Outline Time Topic! 8:30 am 9:00 am! Introduction! 9:00 am 10:00 am Pre-RTL Simulation Framework: Aladdin 10:00 am 10:30 am! Break! 10:30 am 11:00 am! Workload Characterization Tool: WIICA! 11:00

More information

EECS150 - Digital Design Lecture 09 - Parallelism

EECS150 - Digital Design Lecture 09 - Parallelism EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization

More information

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden High Level Synthesis with Catapult MOJTABA MAHDAVI 1 Outline High Level Synthesis HLS Design Flow in Catapult Data Types Project Creation Design Setup Data Flow Analysis Resource Allocation Scheduling

More information

ECE 5775 (Fall 17) High-Level Digital Design Automation. Specialized Computing

ECE 5775 (Fall 17) High-Level Digital Design Automation. Specialized Computing ECE 5775 (Fall 7) High-Level Digital Design Automation Specialized Computing Announcements All students enrolled in CMS & Piazza Vivado HLS tutorial on Tuesday 8/29 Install an SSH client (mobaxterm or

More information

CS521 \ Notes for the Final Exam

CS521 \ Notes for the Final Exam CS521 \ Notes for final exam 1 Ariel Stolerman Asymptotic Notations: CS521 \ Notes for the Final Exam Notation Definition Limit Big-O ( ) Small-o ( ) Big- ( ) Small- ( ) Big- ( ) Notes: ( ) ( ) ( ) ( )

More information

TECH. 9. Code Scheduling for ILP-Processors. Levels of static scheduling. -Eligible Instructions are

TECH. 9. Code Scheduling for ILP-Processors. Levels of static scheduling. -Eligible Instructions are 9. Code Scheduling for ILP-Processors Typical layout of compiler: traditional, optimizing, pre-pass parallel, post-pass parallel {Software! compilers optimizing code for ILP-processors, including VLIW}

More information

Announcements. Midterm 2 next Thursday, 6-7:30pm, 277 Cory Review session on Tuesday, 6-7:30pm, 277 Cory Homework 8 due next Tuesday Labs: project

Announcements. Midterm 2 next Thursday, 6-7:30pm, 277 Cory Review session on Tuesday, 6-7:30pm, 277 Cory Homework 8 due next Tuesday Labs: project - Fall 2002 Lecture 20 Synthesis Sequential Logic Announcements Midterm 2 next Thursday, 6-7:30pm, 277 Cory Review session on Tuesday, 6-7:30pm, 277 Cory Homework 8 due next Tuesday Labs: project» Teams

More information

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two?

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two? Exploiting ILP through Software Approaches Venkatesh Akella EEC 270 Winter 2005 Based on Slides from Prof. Al. Davis @ cs.utah.edu Let the Compiler Do it Pros and Cons Pros No window size limitation, the

More information

Lecture 20: High-level Synthesis (1)

Lecture 20: High-level Synthesis (1) Lecture 20: High-level Synthesis (1) Slides courtesy of Deming Chen Some slides are from Prof. S. Levitan of U. of Pittsburgh Outline High-level synthesis introduction High-level synthesis operations Scheduling

More information

Midterm Exam ECE 448 Spring 2019 Wednesday, March 6 15 points

Midterm Exam ECE 448 Spring 2019 Wednesday, March 6 15 points Midterm Exam ECE 448 Spring 2019 Wednesday, March 6 15 points Instructions: Zip all your deliverables into an archive .zip and submit it through Blackboard no later than Wednesday, March 6,

More information

Lecture 9 Basic Parallelization

Lecture 9 Basic Parallelization Lecture 9 Basic Parallelization I. Introduction II. Data Dependence Analysis III. Loop Nests + Locality IV. Interprocedural Parallelization Chapter 11.1-11.1.4 CS243: Parallelization 1 Machine Learning

More information

Lecture 9 Basic Parallelization

Lecture 9 Basic Parallelization Lecture 9 Basic Parallelization I. Introduction II. Data Dependence Analysis III. Loop Nests + Locality IV. Interprocedural Parallelization Chapter 11.1-11.1.4 CS243: Parallelization 1 Machine Learning

More information

Material covered. Areas/Topics covered. Logistics. What to focus on. Areas/Topics covered 5/14/2015. COS 226 Final Exam Review Spring 2015

Material covered. Areas/Topics covered. Logistics. What to focus on. Areas/Topics covered 5/14/2015. COS 226 Final Exam Review Spring 2015 COS 226 Final Exam Review Spring 2015 Ananda Gunawardena (guna) guna@cs.princeton.edu guna@princeton.edu Material covered The exam willstressmaterial covered since the midterm, including the following

More information

Topic 14: Scheduling COS 320. Compiling Techniques. Princeton University Spring Lennart Beringer

Topic 14: Scheduling COS 320. Compiling Techniques. Princeton University Spring Lennart Beringer Topic 14: Scheduling COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer 1 The Back End Well, let s see Motivating example Starting point Motivating example Starting point Multiplication

More information

Computer Architecture and Engineering. CS152 Quiz #5. April 23rd, Professor Krste Asanovic. Name: Answer Key

Computer Architecture and Engineering. CS152 Quiz #5. April 23rd, Professor Krste Asanovic. Name: Answer Key Computer Architecture and Engineering CS152 Quiz #5 April 23rd, 2009 Professor Krste Asanovic Name: Answer Key Notes: This is a closed book, closed notes exam. 80 Minutes 8 Pages Not all questions are

More information

1-D Time-Domain Convolution. for (i=0; i < outputsize; i++) { y[i] = 0; for (j=0; j < kernelsize; j++) { y[i] += x[i - j] * h[j]; } }

1-D Time-Domain Convolution. for (i=0; i < outputsize; i++) { y[i] = 0; for (j=0; j < kernelsize; j++) { y[i] += x[i - j] * h[j]; } } Introduction: Convolution is a common operation in digital signal processing. In this project, you will be creating a custom circuit implemented on the Nallatech board that exploits a significant amount

More information

ECE 5730 Memory Systems

ECE 5730 Memory Systems ECE 5730 Memory Systems Spring 2009 Off-line Cache Content Management Lecture 7: 1 Quiz 4 on Tuesday Announcements Only covers today s lecture No office hours today Lecture 7: 2 Where We re Headed Off-line

More information

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture VLIW, Vector, and Multithreaded Machines Assigned 3/24/2019 Problem Set #4 Due 4/5/2019 http://inst.eecs.berkeley.edu/~cs152/sp19

More information

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis Bruno da Silva, Jan Lemeire, An Braeken, and Abdellah Touhafi Vrije Universiteit Brussel (VUB), INDI and ETRO department, Brussels,

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #11 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline Midterm 1:

More information

Notes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 4 Notes. Class URL:

Notes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 4 Notes. Class URL: Notes slides from before lecture CSE 21, Winter 2017, Section A00 Lecture 4 Notes Class URL: http://vlsicad.ucsd.edu/courses/cse21-w17/ Notes slides from before lecture Notes January 23 (1) HW2 due tomorrow

More information

COVER SHEET: Problem#: Points

COVER SHEET: Problem#: Points EEL 4712 Midterm 3 Spring 2017 VERSION 1 Name: UFID: Sign here to give permission for your test to be returned in class, where others might see your score: IMPORTANT: Please be neat and write (or draw)

More information

ESE532: System-on-a-Chip Architecture. Today. Message. Clock Cycle BRAM

ESE532: System-on-a-Chip Architecture. Today. Message. Clock Cycle BRAM ESE532: System-on-a-Chip Architecture Day 20: April 3, 2017 Pipelining, Frequency, Dataflow Today What drives cycle times Pipelining in Vivado HLS C Avoiding bottlenecks feeding data in Vivado HLS C Penn

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests Mingxing Tan 1 2, Gai Liu 1, Ritchie Zhao 1, Steve Dai 1, Zhiru Zhang 1 1 Computer Systems Laboratory, Electrical and Computer

More information

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,

More information

SDSoC: Session 1

SDSoC: Session 1 SDSoC: Session 1 ADAM@ADIUVOENGINEERING.COM What is SDSoC SDSoC is a system optimising compiler which allows us to optimise Zynq PS / PL Zynq MPSoC PS / PL MicroBlaze What does this mean? Following the

More information

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University Lab 4: Binarized Convolutional Neural Networks Due Wednesday, October 31, 2018, 11:59pm

More information

DSP Mapping, Coding, Optimization

DSP Mapping, Coding, Optimization DSP Mapping, Coding, Optimization On TMS320C6000 Family using CCS (Code Composer Studio) ver 3.3 Started with writing a simple C code in the class, from scratch Project called First, written for C6713

More information

Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.SP96 1 Review: Evaluating Branch Alternatives Two part solution: Determine

More information

ESE532: System-on-a-Chip Architecture. Today. Message. Fixed Point. Operator Sizes. Observe. Fixed Point

ESE532: System-on-a-Chip Architecture. Today. Message. Fixed Point. Operator Sizes. Observe. Fixed Point ESE532: System-on-a-Chip Architecture Day 27: December 5, 2018 Representation and Precision Fixed Point Today Errors from Limited Precision Precision Analysis / Interval Arithmetic Floating Point If time

More information

Synthesis at different abstraction levels

Synthesis at different abstraction levels Synthesis at different abstraction levels System Level Synthesis Clustering. Communication synthesis. High-Level Synthesis Resource or time constrained scheduling Resource allocation. Binding Register-Transfer

More information

EECS2021. EECS2021 Computer Organization. EECS2021 Computer Organization. Morgan Kaufmann Publishers September 14, 2016

EECS2021. EECS2021 Computer Organization. EECS2021 Computer Organization. Morgan Kaufmann Publishers September 14, 2016 EECS2021 Computer Organization Fall 2015 The slides are based on the publisher slides and contribution from Profs Amir Asif and Peter Lian The slides will be modified, annotated, explained on the board,

More information

Specializing Hardware for Image Processing

Specializing Hardware for Image Processing Lecture 6: Specializing Hardware for Image Processing Visual Computing Systems So far, the discussion in this class has focused on generating efficient code for multi-core processors such as CPUs and GPUs.

More information

CS 152, Spring 2011 Section 10

CS 152, Spring 2011 Section 10 CS 152, Spring 2011 Section 10 Christopher Celio University of California, Berkeley Agenda Stuff (Quiz 4 Prep) http://3dimensionaljigsaw.wordpress.com/2008/06/18/physics-based-games-the-new-genre/ Intel

More information

Goals for Today. CS 133: Databases. Final Exam: Logistics. Why Use a DBMS? Brief overview of course. Course evaluations

Goals for Today. CS 133: Databases. Final Exam: Logistics. Why Use a DBMS? Brief overview of course. Course evaluations Goals for Today Brief overview of course CS 133: Databases Course evaluations Fall 2018 Lec 27 12/13 Course and Final Review Prof. Beth Trushkowsky More details about the Final Exam Practice exercises

More information

ECE 5775 (Fall 17) High-Level Digital Design Automation. Hardware-Software Co-Design

ECE 5775 (Fall 17) High-Level Digital Design Automation. Hardware-Software Co-Design ECE 5775 (Fall 17) High-Level Digital Design Automation Hardware-Software Co-Design Announcements Midterm graded You can view your exams during TA office hours (Fri/Wed 11am-noon, Rhodes 312) Second paper

More information

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems Martin Schoeberl, Florian Brandner, Jens Sparsø, Evangelia Kasapaki Technical University of Denamrk 1 Real-Time Systems

More information

CSE 417 Practical Algorithms. (a.k.a. Algorithms & Computational Complexity)

CSE 417 Practical Algorithms. (a.k.a. Algorithms & Computational Complexity) CSE 417 Practical Algorithms (a.k.a. Algorithms & Computational Complexity) Outline for Today > Course Goals & Overview > Administrivia > Greedy Algorithms Why study algorithms? > Learn the history of

More information

Computer Science 246 Computer Architecture

Computer Science 246 Computer Architecture Computer Architecture Spring 2009 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Compiler ILP Static ILP Overview Have discussed methods to extract ILP from hardware Why can t some of these

More information

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections )

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections ) Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 4.4) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures

More information

International Training Workshop on FPGA Design for Scientific Instrumentation and Computing November 2013

International Training Workshop on FPGA Design for Scientific Instrumentation and Computing November 2013 2499-20 International Training Workshop on FPGA Design for Scientific Instrumentation and Computing 11-22 November 2013 High-Level Synthesis: how to improve FPGA design productivity RINCON CALLE Fernando

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

ECE 595Z Digital Systems Design Automation

ECE 595Z Digital Systems Design Automation ECE 595Z Digital Systems Design Automation Anand Raghunathan, raghunathan@purdue.edu How do you design chips with over 1 Billion transistors? Human designer capability grows far slower than Moore s law!

More information

ESE535: Electronic Design Automation. Today. LUT Mapping. Simplifying Structure. Preclass: Cover in 4-LUT? Preclass: Cover in 4-LUT?

ESE535: Electronic Design Automation. Today. LUT Mapping. Simplifying Structure. Preclass: Cover in 4-LUT? Preclass: Cover in 4-LUT? ESE55: Electronic Design Automation Day 7: February, 0 Clustering (LUT Mapping, Delay) Today How do we map to LUTs What happens when IO dominates Delay dominates Lessons for non-luts for delay-oriented

More information

CprE 488 Embedded Systems Design. Exam 2 Review

CprE 488 Embedded Systems Design. Exam 2 Review CprE 488 Embedded Systems Design Exam 2 Review Joseph Zambreno Electrical and Computer Engineering Iowa State University www.ece.iastate.edu/~zambreno rcl.ece.iastate.edu This is the end. My only friend,

More information

Lecture 1: CS/ECE 3810 Introduction

Lecture 1: CS/ECE 3810 Introduction Lecture 1: CS/ECE 3810 Introduction Today s topics: Why computer organization is important Logistics Modern trends 1 Why Computer Organization 2 Image credits: uber, extremetech, anandtech Why Computer

More information

High Level, high speed FPGA programming

High Level, high speed FPGA programming Opportunity: FPAs High Level, high speed FPA programming Wim Bohm, Bruce Draper, Ross Beveridge, Charlie Ross, Monica Chawathe Colorado State University Reconfigurable Computing technology High speed at

More information

Notes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 10 Notes. Class URL:

Notes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 10 Notes. Class URL: Notes slides from before lecture CSE 21, Winter 2017, Section A00 Lecture 10 Notes Class URL: http://vlsicad.ucsd.edu/courses/cse21-w17/ Notes slides from before lecture Notes February 13 (1) HW5 is due

More information

This material exempt per Department of Commerce license exception TSU. Improving Performance

This material exempt per Department of Commerce license exception TSU. Improving Performance This material exempt per Department of Commerce license exception TSU Performance Outline Adding Directives Latency Manipulating Loops Throughput Performance Bottleneck Summary Performance 13-2 Performance

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 15 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html

More information