Custom Sensor-Based Embedded Computing Systems. Frank Vahid

Size: px
Start display at page:

Download "Custom Sensor-Based Embedded Computing Systems. Frank Vahid"

Transcription

1 Custom Sensor-Based Embedded Computing Systems Frank Vahid Professor Dept. of Computer Science and Engineering University of California, Riverside Assoc. Director, Center for Embedded Computer Systems, UC Irvine eblocks project 2002-present, support provided by the National Science Foundation and Intel Ph.D. student: Susan Lysecky (2006, w Asst. Prof. at U. Arizona); several MS and undergrad students also

2 The Problem What do these problems all have in common? A small store owner with many employees are they in the storeroom, breakroom, or out back? A working adult with an ageing parent at home did she get out of bed today, is she moving around? A homeowner who sometimes forgets to close the garage at night Marines wishing to outfit a building to detect whether someone is inside or when someone was inside Frank Vahid, UC Riverside 2/32

3 The Problem What do these problems all have in common? Put motion and sound sensors throughout, small LEDs (lights) near cash register Put motion sensors around the house, monitor from the web or cell phone or even be tified if motion by certain time in the morning Install contact sensor and light sensor, and indicator next to the bed Place motion, heat, and sound sensors in rooms, halls, doorways Frank Vahid, UC Riverside 3/32

4 Why Can t We Just Do This? Widely usable Lego -like sensors don t exist today Costly, hard to use, plugged into wall... light sensor transmit AND contact switch receive LED But new techlogy makes Lego-like sensor blocks possible... Frank Vahid, UC Riverside 4/32

5 Shrinking Processor Size/Cost Enables New Solution Courtesy of Joe Kahn Make sensors smarter By adding processor+battery Today, tiny and cheap Becomes a "block" easily connected to other blocks Frank Vahid, UC Riverside 5/32

6 Shrinking Processor Size/Cost Enables New Solution eblocks Existing component view New "eblock" view Button / Light Sensor / Magnetic Contact Switch / / LED / Beeper / Electric Relay Frank Vahid, UC Riverside 6/32

7 eblocks Just connect blocks, and they work Button / / Beeper No programming kwledge, electronics kwledge Light Sensor / / LED Frank Vahid, UC Riverside 7/32

8 eblocks Add intermediate blocks that compute and maintain state Spatial programming more intuitive to n-cs people than temporal programming Button Light Sensor Button Toggle Beeper When A is AND OR B is Light Sensor Button Tripper LED Combine then the output is Prolong Frank Vahid, UC Riverside 8/32 LED

9 What's Hard (1) Finding right set of building blocks Too many Overwhelming (too much choice) 2-Input Logic Toggle Tripper Splitter Splitter Too few Overwhelming (too much configuration) Input Logic Prolong (short) Splitter When A is 2 AND OR B is then the output is Input Logic Prolong (long) 5: Splitter 6:... When A is AND OR Combine B is then the output is 2 Yes detector 2 No detector SuperBlock Frank Vahid, UC Riverside 9/32

10 What's Hard (2) Making the blocks understandable People NOT likely to read directions Those that do are unlikely to understand Performed extensive user testing (over 500 students, kids, and adults) over two years Example: Combine block A B Output Logic Block Logic Sentence Frank Vahid, UC Riverside 10/32 configurable DIP switch Combine When the input is Most success A B : A is, B is A is, B is A is, B is A is, B is Phrased truth table A B A is, B is A is, B is A is, B is A is, B is Combine the output should be The output should be when: out Phrased truth table embedded in sentence When the input is A B A B A B A B A B The output should be out Combine Colored truth table embedded in sentence When A is AND OR B is then the output is Combine

11 What's Hard (3) Batteries must last years, yet performance should appear continuous Blocks are off 99.9% of the time Developed theory to map eblock events to continuous time Developed custom CAD tool to automatically find the best block parameter settings out of the billions of possibilities (a) (b) (c) (d) f t f f < < error < < f t f f error time interpreted as Frank Vahid, UC Riverside 11/32

12 eblocks Example "Garage Open at Night" detector <10 minutes to build Light Sensor Detect night-time use Light Sensor block When A is AND OR B is then the output is Combine LED Need to indicate garage open at night use LED block Plug pieces together and the system is done! Magnetic Contact Switch Detect garage door open use Contact Switch block Use Combine block to combine light sensor and contact switch into one Frank Vahid, UC Riverside 12/32

13 Graphical Simulator User specifies and tests block design Java-based simulator User chooses between pallets Blocks added by dragging User is able to configure various blocks by clicking on switches Connections created by drawing lines between blocks Button When A is AND OR Light Sensor B is then the output is Combine Beeper Available eblocks Sensors Output Compute/Communications When Button Beeper A is AND B is OR then the output is Motion Combine Green/Red Sensor Light rst in Light Sensor Once Yes, Stays Yes Toggle Yes/No seconds Prolonger Hide this panel Advanced Mode Welcome to the eblocks Simulator! In this area, you ll find helpful hints on creating your own designs. Click and drag an eblock off of the Available eblocks panel to add it to your design. To connect two blocks, click and drag from an output port (colored circle) to an input port (gray circle). A connection can be destroyed by clicking on a connected port. To move a block around the workspace, click and drag its orange area. Blocks can be moved into the trash can to delete them. Green circles indicate that the port is sending a, red circles indicate that the port is sending a, yellow Circles indicate that the port is sending an error signal, and gray circles dete an input port. Frank Vahid, UC Riverside 13/32

14 Graphical Simulator User specifies and tests block design Java-based simulator User chooses between pallets Blocks added by dragging User is able to configure various blocks by clicking on switches Connections created by drawing lines between blocks User can create, experiment, test and configure design Button When A is AND OR Light Sensor B is then the output is Combine Beeper Welcome to the eblocks Simulator! In this area, you ll find helpful hints on creating your own designs. Available eblocks Sensors Output Compute/Communications When Button Beeper A is then the output is Motion Combine Green/Red Sensor Light rst in Light Sensor Once Yes, Stays Yes Toggle Yes/No seconds Prolonger Hide this panel Click and drag an eblock off of the Available eblocks panel to add it to your design. To connect two blocks, click and drag from an output port (colored circle) to an input port (gray circle). A connection can be destroyed by clicking on a connected port. To move a block around the workspace, click and drag its orange area. Blocks can be moved into the trash can to delete them. Green circles indicate that the port is sending a, red circles indicate that the port is sending a, yellow Circles indicate that the port is sending an error signal, and gray circles dete an input port. AND B is OR Advanced Mode Frank Vahid, UC Riverside 14/32

15 eblocks and Embedded Microprocessors Can greatly simplify coding Button / 1/0 Micro- Light Sensor processor Button Tripper / 1/0 Frank Vahid, UC Riverside 15/32

16 And w for something completely different... Warp processing 2001-present, supported by SRC, Intel, IBM, Freescale, Xilinx Ph.D. students Roman Lysecky Ph.D., June 2005, w Asst. Prof. at Univ. of Arizona Greg Stitt Ph.D. June 2007, w Asst. Prof. at Univ. of Florida, Gainseville Ann Gordon-Ross Ph.D. June 2007, w Asst. Prof. at Univ. of Florida, Gainseville Frank Vahid, UC Riverside 16/32

17 Circuits on FPGAs Can Sometimes Give Big Speedups C Code for FIR Filter for (i=0; i i < 128; i++) y y[i] += += c[i] c[i] * x[i] * x[i] Circuit for FIR Filter * * * * * * * * * * * * Processor Processor Processor FPGA 1000 s of instructions ~ 7 cycles Speedup > 100x Several thousand cycles Pipelined -- >500x Circuit parallelism/pipelining can yield big speedups Frank Vahid, UC Riverside 17/32

18 Dynamic Translation Motivated by commercial dynamic binary translation of early 2000s e.g., Transmeta Crusoe code morphing VLIW VLIW Binary Performance x86 Binary Binary Translation x86 VLIW FPGA Binary Binary Translation x86 VLIW FPGA Warp processing (Lysecky/Stitt/Vahid ): dynamically translate binary to circuits on FPGAs Frank Vahid, UC Riverside 18/32

19 Warp Processing Background 1 Initially, software binary loaded into instruction memory Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 FPGA On-chip CAD Frank Vahid, UC Riverside 19/32

20 Warp Processing Background 2 Microprocessor executes instructions in software binary Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA On-chip CAD Frank Vahid, UC Riverside 20/32

21 Warp Processing Background 3 Profiler monitors instructions and detects critical regions in binary Profiler beqbeqbeqbeqbeqbeqbeqbeqbeqbeq addaddaddaddaddaddaddaddaddadd I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA On-chip CAD Critical Loop Detected Frank Vahid, UC Riverside 21/32

22 Warp Processing Background 4 On-chip CAD reads in critical region Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA On-chip CAD Frank Vahid, UC Riverside 22/32

23 Warp Processing Background 5 On-chip CAD decompiles critical region into control data flow graph (CDFG) FPGA Profiler I Mem D$ Dynamic On-chip CAD Part. Module (DPM) Decompilation surprisingly effective at recovering high-level program structures Stitt et al ICCAD 02, DAC 03, CODES/ISSS 05, ICCAD 05, FPGA 05, TODAES 06, TODAES 07 Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 reg3 := 0 reg4 := 0 Frank Vahid, UC Riverside 23/32 Time loop: reg4 := reg4 + mem[ reg2 + (reg3 << 1)] reg3 := reg3 + 1 if (reg3 < 10) goto loop ret reg4 Energy Recover loops, arrays, subroutines, etc. needed to synthesize good circuits

24 Warp Processing Background 6 On-chip CAD synthesizes decompiled CDFG to a custom (parallel) circuit Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA Dynamic On-chip CAD Part. Module (DPM) reg3 := 0 reg4 := loop: reg4 := reg4 + mem[ reg2 + (reg3 << 1)] reg3 := reg3 + 1 if (reg3 < 10) goto loop ret reg4 Frank Vahid, UC Riverside 24/

25 Warp Processing Background 7 On-chip CAD maps circuit onto FPGA Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA Dynamic On-chip CAD Part. Module (DPM) Lean place&route/fpga 10x faster CAD (Lysecky et al DAC 03, ISSS/CODES 03, DATE 04, DAC 04, DATE 05, FCCM 05, TODAES 06) Multi-core chips use 1 powerful core for CAD reg3 := 0 reg4 + := 0 + SM SM loop: reg4 := reg4 + mem[ reg2 CLB + (reg3 << 1)] reg3 := reg3 + 1 if SM (reg3 < 10) SM goto loop ret reg4 Frank Vahid, UC Riverside 25/32 SM SM CLB

26 Warp Processing Background 8 On-chip CAD replaces instructions in binary to use hardware, causing performance and energy to warp by an order of magnitude or more Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl // instructions reg1, reg3, that 1 Add interact reg5, with reg2, FPGA reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 >10x speedups for some apps Time Time Energy Energy Software-only Warped FPGA Dynamic On-chip CAD Part. Module (DPM) SM reg3 := 0 reg4 + := 0 + SM SM loop: reg4 := reg4 + mem[ reg2 CLB + (reg3 << 1)] reg3 := reg3 + 1 if SM (reg3 < 10) SM goto loop SM CLB ret reg4 Frank Vahid, UC Riverside 26/

27 Challenge: Decompilation Requires aggressive decompilation to recover loops, arrays,..., from binaries Results: Competitive with synthesis from C Speedup FIR Filter Beamformer Viterbi Brev Url BITMNP01 IDCTRN01 High-level Binary-level PNTRCH01 Average Synthesis from C Code Synthesis after Decompiling Binary Example Cycles ClkFrq Time Area Cycles ClkFrq Time Area %Time Overhead %Area Overhead bit_correlator % 0% fir % 3% udiv % 0% prewitt % 58% mf % 0% moravec % -1% Avg: -1% 10% Frank Vahid, UC Riverside 27/32

28 Challenge: JIT Compile to FPGA Developed ultra-lean CAD heuristics for synthesis, placement, routing, and techlogy mapping; simultaneously developed CAD-oriented FPGA e.g., Our router (ROCR) 10x faster and 20x less memory, at cost of 30% longer critical path. Similar results for synth & placement (EDAA Outstanding Dissertation Award 2006) Xilinx ISE Riverside JIT FPGA tools 0.2 s 3.6MB 9.1 s 60 MB Riverside JIT FPGA tools on a 75MHz ARM7 1.4s 3.6MB Frank Vahid, UC Riverside 28/32

29 Experiments Benchmarks: Image processing, DSP, scientific computing Highly parallel examples to illustrate thread warping potential We created multithreaded versions Base architecture 4 ARM cores Focus on recurring applications (embedded) TW: FPGA running at whatever frequency determined by synthesis Multi-core 4 ARM MHz Thread Warping 4 ARM MHz + FPGA (synth freq) Compared to FPGA On-chip CAD Frank Vahid, UC Riverside 29/32

30 Speedup from Thread Warping Fir Prewitt Linear Moravec Wavelet Maxfilter 3DTrans N-body Avg. Geo. Mean 38 4-uP TW 8-uP 16-uP 32-uP 64-uP Average 130x speedup But, FPGA uses additional area 11x faster than 64-core system Simulation pessimistic, actual results likely better So we also compare to systems with 8 to 64 ARM11 ups FPGA size = ~36 ARM11s Frank Vahid, UC Riverside 30/32

31 Dynamic Enables Expandable Logic Concept RAM RAM Expandable RAM Logic System Warp tools detect detects amount RAM of FPGA, during invisibly start, adapt improves application performance to use less/more invisiblyhardware. DMA Cache Cache Profiler FPGA FPGA Warp Tools FPGA FPGA Expandable RAM up Expandable Logic Performance Frank Vahid, UC Riverside 31/32 Speedup Softw are 1 FPGA 2 FPGAs 3 FPGAs 4 FPGAs N-Body 3DTrans Prew itt Wavelet

32 Virtual Immersion eblocks: Enables customized sensor-based system design by n-experts May lead to...? Wood and nails of the sensor world Currently working with hearingimpaired, aging Warp processing (featured in this month s IEEE Computer) Enables large speedups on certain applications (e.g., image processing), user can expand hardware without changing software Frank Vahid, UC Riverside 32/32

Software consists of bits downloaded into a

Software consists of bits downloaded into a C O V E R F E A T U R E Warp Processing: Dynamic Translation of Binaries to FPGA Circuits Frank Vahid, University of California, Riverside Greg Stitt, University of Florida Roman Lysecky, University of

More information

Warp Processors (a.k.a. Self-Improving Configurable IC Platforms)

Warp Processors (a.k.a. Self-Improving Configurable IC Platforms) (a.k.a. Self-Improving Configurable IC Platforms) Frank Vahid Department of Computer Science and Engineering University of California, Riverside Faculty member, Center for Embedded Computer Systems, UC

More information

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning By: Roman Lysecky and Frank Vahid Presented By: Anton Kiriwas Disclaimer This specific

More information

Introduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture

Introduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture Roman Lysecky Department of Electrical and Computer Engineering University of Arizona Dynamic HW/SW Partitioning Initially execute application in software only 5 Partitioned application executes faster

More information

Techniques for Synthesizing Binaries to an Advanced Register/Memory Structure

Techniques for Synthesizing Binaries to an Advanced Register/Memory Structure Techniques for Synthesizing Binaries to an Advanced Register/Memory Structure Greg Stitt, Zhi Guo, Frank Vahid*, Walid Najjar Department of Computer Science and Engineering University of California, Riverside

More information

[Processor Architectures] [Computer Systems Organization]

[Processor Architectures] [Computer Systems Organization] Warp Processors ROMAN LYSECKY University of Arizona and GREG STITT AND FRANK VAHID* University of California, Riverside *Also with the Center for Embedded Computer Systems at UC Irvine We describe a new

More information

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning Roman Lysecky, Frank Vahid To cite this version: Roman Lysecky, Frank Vahid. A Study

More information

Abstract. 1 Introduction. Reconfigurable Logic and Hardware Software Codesign. Class EEC282 Author Marty Nicholes Date 12/06/2003

Abstract. 1 Introduction. Reconfigurable Logic and Hardware Software Codesign. Class EEC282 Author Marty Nicholes Date 12/06/2003 Title Reconfigurable Logic and Hardware Software Codesign Class EEC282 Author Marty Nicholes Date 12/06/2003 Abstract. This is a review paper covering various aspects of reconfigurable logic. The focus

More information

A Codesigned On-Chip Logic Minimizer

A Codesigned On-Chip Logic Minimizer A Codesigned On-Chip Logic Minimizer Roman Lysecky, Frank Vahid* Department of Computer Science and ngineering University of California, Riverside {rlysecky, vahid}@cs.ucr.edu *Also with the Center for

More information

EE382V: System-on-a-Chip (SoC) Design

EE382V: System-on-a-Chip (SoC) Design EE382V: System-on-a-Chip (SoC) Design Lecture 10 Task Partitioning Sources: Prof. Margarida Jacome, UT Austin Prof. Lothar Thiele, ETH Zürich Andreas Gerstlauer Electrical and Computer Engineering University

More information

Automatic Tuning of Two-Level Caches to Embedded Applications

Automatic Tuning of Two-Level Caches to Embedded Applications Automatic Tuning of Two-Level Caches to Embedded Applications Ann Gordon-Ross, Frank Vahid* Department of Computer Science and Engineering University of California, Riverside {ann/vahid}@cs.ucr.edu http://www.cs.ucr.edu/~vahid

More information

Don t Forget Memories A Case Study Redesigning a Pattern Counting ASIC Circuit for FPGAs

Don t Forget Memories A Case Study Redesigning a Pattern Counting ASIC Circuit for FPGAs Don t Forget Memories A Case Study Redesigning a Pattern Counting ASIC Circuit for FPGAs David Sheldon Department of Computer Science and Engineering, UC Riverside dsheldon@cs.ucr.edu Frank Vahid Department

More information

LABORATORY # 6 * L A B M A N U A L. Datapath Components - Adders

LABORATORY # 6 * L A B M A N U A L. Datapath Components - Adders Department of Electrical Engineering University of California Riverside Laboratory #6 EE 120 A LABORATORY # 6 * L A B M A N U A L Datapath Components - Adders * EE and CE students must attempt also to

More information

IA-64, P4 HT and Crusoe Architectures Ch 15

IA-64, P4 HT and Crusoe Architectures Ch 15 IA-64, P4 HT and Crusoe Architectures Ch 15 IA-64 General Organization Predication, Speculation Software Pipelining Example: Itanium Pentium 4 HT Crusoe General Architecture Emulated Precise Exceptions

More information

Advanced processor designs

Advanced processor designs Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The

More information

PARLGRAN: Parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures *

PARLGRAN: Parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures * PARLGRAN: Parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures * Sudarshan Banerjee, Elaheh Bozorgzadeh, Nikil Dutt Center for Embedded Computer Systems

More information

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating

More information

Multithreading: Exploiting Thread-Level Parallelism within a Processor

Multithreading: Exploiting Thread-Level Parallelism within a Processor Multithreading: Exploiting Thread-Level Parallelism within a Processor Instruction-Level Parallelism (ILP): What we ve seen so far Wrap-up on multiple issue machines Beyond ILP Multithreading Advanced

More information

Computer Architecture Dr. Charles Kim Howard University

Computer Architecture Dr. Charles Kim Howard University EECE416 Microcomputer Fundamentals Computer Architecture Dr. Charles Kim Howard University 1 Computer Architecture Computer Architecture Art of selecting and interconnecting hardware components to create

More information

Scalability and Parallel Execution of Warp Processing - Dynamic Hardware/Software Partitioning

Scalability and Parallel Execution of Warp Processing - Dynamic Hardware/Software Partitioning Scalability and Parallel Execution of Warp Processing - Dynamic Hardware/Software Partitioning Roman Lysecky Department of Electrical and Computer Engineering University of Arizona rlysecky@ece.arizona.edu

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,

More information

A First-step Towards an Architecture Tuning Methodology for Low Power

A First-step Towards an Architecture Tuning Methodology for Low Power A First-step Towards an Architecture Tuning Methodology for Low Power Greg Stitt, Frank Vahid*, Tony Givargis Department of Computer Science and Engineering University of California, Riverside {gstitt,

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

Improving System. Performance: Caches

Improving System. Performance: Caches Improving System Performance: Caches December 04 CSC201 Section 002 Fall, 2000 A Motivating Example Application: making a (mechanical) clock dozens of tools and pages of instructions, hundreds of parts

More information

RISC Architecture Ch 12

RISC Architecture Ch 12 RISC Architecture Ch 12 Some History Instruction Usage Characteristics Large Register Files Register Allocation Optimization RISC vs. CISC 18 Original Ideas Behind CISC (Complex Instruction Set Comp.)

More information

Towards Optimal Custom Instruction Processors

Towards Optimal Custom Instruction Processors Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT CHIPS 18 Overview 1. background: extensible processors

More information

Using FPGAs as Microservices

Using FPGAs as Microservices Using FPGAs as Microservices David Ojika, Ann Gordon-Ross, Herman Lam, Bhavesh Patel, Gaurav Kaul, Jayson Strayer (University of Florida, DELL EMC, Intel Corporation) The 9 th Workshop on Big Data Benchmarks,

More information

ECE 485/585 Microprocessor System Design

ECE 485/585 Microprocessor System Design Microprocessor System Design Lecture 8: Principle of Locality Cache Architecture Cache Replacement Policies Zeshan Chishti Electrical and Computer Engineering Dept Maseeh College of Engineering and Computer

More information

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture VLIW, Vector, and Multithreaded Machines Assigned 3/24/2019 Problem Set #4 Due 4/5/2019 http://inst.eecs.berkeley.edu/~cs152/sp19

More information

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

An FPGA Project for use in a Digital Logic Course

An FPGA Project for use in a Digital Logic Course Session 3226 An FPGA Project for use in a Digital Logic Course Daniel C. Gray, Thomas D. Wagner United States Military Academy Abstract The Digital Computer Logic Course offered at the United States Military

More information

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010 Parallelizing FPGA Technology Mapping using GPUs Doris Chen Deshanand Singh Aug 31 st, 2010 Motivation: Compile Time In last 12 years: 110x increase in FPGA Logic, 23x increase in CPU speed, 4.8x gap Question:

More information

Computer Architecture Dr. Charles Kim Howard University

Computer Architecture Dr. Charles Kim Howard University EECE416 Microcomputer Fundamentals & Design Computer Architecture Dr. Charles Kim Howard University 1 Computer Architecture Computer Architecture Art of selecting and interconnecting hardware components

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

The QR code here provides a shortcut to go to the course webpage.

The QR code here provides a shortcut to go to the course webpage. Welcome to this MSc Lab Experiment. All my teaching materials for this Lab-based module are also available on the webpage: www.ee.ic.ac.uk/pcheung/teaching/msc_experiment/ The QR code here provides a shortcut

More information

Frank Vahid Digital Design Second Edition Solution

Frank Vahid Digital Design Second Edition Solution FRANK VAHID DIGITAL DESIGN SECOND EDITION SOLUTION PDF - Are you looking for frank vahid digital design second edition solution Books? Now, you will be happy that at this time frank vahid digital design

More information

UG3 Compiling Techniques Overview of the Course

UG3 Compiling Techniques Overview of the Course UG3 Compiling Techniques Overview of the Course Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission

More information

CSC 170 Introduction to Computers and Their Applications. Computers

CSC 170 Introduction to Computers and Their Applications. Computers CSC 170 Introduction to Computers and Their Applications Lecture #4 Digital Devices Computers At its core, a computer is a multipurpose device that accepts input, processes data, stores data, and produces

More information

Mobile Processors. Jose R. Ortiz Ubarri

Mobile Processors. Jose R. Ortiz Ubarri Mobile Processors Jose R. Ortiz Ubarri Electrical and Computer Engineering Department University of Puerto Rico, Mayagüez Campus Mayagüez, Puerto Rico 00681 5000 Jose.Ortiz@hpcf.upr.edu Introduction While

More information

ECE 2300 Digital Logic & Computer Organization. Caches

ECE 2300 Digital Logic & Computer Organization. Caches ECE 23 Digital Logic & Computer Organization Spring 217 s Lecture 2: 1 Announcements HW7 will be posted tonight Lab sessions resume next week Lecture 2: 2 Course Content Binary numbers and logic gates

More information

Lecture 12. Motivation. Designing for Low Power: Approaches. Architectures for Low Power: Transmeta s Crusoe Processor

Lecture 12. Motivation. Designing for Low Power: Approaches. Architectures for Low Power: Transmeta s Crusoe Processor Lecture 12 Architectures for Low Power: Transmeta s Crusoe Processor Motivation Exponential performance increase at a low cost However, for some application areas low power consumption is more important

More information

Using a Victim Buffer in an Application-Specific Memory Hierarchy

Using a Victim Buffer in an Application-Specific Memory Hierarchy Using a Victim Buffer in an Application-Specific Memory Hierarchy Chuanjun Zhang Depment of lectrical ngineering University of California, Riverside czhang@ee.ucr.edu Frank Vahid Depment of Computer Science

More information

What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important?

What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important? 0.1 What is This Course About? 0.2 CS 356 Unit 0 Class Introduction Basic Hardware Organization Introduction to Computer Systems a.k.a. Computer Organization or Architecture Filling in the "systems" details

More information

UNIVERSITY OF CALIFORNIA RIVERSIDE. External Interfaces and Software Tools for Electronic Blocks

UNIVERSITY OF CALIFORNIA RIVERSIDE. External Interfaces and Software Tools for Electronic Blocks UNIVERSITY OF CALIFORNIA RIVERSIDE External Interfaces and Software Tools for Electronic Blocks A Thesis submitted in partial satisfaction of the requirements for the degree of Master of Science in Computer

More information

FPGA: What? Why? Marco D. Santambrogio

FPGA: What? Why? Marco D. Santambrogio FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much

More information

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Michalis D. Galanis, Gregory Dimitroulakos, and Costas E. Goutis VLSI Design Laboratory, Electrical and Computer Engineering

More information

Introduction to Computer Engineering (E114)

Introduction to Computer Engineering (E114) Introduction to Computer Engineering (E114) Lab 1: Full Adder Introduction In this lab you will design a simple digital circuit called a full adder. You will then use logic gates to draw a schematic for

More information

INTRODUCING THE CODEBIT!

INTRODUCING THE CODEBIT! GETTING STARTED Downloading the littlebits Code Kit app STEP 1 Download and open the littlebits Code Kit app at littlebits.com/code-kit-app STEP 2 Click the pink open blank canvas button to start writing

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

CSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering

CSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering CSE 141: Computer 0 Architecture Professor: Michael Taylor RF UCSD Department of Computer Science & Engineering Computer Architecture from 10,000 feet foo(int x) {.. } Class of application Physics Computer

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

Binary Synthesis. GREG STITT and FRANK VAHID University of California, Riverside

Binary Synthesis. GREG STITT and FRANK VAHID University of California, Riverside Binary Synthesis GREG STITT and FRANK VAHID University of California, Riverside Recent high-level synthesis approaches and C-based hardware description languages attempt to improve the hardware design

More information

CSE P567 - Winter 2010 Lab 1 Introduction to FGPA CAD Tools

CSE P567 - Winter 2010 Lab 1 Introduction to FGPA CAD Tools CSE P567 - Winter 2010 Lab 1 Introduction to FGPA CAD Tools This is a tutorial introduction to the process of designing circuits using a set of modern design tools. While the tools we will be using (Altera

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2017 Lecture 15 LAST TIME: CACHE ORGANIZATION Caches have several important parameters B = 2 b bytes to store the block in each cache line S = 2 s cache sets

More information

Microprocessors, Lecture 1: Introduction to Microprocessors

Microprocessors, Lecture 1: Introduction to Microprocessors Microprocessors, Lecture 1: Introduction to Microprocessors Computing Systems General-purpose standalone systems (سيستم ھای نھفته ( systems Embedded 2 General-purpose standalone systems Stand-alone computer

More information

Cache Aware Optimization of Stream Programs

Cache Aware Optimization of Stream Programs Cache Aware Optimization of Stream Programs Janis Sermulins, William Thies, Rodric Rabbah and Saman Amarasinghe LCTES Chicago, June 2005 Streaming Computing Is Everywhere! Prevalent computing domain with

More information

New development within the FPGA area with focus on soft processors

New development within the FPGA area with focus on soft processors New development within the FPGA area with focus on soft processors Jonathan Blom The University of Mälardalen Langenbergsgatan 6 d SE- 722 17 Västerås +46 707 32 52 78 jbm05005@student.mdh.se Peter Nilsson

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

Configurable and Extensible Processors Change System Design. Ricardo E. Gonzalez Tensilica, Inc.

Configurable and Extensible Processors Change System Design. Ricardo E. Gonzalez Tensilica, Inc. Configurable and Extensible Processors Change System Design Ricardo E. Gonzalez Tensilica, Inc. Presentation Overview Yet Another Processor? No, a new way of building systems Puts system designers in the

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS61C L22 Caches II (1) CPS today! Lecture #22 Caches II 2005-11-16 There is one handout today at the front and back of the room! Lecturer PSOE,

More information

Smart Security at Every Corner of Your Home

Smart Security at Every Corner of Your Home Spotlight Cam Smart Security at Every Corner of Your Home Your new Spotlight Cam lets you extend the Ring of Security around your entire property. Now, you ll always be the first to know when someone s

More information

1. Charging. 2. In-app Setup. 3. Physical Installation. 4. Features. 5. Troubleshooting

1. Charging. 2. In-app Setup. 3. Physical Installation. 4. Features. 5. Troubleshooting Spotlight Cam Smart Security at Every Corner of Your Home Your new Spotlight Cam lets you extend the Ring of Security around your entire property. Now, you ll always be the first to know when someone s

More information

VHDL-MODELING OF A GAS LASER S GAS DISCHARGE CIRCUIT Nataliya Golian, Vera Golian, Olga Kalynychenko

VHDL-MODELING OF A GAS LASER S GAS DISCHARGE CIRCUIT Nataliya Golian, Vera Golian, Olga Kalynychenko 136 VHDL-MODELING OF A GAS LASER S GAS DISCHARGE CIRCUIT Nataliya Golian, Vera Golian, Olga Kalynychenko Abstract: Usage of modeling for construction of laser installations today is actual in connection

More information

Exact and Approximate Task Assignment Algorithms for Pipelined Software Synthesis

Exact and Approximate Task Assignment Algorithms for Pipelined Software Synthesis Exact and Approximate Task Assignment Algorithms for Pipelined Software Synthesis Matin Hashemi Soheil Ghiasi Laboratory for Embedded and Programmable Systems http://leps.ece.ucdavis.edu Department of

More information

An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware

An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware Tao Chen, Shreesha Srinath Christopher Batten, G. Edward Suh Computer Systems Laboratory School of Electrical

More information

Advances and Future Challenges in Binary Translation and Optimization

Advances and Future Challenges in Binary Translation and Optimization Advances and Future Challenges in Binary Translation and Optimization ERIK R. ALTMAN, KEMAL EBCIOG LU, MICHAEL GSCHWIND, SENIOR MEMBER, IEEE, AND SUMEDH SATHAYE Presented by Holly Ferguson 4-9-2012 Can

More information

ANADOLU UNIVERSITY DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING. EEM Digital Systems II

ANADOLU UNIVERSITY DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING. EEM Digital Systems II ANADOLU UNIVERSITY DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING EEM 334 - Digital Systems II LAB 1 - INTRODUCTION TO XILINX ISE SOFTWARE AND FPGA 1. PURPOSE In this lab, after you learn to use

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors. Overview

Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors. Overview Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors Jason Fritts Assistant Professor Department of Computer Science Co-Author: Prof. Wayne Wolf Overview Why Programmable Media Processors?

More information

Comparing Memory Systems for Chip Multiprocessors

Comparing Memory Systems for Chip Multiprocessors Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University

More information

When Girls Design CPUs!

When Girls Design CPUs! When Girls Design CPUs! An overview on one of the world s most famous CPU cores: ARM 1 Once Upon a Time There was a company in UK Acorn This company was the competitor to IBM Apple They were creating personal

More information

A Configurable Logic Architecture for Dynamic Hardware/Software Partitioning Abstract Keywords 1. Introduction

A Configurable Logic Architecture for Dynamic Hardware/Software Partitioning Abstract Keywords 1. Introduction A Configurable Logic Architecture for Dynamic Hardware/Software artitioning Roman Lysecky, Frank Vahid* Department of Computer Science and ngineering University of California, Riverside {rlysecky, vahid}@cs.ucr.edu

More information

Cache Configuration Exploration on Prototyping Platforms

Cache Configuration Exploration on Prototyping Platforms Cache Configuration Exploration on Prototyping Platforms Chuanjun Zhang Department of Electrical Engineering University of California, Riverside czhang@ee.ucr.edu Frank Vahid Department of Computer Science

More information

High-Level Synthesis Techniques for In-Circuit Assertion-Based Verification

High-Level Synthesis Techniques for In-Circuit Assertion-Based Verification High-Level Synthesis Techniques for In-Circuit Assertion-Based Verification John Curreri Ph.D. Candidate of ECE, University of Florida Dr. Greg Stitt Assistant Professor of ECE, University of Florida April

More information

The personal computer system uses the following hardware device types -

The personal computer system uses the following hardware device types - EIT, Author Gay Robertson, 2016 The personal computer system uses the following hardware device types - Input devices Input devices Processing devices Storage devices Processing Cycle Processing devices

More information

Parallelism and Concurrency. COS 326 David Walker Princeton University

Parallelism and Concurrency. COS 326 David Walker Princeton University Parallelism and Concurrency COS 326 David Walker Princeton University Parallelism What is it? Today's technology trends. How can we take advantage of it? Why is it so much harder to program? Some preliminary

More information

Random-Access Memory (RAM) Systemprogrammering 2007 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics

Random-Access Memory (RAM) Systemprogrammering 2007 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics Systemprogrammering 27 Föreläsning 4 Topics The memory hierarchy Motivations for VM Address translation Accelerating translation with TLBs Random-Access (RAM) Key features RAM is packaged as a chip. Basic

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c Review! UC Berkeley CS61C : Machine Structures Lecture 28 Intra-machine Parallelism Parallelism is necessary for performance! It looks like itʼs It is the future of computing!

More information

A Review on Cache Memory with Multiprocessor System

A Review on Cache Memory with Multiprocessor System A Review on Cache Memory with Multiprocessor System Chirag R. Patel 1, Rajesh H. Davda 2 1,2 Computer Engineering Department, C. U. Shah College of Engineering & Technology, Wadhwan (Gujarat) Abstract

More information

The Xilinx XC6200 chip, the software tools and the board development tools

The Xilinx XC6200 chip, the software tools and the board development tools The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions

More information

Performance! (1/latency)! 1000! 100! 10! Capacity Access Time Cost. CPU Registers 100s Bytes <10s ns. Cache K Bytes ns 1-0.

Performance! (1/latency)! 1000! 100! 10! Capacity Access Time Cost. CPU Registers 100s Bytes <10s ns. Cache K Bytes ns 1-0. Since 1980, CPU has outpaced DRAM... EEL 5764: Graduate Computer Architecture Appendix C Hierarchy Review Ann Gordon-Ross Electrical and Computer Engineering University of Florida http://www.ann.ece.ufl.edu/

More information

Workspace for '4-FPGA' Page 1 (row 1, column 1)

Workspace for '4-FPGA' Page 1 (row 1, column 1) Workspace for '4-FPGA' Page 1 (row 1, column 1) Workspace for '4-FPGA' Page 2 (row 2, column 1) Workspace for '4-FPGA' Page 3 (row 3, column 1) ECEN 449 Microprocessor System Design FPGAs and Reconfigurable

More information

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee

More information

Virtual Machines and Dynamic Translation: Implementing ISAs in Software

Virtual Machines and Dynamic Translation: Implementing ISAs in Software Virtual Machines and Dynamic Translation: Implementing ISAs in Software Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Software Applications How is a software application

More information

Lec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements

Lec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements Lec 13: Linking and Memory Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University PA 2 is out Due on Oct 22 nd Announcements Prelim Oct 23 rd, 7:30-9:30/10:00 All content up to Lecture on Oct

More information

Reza Afshari Project Proposal Etec 471, Professor Todd Morton October 28, Western Washington University Electronics Engineering Technology

Reza Afshari Project Proposal Etec 471, Professor Todd Morton October 28, Western Washington University Electronics Engineering Technology WIRELESS OPTICAL USB MOUSE Reza Afshari Project Proposal Etec 471, Professor Todd Morton October 28, 2004 Western Washington University Electronics Engineering Technology INTRODUCTION Almost everybody

More information

CSCE 312 Lab manual. Instructor: Dr. Ki HwanYum. Prepared by. Dr. Rabi Mahapatra. Suneil Mohan & Amitava Biswas. Fall 2016

CSCE 312 Lab manual. Instructor: Dr. Ki HwanYum. Prepared by. Dr. Rabi Mahapatra. Suneil Mohan & Amitava Biswas. Fall 2016 CSCE 312 Lab manual Lab-3 - Sequential logic design Instructor: Dr. Ki HwanYum Prepared by Dr. Rabi Mahapatra. Suneil Mohan & Amitava Biswas Fall 2016 Department of Computer Science & Engineering Texas

More information

UCI. Intel Itanium Line Processor Efforts. Xiaobin Li. PASCAL EECS Dept. UC, Irvine. University of California, Irvine

UCI. Intel Itanium Line Processor Efforts. Xiaobin Li. PASCAL EECS Dept. UC, Irvine. University of California, Irvine Intel Itanium Line Processor Efforts Xiaobin Li PASCAL EECS Dept. UC, Irvine Outline Intel Itanium Line Roadmap IA-64 Architecture Itanium Processor Microarchitecture Case Study of Exploiting TLP at VLIW

More information

New Advances in Micro-Processors and computer architectures

New Advances in Micro-Processors and computer architectures New Advances in Micro-Processors and computer architectures Prof. (Dr.) K.R. Chowdhary, Director SETG Email: kr.chowdhary@jietjodhpur.com Jodhpur Institute of Engineering and Technology, SETG August 27,

More information

Random-Access Memory (RAM) Systemprogrammering 2009 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics! The memory hierarchy

Random-Access Memory (RAM) Systemprogrammering 2009 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics! The memory hierarchy Systemprogrammering 29 Föreläsning 4 Topics! The memory hierarchy! Motivations for VM! Address translation! Accelerating translation with TLBs Random-Access (RAM) Key features! RAM is packaged as a chip.!

More information

Introduction to the Personal Computer

Introduction to the Personal Computer Introduction to the Personal Computer 2.1 Describe a computer system A computer system consists of hardware and software components. Hardware is the physical equipment such as the case, storage drives,

More information

SystemC Implementation of VLSI Embedded Systems for MEMS. Application

SystemC Implementation of VLSI Embedded Systems for MEMS. Application Fourth LACCEI International Latin American and Caribbean Conference for Engineering and Technology (LACCET 2006) Breaking Frontiers and Barriers in Engineering: Education, Research and Practice 21-23 June

More information

High-Performance Parallel Computing

High-Performance Parallel Computing High-Performance Parallel Computing P. (Saday) Sadayappan Rupesh Nasre Course Overview Emphasis on algorithm development and programming issues for high performance No assumed background in computer architecture;

More information

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism

More information

Reminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline

Reminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Reminder Course project team forming deadline Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Course project ideas If you have difficulty in finding team mates, send your

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

PRELAB! Read the entire lab, and complete the prelab questions (Q1-Q3) on the answer sheet before coming to the laboratory.

PRELAB! Read the entire lab, and complete the prelab questions (Q1-Q3) on the answer sheet before coming to the laboratory. PRELAB! Read the entire lab, and complete the prelab questions (Q1-Q3) on the answer sheet before coming to the laboratory. 1.0 Objectives In the last lab we learned that Verilog is a fast and easy way

More information