Custom Sensor-Based Embedded Computing Systems. Frank Vahid
|
|
- Baldric Welch
- 5 years ago
- Views:
Transcription
1 Custom Sensor-Based Embedded Computing Systems Frank Vahid Professor Dept. of Computer Science and Engineering University of California, Riverside Assoc. Director, Center for Embedded Computer Systems, UC Irvine eblocks project 2002-present, support provided by the National Science Foundation and Intel Ph.D. student: Susan Lysecky (2006, w Asst. Prof. at U. Arizona); several MS and undergrad students also
2 The Problem What do these problems all have in common? A small store owner with many employees are they in the storeroom, breakroom, or out back? A working adult with an ageing parent at home did she get out of bed today, is she moving around? A homeowner who sometimes forgets to close the garage at night Marines wishing to outfit a building to detect whether someone is inside or when someone was inside Frank Vahid, UC Riverside 2/32
3 The Problem What do these problems all have in common? Put motion and sound sensors throughout, small LEDs (lights) near cash register Put motion sensors around the house, monitor from the web or cell phone or even be tified if motion by certain time in the morning Install contact sensor and light sensor, and indicator next to the bed Place motion, heat, and sound sensors in rooms, halls, doorways Frank Vahid, UC Riverside 3/32
4 Why Can t We Just Do This? Widely usable Lego -like sensors don t exist today Costly, hard to use, plugged into wall... light sensor transmit AND contact switch receive LED But new techlogy makes Lego-like sensor blocks possible... Frank Vahid, UC Riverside 4/32
5 Shrinking Processor Size/Cost Enables New Solution Courtesy of Joe Kahn Make sensors smarter By adding processor+battery Today, tiny and cheap Becomes a "block" easily connected to other blocks Frank Vahid, UC Riverside 5/32
6 Shrinking Processor Size/Cost Enables New Solution eblocks Existing component view New "eblock" view Button / Light Sensor / Magnetic Contact Switch / / LED / Beeper / Electric Relay Frank Vahid, UC Riverside 6/32
7 eblocks Just connect blocks, and they work Button / / Beeper No programming kwledge, electronics kwledge Light Sensor / / LED Frank Vahid, UC Riverside 7/32
8 eblocks Add intermediate blocks that compute and maintain state Spatial programming more intuitive to n-cs people than temporal programming Button Light Sensor Button Toggle Beeper When A is AND OR B is Light Sensor Button Tripper LED Combine then the output is Prolong Frank Vahid, UC Riverside 8/32 LED
9 What's Hard (1) Finding right set of building blocks Too many Overwhelming (too much choice) 2-Input Logic Toggle Tripper Splitter Splitter Too few Overwhelming (too much configuration) Input Logic Prolong (short) Splitter When A is 2 AND OR B is then the output is Input Logic Prolong (long) 5: Splitter 6:... When A is AND OR Combine B is then the output is 2 Yes detector 2 No detector SuperBlock Frank Vahid, UC Riverside 9/32
10 What's Hard (2) Making the blocks understandable People NOT likely to read directions Those that do are unlikely to understand Performed extensive user testing (over 500 students, kids, and adults) over two years Example: Combine block A B Output Logic Block Logic Sentence Frank Vahid, UC Riverside 10/32 configurable DIP switch Combine When the input is Most success A B : A is, B is A is, B is A is, B is A is, B is Phrased truth table A B A is, B is A is, B is A is, B is A is, B is Combine the output should be The output should be when: out Phrased truth table embedded in sentence When the input is A B A B A B A B A B The output should be out Combine Colored truth table embedded in sentence When A is AND OR B is then the output is Combine
11 What's Hard (3) Batteries must last years, yet performance should appear continuous Blocks are off 99.9% of the time Developed theory to map eblock events to continuous time Developed custom CAD tool to automatically find the best block parameter settings out of the billions of possibilities (a) (b) (c) (d) f t f f < < error < < f t f f error time interpreted as Frank Vahid, UC Riverside 11/32
12 eblocks Example "Garage Open at Night" detector <10 minutes to build Light Sensor Detect night-time use Light Sensor block When A is AND OR B is then the output is Combine LED Need to indicate garage open at night use LED block Plug pieces together and the system is done! Magnetic Contact Switch Detect garage door open use Contact Switch block Use Combine block to combine light sensor and contact switch into one Frank Vahid, UC Riverside 12/32
13 Graphical Simulator User specifies and tests block design Java-based simulator User chooses between pallets Blocks added by dragging User is able to configure various blocks by clicking on switches Connections created by drawing lines between blocks Button When A is AND OR Light Sensor B is then the output is Combine Beeper Available eblocks Sensors Output Compute/Communications When Button Beeper A is AND B is OR then the output is Motion Combine Green/Red Sensor Light rst in Light Sensor Once Yes, Stays Yes Toggle Yes/No seconds Prolonger Hide this panel Advanced Mode Welcome to the eblocks Simulator! In this area, you ll find helpful hints on creating your own designs. Click and drag an eblock off of the Available eblocks panel to add it to your design. To connect two blocks, click and drag from an output port (colored circle) to an input port (gray circle). A connection can be destroyed by clicking on a connected port. To move a block around the workspace, click and drag its orange area. Blocks can be moved into the trash can to delete them. Green circles indicate that the port is sending a, red circles indicate that the port is sending a, yellow Circles indicate that the port is sending an error signal, and gray circles dete an input port. Frank Vahid, UC Riverside 13/32
14 Graphical Simulator User specifies and tests block design Java-based simulator User chooses between pallets Blocks added by dragging User is able to configure various blocks by clicking on switches Connections created by drawing lines between blocks User can create, experiment, test and configure design Button When A is AND OR Light Sensor B is then the output is Combine Beeper Welcome to the eblocks Simulator! In this area, you ll find helpful hints on creating your own designs. Available eblocks Sensors Output Compute/Communications When Button Beeper A is then the output is Motion Combine Green/Red Sensor Light rst in Light Sensor Once Yes, Stays Yes Toggle Yes/No seconds Prolonger Hide this panel Click and drag an eblock off of the Available eblocks panel to add it to your design. To connect two blocks, click and drag from an output port (colored circle) to an input port (gray circle). A connection can be destroyed by clicking on a connected port. To move a block around the workspace, click and drag its orange area. Blocks can be moved into the trash can to delete them. Green circles indicate that the port is sending a, red circles indicate that the port is sending a, yellow Circles indicate that the port is sending an error signal, and gray circles dete an input port. AND B is OR Advanced Mode Frank Vahid, UC Riverside 14/32
15 eblocks and Embedded Microprocessors Can greatly simplify coding Button / 1/0 Micro- Light Sensor processor Button Tripper / 1/0 Frank Vahid, UC Riverside 15/32
16 And w for something completely different... Warp processing 2001-present, supported by SRC, Intel, IBM, Freescale, Xilinx Ph.D. students Roman Lysecky Ph.D., June 2005, w Asst. Prof. at Univ. of Arizona Greg Stitt Ph.D. June 2007, w Asst. Prof. at Univ. of Florida, Gainseville Ann Gordon-Ross Ph.D. June 2007, w Asst. Prof. at Univ. of Florida, Gainseville Frank Vahid, UC Riverside 16/32
17 Circuits on FPGAs Can Sometimes Give Big Speedups C Code for FIR Filter for (i=0; i i < 128; i++) y y[i] += += c[i] c[i] * x[i] * x[i] Circuit for FIR Filter * * * * * * * * * * * * Processor Processor Processor FPGA 1000 s of instructions ~ 7 cycles Speedup > 100x Several thousand cycles Pipelined -- >500x Circuit parallelism/pipelining can yield big speedups Frank Vahid, UC Riverside 17/32
18 Dynamic Translation Motivated by commercial dynamic binary translation of early 2000s e.g., Transmeta Crusoe code morphing VLIW VLIW Binary Performance x86 Binary Binary Translation x86 VLIW FPGA Binary Binary Translation x86 VLIW FPGA Warp processing (Lysecky/Stitt/Vahid ): dynamically translate binary to circuits on FPGAs Frank Vahid, UC Riverside 18/32
19 Warp Processing Background 1 Initially, software binary loaded into instruction memory Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 FPGA On-chip CAD Frank Vahid, UC Riverside 19/32
20 Warp Processing Background 2 Microprocessor executes instructions in software binary Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA On-chip CAD Frank Vahid, UC Riverside 20/32
21 Warp Processing Background 3 Profiler monitors instructions and detects critical regions in binary Profiler beqbeqbeqbeqbeqbeqbeqbeqbeqbeq addaddaddaddaddaddaddaddaddadd I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA On-chip CAD Critical Loop Detected Frank Vahid, UC Riverside 21/32
22 Warp Processing Background 4 On-chip CAD reads in critical region Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA On-chip CAD Frank Vahid, UC Riverside 22/32
23 Warp Processing Background 5 On-chip CAD decompiles critical region into control data flow graph (CDFG) FPGA Profiler I Mem D$ Dynamic On-chip CAD Part. Module (DPM) Decompilation surprisingly effective at recovering high-level program structures Stitt et al ICCAD 02, DAC 03, CODES/ISSS 05, ICCAD 05, FPGA 05, TODAES 06, TODAES 07 Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 reg3 := 0 reg4 := 0 Frank Vahid, UC Riverside 23/32 Time loop: reg4 := reg4 + mem[ reg2 + (reg3 << 1)] reg3 := reg3 + 1 if (reg3 < 10) goto loop ret reg4 Energy Recover loops, arrays, subroutines, etc. needed to synthesize good circuits
24 Warp Processing Background 6 On-chip CAD synthesizes decompiled CDFG to a custom (parallel) circuit Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA Dynamic On-chip CAD Part. Module (DPM) reg3 := 0 reg4 := loop: reg4 := reg4 + mem[ reg2 + (reg3 << 1)] reg3 := reg3 + 1 if (reg3 < 10) goto loop ret reg4 Frank Vahid, UC Riverside 24/
25 Warp Processing Background 7 On-chip CAD maps circuit onto FPGA Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl reg1, reg3, 1 Add reg5, reg2, reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 Time Energy FPGA Dynamic On-chip CAD Part. Module (DPM) Lean place&route/fpga 10x faster CAD (Lysecky et al DAC 03, ISSS/CODES 03, DATE 04, DAC 04, DATE 05, FCCM 05, TODAES 06) Multi-core chips use 1 powerful core for CAD reg3 := 0 reg4 + := 0 + SM SM loop: reg4 := reg4 + mem[ reg2 CLB + (reg3 << 1)] reg3 := reg3 + 1 if SM (reg3 < 10) SM goto loop ret reg4 Frank Vahid, UC Riverside 25/32 SM SM CLB
26 Warp Processing Background 8 On-chip CAD replaces instructions in binary to use hardware, causing performance and energy to warp by an order of magnitude or more Profiler I Mem D$ Software Binary Mov reg3, 0 Mov reg4, 0 loop: Shl // instructions reg1, reg3, that 1 Add interact reg5, with reg2, FPGA reg1 Ld reg6, 0(reg5) Add reg4, reg4, reg6 Add reg3, reg3, 1 Beq reg3, 10, -5 Ret reg4 >10x speedups for some apps Time Time Energy Energy Software-only Warped FPGA Dynamic On-chip CAD Part. Module (DPM) SM reg3 := 0 reg4 + := 0 + SM SM loop: reg4 := reg4 + mem[ reg2 CLB + (reg3 << 1)] reg3 := reg3 + 1 if SM (reg3 < 10) SM goto loop SM CLB ret reg4 Frank Vahid, UC Riverside 26/
27 Challenge: Decompilation Requires aggressive decompilation to recover loops, arrays,..., from binaries Results: Competitive with synthesis from C Speedup FIR Filter Beamformer Viterbi Brev Url BITMNP01 IDCTRN01 High-level Binary-level PNTRCH01 Average Synthesis from C Code Synthesis after Decompiling Binary Example Cycles ClkFrq Time Area Cycles ClkFrq Time Area %Time Overhead %Area Overhead bit_correlator % 0% fir % 3% udiv % 0% prewitt % 58% mf % 0% moravec % -1% Avg: -1% 10% Frank Vahid, UC Riverside 27/32
28 Challenge: JIT Compile to FPGA Developed ultra-lean CAD heuristics for synthesis, placement, routing, and techlogy mapping; simultaneously developed CAD-oriented FPGA e.g., Our router (ROCR) 10x faster and 20x less memory, at cost of 30% longer critical path. Similar results for synth & placement (EDAA Outstanding Dissertation Award 2006) Xilinx ISE Riverside JIT FPGA tools 0.2 s 3.6MB 9.1 s 60 MB Riverside JIT FPGA tools on a 75MHz ARM7 1.4s 3.6MB Frank Vahid, UC Riverside 28/32
29 Experiments Benchmarks: Image processing, DSP, scientific computing Highly parallel examples to illustrate thread warping potential We created multithreaded versions Base architecture 4 ARM cores Focus on recurring applications (embedded) TW: FPGA running at whatever frequency determined by synthesis Multi-core 4 ARM MHz Thread Warping 4 ARM MHz + FPGA (synth freq) Compared to FPGA On-chip CAD Frank Vahid, UC Riverside 29/32
30 Speedup from Thread Warping Fir Prewitt Linear Moravec Wavelet Maxfilter 3DTrans N-body Avg. Geo. Mean 38 4-uP TW 8-uP 16-uP 32-uP 64-uP Average 130x speedup But, FPGA uses additional area 11x faster than 64-core system Simulation pessimistic, actual results likely better So we also compare to systems with 8 to 64 ARM11 ups FPGA size = ~36 ARM11s Frank Vahid, UC Riverside 30/32
31 Dynamic Enables Expandable Logic Concept RAM RAM Expandable RAM Logic System Warp tools detect detects amount RAM of FPGA, during invisibly start, adapt improves application performance to use less/more invisiblyhardware. DMA Cache Cache Profiler FPGA FPGA Warp Tools FPGA FPGA Expandable RAM up Expandable Logic Performance Frank Vahid, UC Riverside 31/32 Speedup Softw are 1 FPGA 2 FPGAs 3 FPGAs 4 FPGAs N-Body 3DTrans Prew itt Wavelet
32 Virtual Immersion eblocks: Enables customized sensor-based system design by n-experts May lead to...? Wood and nails of the sensor world Currently working with hearingimpaired, aging Warp processing (featured in this month s IEEE Computer) Enables large speedups on certain applications (e.g., image processing), user can expand hardware without changing software Frank Vahid, UC Riverside 32/32
Software consists of bits downloaded into a
C O V E R F E A T U R E Warp Processing: Dynamic Translation of Binaries to FPGA Circuits Frank Vahid, University of California, Riverside Greg Stitt, University of Florida Roman Lysecky, University of
More informationWarp Processors (a.k.a. Self-Improving Configurable IC Platforms)
(a.k.a. Self-Improving Configurable IC Platforms) Frank Vahid Department of Computer Science and Engineering University of California, Riverside Faculty member, Center for Embedded Computer Systems, UC
More informationA Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning
A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning By: Roman Lysecky and Frank Vahid Presented By: Anton Kiriwas Disclaimer This specific
More informationIntroduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture
Roman Lysecky Department of Electrical and Computer Engineering University of Arizona Dynamic HW/SW Partitioning Initially execute application in software only 5 Partitioned application executes faster
More informationTechniques for Synthesizing Binaries to an Advanced Register/Memory Structure
Techniques for Synthesizing Binaries to an Advanced Register/Memory Structure Greg Stitt, Zhi Guo, Frank Vahid*, Walid Najjar Department of Computer Science and Engineering University of California, Riverside
More information[Processor Architectures] [Computer Systems Organization]
Warp Processors ROMAN LYSECKY University of Arizona and GREG STITT AND FRANK VAHID* University of California, Riverside *Also with the Center for Embedded Computer Systems at UC Irvine We describe a new
More informationA Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning
A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning Roman Lysecky, Frank Vahid To cite this version: Roman Lysecky, Frank Vahid. A Study
More informationAbstract. 1 Introduction. Reconfigurable Logic and Hardware Software Codesign. Class EEC282 Author Marty Nicholes Date 12/06/2003
Title Reconfigurable Logic and Hardware Software Codesign Class EEC282 Author Marty Nicholes Date 12/06/2003 Abstract. This is a review paper covering various aspects of reconfigurable logic. The focus
More informationA Codesigned On-Chip Logic Minimizer
A Codesigned On-Chip Logic Minimizer Roman Lysecky, Frank Vahid* Department of Computer Science and ngineering University of California, Riverside {rlysecky, vahid}@cs.ucr.edu *Also with the Center for
More informationEE382V: System-on-a-Chip (SoC) Design
EE382V: System-on-a-Chip (SoC) Design Lecture 10 Task Partitioning Sources: Prof. Margarida Jacome, UT Austin Prof. Lothar Thiele, ETH Zürich Andreas Gerstlauer Electrical and Computer Engineering University
More informationAutomatic Tuning of Two-Level Caches to Embedded Applications
Automatic Tuning of Two-Level Caches to Embedded Applications Ann Gordon-Ross, Frank Vahid* Department of Computer Science and Engineering University of California, Riverside {ann/vahid}@cs.ucr.edu http://www.cs.ucr.edu/~vahid
More informationDon t Forget Memories A Case Study Redesigning a Pattern Counting ASIC Circuit for FPGAs
Don t Forget Memories A Case Study Redesigning a Pattern Counting ASIC Circuit for FPGAs David Sheldon Department of Computer Science and Engineering, UC Riverside dsheldon@cs.ucr.edu Frank Vahid Department
More informationLABORATORY # 6 * L A B M A N U A L. Datapath Components - Adders
Department of Electrical Engineering University of California Riverside Laboratory #6 EE 120 A LABORATORY # 6 * L A B M A N U A L Datapath Components - Adders * EE and CE students must attempt also to
More informationIA-64, P4 HT and Crusoe Architectures Ch 15
IA-64, P4 HT and Crusoe Architectures Ch 15 IA-64 General Organization Predication, Speculation Software Pipelining Example: Itanium Pentium 4 HT Crusoe General Architecture Emulated Precise Exceptions
More informationAdvanced processor designs
Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The
More informationPARLGRAN: Parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures *
PARLGRAN: Parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures * Sudarshan Banerjee, Elaheh Bozorgzadeh, Nikil Dutt Center for Embedded Computer Systems
More informationComputer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra
Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating
More informationMultithreading: Exploiting Thread-Level Parallelism within a Processor
Multithreading: Exploiting Thread-Level Parallelism within a Processor Instruction-Level Parallelism (ILP): What we ve seen so far Wrap-up on multiple issue machines Beyond ILP Multithreading Advanced
More informationComputer Architecture Dr. Charles Kim Howard University
EECE416 Microcomputer Fundamentals Computer Architecture Dr. Charles Kim Howard University 1 Computer Architecture Computer Architecture Art of selecting and interconnecting hardware components to create
More informationScalability and Parallel Execution of Warp Processing - Dynamic Hardware/Software Partitioning
Scalability and Parallel Execution of Warp Processing - Dynamic Hardware/Software Partitioning Roman Lysecky Department of Electrical and Computer Engineering University of Arizona rlysecky@ece.arizona.edu
More informationHardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University
Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis
More informationOverview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips
Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,
More informationA First-step Towards an Architecture Tuning Methodology for Low Power
A First-step Towards an Architecture Tuning Methodology for Low Power Greg Stitt, Frank Vahid*, Tony Givargis Department of Computer Science and Engineering University of California, Riverside {gstitt,
More informationCOE 561 Digital System Design & Synthesis Introduction
1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design
More informationImproving System. Performance: Caches
Improving System Performance: Caches December 04 CSC201 Section 002 Fall, 2000 A Motivating Example Application: making a (mechanical) clock dozens of tools and pages of instructions, hundreds of parts
More informationRISC Architecture Ch 12
RISC Architecture Ch 12 Some History Instruction Usage Characteristics Large Register Files Register Allocation Optimization RISC vs. CISC 18 Original Ideas Behind CISC (Complex Instruction Set Comp.)
More informationTowards Optimal Custom Instruction Processors
Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT CHIPS 18 Overview 1. background: extensible processors
More informationUsing FPGAs as Microservices
Using FPGAs as Microservices David Ojika, Ann Gordon-Ross, Herman Lam, Bhavesh Patel, Gaurav Kaul, Jayson Strayer (University of Florida, DELL EMC, Intel Corporation) The 9 th Workshop on Big Data Benchmarks,
More informationECE 485/585 Microprocessor System Design
Microprocessor System Design Lecture 8: Principle of Locality Cache Architecture Cache Replacement Policies Zeshan Chishti Electrical and Computer Engineering Dept Maseeh College of Engineering and Computer
More informationCS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines
CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture VLIW, Vector, and Multithreaded Machines Assigned 3/24/2019 Problem Set #4 Due 4/5/2019 http://inst.eecs.berkeley.edu/~cs152/sp19
More informationScalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA
Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089
More informationKeywords and Review Questions
Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain
More informationAn FPGA Project for use in a Digital Logic Course
Session 3226 An FPGA Project for use in a Digital Logic Course Daniel C. Gray, Thomas D. Wagner United States Military Academy Abstract The Digital Computer Logic Course offered at the United States Military
More informationParallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010
Parallelizing FPGA Technology Mapping using GPUs Doris Chen Deshanand Singh Aug 31 st, 2010 Motivation: Compile Time In last 12 years: 110x increase in FPGA Logic, 23x increase in CPU speed, 4.8x gap Question:
More informationComputer Architecture Dr. Charles Kim Howard University
EECE416 Microcomputer Fundamentals & Design Computer Architecture Dr. Charles Kim Howard University 1 Computer Architecture Computer Architecture Art of selecting and interconnecting hardware components
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationThe QR code here provides a shortcut to go to the course webpage.
Welcome to this MSc Lab Experiment. All my teaching materials for this Lab-based module are also available on the webpage: www.ee.ic.ac.uk/pcheung/teaching/msc_experiment/ The QR code here provides a shortcut
More informationFrank Vahid Digital Design Second Edition Solution
FRANK VAHID DIGITAL DESIGN SECOND EDITION SOLUTION PDF - Are you looking for frank vahid digital design second edition solution Books? Now, you will be happy that at this time frank vahid digital design
More informationUG3 Compiling Techniques Overview of the Course
UG3 Compiling Techniques Overview of the Course Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission
More informationCSC 170 Introduction to Computers and Their Applications. Computers
CSC 170 Introduction to Computers and Their Applications Lecture #4 Digital Devices Computers At its core, a computer is a multipurpose device that accepts input, processes data, stores data, and produces
More informationMobile Processors. Jose R. Ortiz Ubarri
Mobile Processors Jose R. Ortiz Ubarri Electrical and Computer Engineering Department University of Puerto Rico, Mayagüez Campus Mayagüez, Puerto Rico 00681 5000 Jose.Ortiz@hpcf.upr.edu Introduction While
More informationECE 2300 Digital Logic & Computer Organization. Caches
ECE 23 Digital Logic & Computer Organization Spring 217 s Lecture 2: 1 Announcements HW7 will be posted tonight Lab sessions resume next week Lecture 2: 2 Course Content Binary numbers and logic gates
More informationLecture 12. Motivation. Designing for Low Power: Approaches. Architectures for Low Power: Transmeta s Crusoe Processor
Lecture 12 Architectures for Low Power: Transmeta s Crusoe Processor Motivation Exponential performance increase at a low cost However, for some application areas low power consumption is more important
More informationUsing a Victim Buffer in an Application-Specific Memory Hierarchy
Using a Victim Buffer in an Application-Specific Memory Hierarchy Chuanjun Zhang Depment of lectrical ngineering University of California, Riverside czhang@ee.ucr.edu Frank Vahid Depment of Computer Science
More informationWhat is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important?
0.1 What is This Course About? 0.2 CS 356 Unit 0 Class Introduction Basic Hardware Organization Introduction to Computer Systems a.k.a. Computer Organization or Architecture Filling in the "systems" details
More informationUNIVERSITY OF CALIFORNIA RIVERSIDE. External Interfaces and Software Tools for Electronic Blocks
UNIVERSITY OF CALIFORNIA RIVERSIDE External Interfaces and Software Tools for Electronic Blocks A Thesis submitted in partial satisfaction of the requirements for the degree of Master of Science in Computer
More informationFPGA: What? Why? Marco D. Santambrogio
FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much
More informationAccelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path
Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Michalis D. Galanis, Gregory Dimitroulakos, and Costas E. Goutis VLSI Design Laboratory, Electrical and Computer Engineering
More informationIntroduction to Computer Engineering (E114)
Introduction to Computer Engineering (E114) Lab 1: Full Adder Introduction In this lab you will design a simple digital circuit called a full adder. You will then use logic gates to draw a schematic for
More informationINTRODUCING THE CODEBIT!
GETTING STARTED Downloading the littlebits Code Kit app STEP 1 Download and open the littlebits Code Kit app at littlebits.com/code-kit-app STEP 2 Click the pink open blank canvas button to start writing
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationCSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering
CSE 141: Computer 0 Architecture Professor: Michael Taylor RF UCSD Department of Computer Science & Engineering Computer Architecture from 10,000 feet foo(int x) {.. } Class of application Physics Computer
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
More informationBinary Synthesis. GREG STITT and FRANK VAHID University of California, Riverside
Binary Synthesis GREG STITT and FRANK VAHID University of California, Riverside Recent high-level synthesis approaches and C-based hardware description languages attempt to improve the hardware design
More informationCSE P567 - Winter 2010 Lab 1 Introduction to FGPA CAD Tools
CSE P567 - Winter 2010 Lab 1 Introduction to FGPA CAD Tools This is a tutorial introduction to the process of designing circuits using a set of modern design tools. While the tools we will be using (Altera
More informationCS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15
CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2017 Lecture 15 LAST TIME: CACHE ORGANIZATION Caches have several important parameters B = 2 b bytes to store the block in each cache line S = 2 s cache sets
More informationMicroprocessors, Lecture 1: Introduction to Microprocessors
Microprocessors, Lecture 1: Introduction to Microprocessors Computing Systems General-purpose standalone systems (سيستم ھای نھفته ( systems Embedded 2 General-purpose standalone systems Stand-alone computer
More informationCache Aware Optimization of Stream Programs
Cache Aware Optimization of Stream Programs Janis Sermulins, William Thies, Rodric Rabbah and Saman Amarasinghe LCTES Chicago, June 2005 Streaming Computing Is Everywhere! Prevalent computing domain with
More informationNew development within the FPGA area with focus on soft processors
New development within the FPGA area with focus on soft processors Jonathan Blom The University of Mälardalen Langenbergsgatan 6 d SE- 722 17 Västerås +46 707 32 52 78 jbm05005@student.mdh.se Peter Nilsson
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationSpiral 2-8. Cell Layout
2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric
More informationConfigurable and Extensible Processors Change System Design. Ricardo E. Gonzalez Tensilica, Inc.
Configurable and Extensible Processors Change System Design Ricardo E. Gonzalez Tensilica, Inc. Presentation Overview Yet Another Processor? No, a new way of building systems Puts system designers in the
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS61C L22 Caches II (1) CPS today! Lecture #22 Caches II 2005-11-16 There is one handout today at the front and back of the room! Lecturer PSOE,
More informationSmart Security at Every Corner of Your Home
Spotlight Cam Smart Security at Every Corner of Your Home Your new Spotlight Cam lets you extend the Ring of Security around your entire property. Now, you ll always be the first to know when someone s
More information1. Charging. 2. In-app Setup. 3. Physical Installation. 4. Features. 5. Troubleshooting
Spotlight Cam Smart Security at Every Corner of Your Home Your new Spotlight Cam lets you extend the Ring of Security around your entire property. Now, you ll always be the first to know when someone s
More informationVHDL-MODELING OF A GAS LASER S GAS DISCHARGE CIRCUIT Nataliya Golian, Vera Golian, Olga Kalynychenko
136 VHDL-MODELING OF A GAS LASER S GAS DISCHARGE CIRCUIT Nataliya Golian, Vera Golian, Olga Kalynychenko Abstract: Usage of modeling for construction of laser installations today is actual in connection
More informationExact and Approximate Task Assignment Algorithms for Pipelined Software Synthesis
Exact and Approximate Task Assignment Algorithms for Pipelined Software Synthesis Matin Hashemi Soheil Ghiasi Laboratory for Embedded and Programmable Systems http://leps.ece.ucdavis.edu Department of
More informationAn Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware
An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware Tao Chen, Shreesha Srinath Christopher Batten, G. Edward Suh Computer Systems Laboratory School of Electrical
More informationAdvances and Future Challenges in Binary Translation and Optimization
Advances and Future Challenges in Binary Translation and Optimization ERIK R. ALTMAN, KEMAL EBCIOG LU, MICHAEL GSCHWIND, SENIOR MEMBER, IEEE, AND SUMEDH SATHAYE Presented by Holly Ferguson 4-9-2012 Can
More informationANADOLU UNIVERSITY DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING. EEM Digital Systems II
ANADOLU UNIVERSITY DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING EEM 334 - Digital Systems II LAB 1 - INTRODUCTION TO XILINX ISE SOFTWARE AND FPGA 1. PURPOSE In this lab, after you learn to use
More informationEECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141
EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role
More informationMulti-Level Cache Hierarchy Evaluation for Programmable Media Processors. Overview
Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors Jason Fritts Assistant Professor Department of Computer Science Co-Author: Prof. Wayne Wolf Overview Why Programmable Media Processors?
More informationComparing Memory Systems for Chip Multiprocessors
Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University
More informationWhen Girls Design CPUs!
When Girls Design CPUs! An overview on one of the world s most famous CPU cores: ARM 1 Once Upon a Time There was a company in UK Acorn This company was the competitor to IBM Apple They were creating personal
More informationA Configurable Logic Architecture for Dynamic Hardware/Software Partitioning Abstract Keywords 1. Introduction
A Configurable Logic Architecture for Dynamic Hardware/Software artitioning Roman Lysecky, Frank Vahid* Department of Computer Science and ngineering University of California, Riverside {rlysecky, vahid}@cs.ucr.edu
More informationCache Configuration Exploration on Prototyping Platforms
Cache Configuration Exploration on Prototyping Platforms Chuanjun Zhang Department of Electrical Engineering University of California, Riverside czhang@ee.ucr.edu Frank Vahid Department of Computer Science
More informationHigh-Level Synthesis Techniques for In-Circuit Assertion-Based Verification
High-Level Synthesis Techniques for In-Circuit Assertion-Based Verification John Curreri Ph.D. Candidate of ECE, University of Florida Dr. Greg Stitt Assistant Professor of ECE, University of Florida April
More informationThe personal computer system uses the following hardware device types -
EIT, Author Gay Robertson, 2016 The personal computer system uses the following hardware device types - Input devices Input devices Processing devices Storage devices Processing Cycle Processing devices
More informationParallelism and Concurrency. COS 326 David Walker Princeton University
Parallelism and Concurrency COS 326 David Walker Princeton University Parallelism What is it? Today's technology trends. How can we take advantage of it? Why is it so much harder to program? Some preliminary
More informationRandom-Access Memory (RAM) Systemprogrammering 2007 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics
Systemprogrammering 27 Föreläsning 4 Topics The memory hierarchy Motivations for VM Address translation Accelerating translation with TLBs Random-Access (RAM) Key features RAM is packaged as a chip. Basic
More informationUC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c Review! UC Berkeley CS61C : Machine Structures Lecture 28 Intra-machine Parallelism Parallelism is necessary for performance! It looks like itʼs It is the future of computing!
More informationA Review on Cache Memory with Multiprocessor System
A Review on Cache Memory with Multiprocessor System Chirag R. Patel 1, Rajesh H. Davda 2 1,2 Computer Engineering Department, C. U. Shah College of Engineering & Technology, Wadhwan (Gujarat) Abstract
More informationThe Xilinx XC6200 chip, the software tools and the board development tools
The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions
More informationPerformance! (1/latency)! 1000! 100! 10! Capacity Access Time Cost. CPU Registers 100s Bytes <10s ns. Cache K Bytes ns 1-0.
Since 1980, CPU has outpaced DRAM... EEL 5764: Graduate Computer Architecture Appendix C Hierarchy Review Ann Gordon-Ross Electrical and Computer Engineering University of Florida http://www.ann.ece.ufl.edu/
More informationWorkspace for '4-FPGA' Page 1 (row 1, column 1)
Workspace for '4-FPGA' Page 1 (row 1, column 1) Workspace for '4-FPGA' Page 2 (row 2, column 1) Workspace for '4-FPGA' Page 3 (row 3, column 1) ECEN 449 Microprocessor System Design FPGAs and Reconfigurable
More informationPerformance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference
The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee
More informationVirtual Machines and Dynamic Translation: Implementing ISAs in Software
Virtual Machines and Dynamic Translation: Implementing ISAs in Software Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Software Applications How is a software application
More informationLec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements
Lec 13: Linking and Memory Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University PA 2 is out Due on Oct 22 nd Announcements Prelim Oct 23 rd, 7:30-9:30/10:00 All content up to Lecture on Oct
More informationReza Afshari Project Proposal Etec 471, Professor Todd Morton October 28, Western Washington University Electronics Engineering Technology
WIRELESS OPTICAL USB MOUSE Reza Afshari Project Proposal Etec 471, Professor Todd Morton October 28, 2004 Western Washington University Electronics Engineering Technology INTRODUCTION Almost everybody
More informationCSCE 312 Lab manual. Instructor: Dr. Ki HwanYum. Prepared by. Dr. Rabi Mahapatra. Suneil Mohan & Amitava Biswas. Fall 2016
CSCE 312 Lab manual Lab-3 - Sequential logic design Instructor: Dr. Ki HwanYum Prepared by Dr. Rabi Mahapatra. Suneil Mohan & Amitava Biswas Fall 2016 Department of Computer Science & Engineering Texas
More informationUCI. Intel Itanium Line Processor Efforts. Xiaobin Li. PASCAL EECS Dept. UC, Irvine. University of California, Irvine
Intel Itanium Line Processor Efforts Xiaobin Li PASCAL EECS Dept. UC, Irvine Outline Intel Itanium Line Roadmap IA-64 Architecture Itanium Processor Microarchitecture Case Study of Exploiting TLP at VLIW
More informationNew Advances in Micro-Processors and computer architectures
New Advances in Micro-Processors and computer architectures Prof. (Dr.) K.R. Chowdhary, Director SETG Email: kr.chowdhary@jietjodhpur.com Jodhpur Institute of Engineering and Technology, SETG August 27,
More informationRandom-Access Memory (RAM) Systemprogrammering 2009 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics! The memory hierarchy
Systemprogrammering 29 Föreläsning 4 Topics! The memory hierarchy! Motivations for VM! Address translation! Accelerating translation with TLBs Random-Access (RAM) Key features! RAM is packaged as a chip.!
More informationIntroduction to the Personal Computer
Introduction to the Personal Computer 2.1 Describe a computer system A computer system consists of hardware and software components. Hardware is the physical equipment such as the case, storage drives,
More informationSystemC Implementation of VLSI Embedded Systems for MEMS. Application
Fourth LACCEI International Latin American and Caribbean Conference for Engineering and Technology (LACCET 2006) Breaking Frontiers and Barriers in Engineering: Education, Research and Practice 21-23 June
More informationHigh-Performance Parallel Computing
High-Performance Parallel Computing P. (Saday) Sadayappan Rupesh Nasre Course Overview Emphasis on algorithm development and programming issues for high performance No assumed background in computer architecture;
More informationAdvance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts
Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism
More informationReminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline
Reminder Course project team forming deadline Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Course project ideas If you have difficulty in finding team mates, send your
More informationGeneral Purpose Signal Processors
General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:
More informationPRELAB! Read the entire lab, and complete the prelab questions (Q1-Q3) on the answer sheet before coming to the laboratory.
PRELAB! Read the entire lab, and complete the prelab questions (Q1-Q3) on the answer sheet before coming to the laboratory. 1.0 Objectives In the last lab we learned that Verilog is a fast and easy way
More information