A 45nm 1.3GHz 16.7 Double-Precision GFLOPS/W RISC-V Processor with Vector Accelerators"
|
|
- Dylan Wilcox
- 6 years ago
- Views:
Transcription
1 A 45nm 1.3GHz 16.7 Double-Precision GFLOPS/W ISC-V Processor with Vector Accelerators" Yunsup Lee 1, Andrew Waterman 1, imas Avizienis 1,! Henry Cook 1, Chen Sun 1,2,! Vladimir Stojanovic 1,2, Krste Asanovic 1!! 1 University of California, Berkeley! 2 Massachusetts Institute of Technology! 1!
2 Upheaval in Computer Design" Cost per Million Gates ($)! Vdd HP! Cost per Million Gates ($)" 0.045! 0.04! 0.035! 0.03! 0.025! 0.02! 0.015! 0.01! 0.005! 0! ! ! Moore s Law (cost/transistor) over?! Dennard scaling over (Vdd ~fixed)! ! 0.014! ! ! 90nm! 65nm! 40nm! 28nm! 20nm! 16/14nm! Energy efficiency constrains everything! Incorporate specialized and heterogeneous accelerators into general-purpose processors! Write processor generators to express a design space, and do vertically-integrated design space exploration extensively! Source! [1] Why migration to 20nm bulk CMOS and 16/14nm FINFETS is not the best approach for semiconductor industry, IBS, Handel Jones, 2014.! [2] International Technology oadmap of Semiconductors (ITS)! 5! 4.5! 4! 3.5! 3! 2.5! 2! 1.5! 1! 0.5! 0! Vdd HP" 2!
3 3mm X 6mm Chip Fabricated in 45nm SOI 75m+ transistors! Dual-Core ISC-V Processor with Vector Accelerators! 1MB SAM Memory Structure for Testing! Monolithically-Integrated Silicon Photonic Links" Transmitter: Wade OFC 14! eceiver: Georgas VLSI 14!! 3!
4 Chip Architecture" 3mm 2.8mm VF L1D$ 6mm 1MB SAM Array Core Logic ocket Scalar Core L1I$ L1VI$ Hwacha Vector Accelerator ocket Scalar Core Core Hwacha Vector Accelerator 1.1mm 16K L1I$ 32K L1D$ 8KB L1VI$ 16K L1I$ 32K L1D$ 8KB L1VI$ Dual-Core ISC-V Vector Processor FPGA FSB/ HTIF Arbiter Coherence Hub 1MB SAM Array Arbiter 4!
5 ISC-V is a new, open, and completely free general-purpose ISA! Developed at UC Berkeley! ISC-V designed to be flexible and extensible! Better integrate accelerators with host cores! ISC-V software ecosystem! binutils, GCC, Newlib, glibc, GDB, LLVM, Linux, QEMU! External users contributing to ecosystem! 5!
6 ocket Scalar Core" PC! IF! ID! EX! MEM! WB! ITLB Int.F DTLB PC Gen. I$ Inst. Int.EX D$ Commit Access Decode Access bypass paths omitted for simplicity ocket Pipeline to Hwacha FP.F FP.EX1 FP.EX2 FP.EX3 64-bit 6-stage single-issue in-order pipeline! Design minimizes impact of long clock-to-output delays of compiler-generated AMs! 64-entry BTB, 256-entry BHT, 2-entry AS! MMU supports page-based virtual memory! IEEE compliant FPU! Supports SP, DP FMA with hw support for subnormals! 6!
7 AM Cortex-A5 vs. ISC-V ocket" Category" AM Cortex-A5" ISC-V ocket" ISA! 32-bit AM v7! 64-bit ISC-V v2! Architecture! Single-Issue In-Order! Single-Issue In-Order 6-stage! Performance! 1.57 DMIPS/MHz! 1.72 DMIPS/MHz! Process! TSMC 40GPLUS! TSMC 40GPLUS! Area w/o Caches! 0.27 mm 2! 0.14 mm 2! Area with 16K Caches! 0.53 mm 2! 0.39 mm 2! Area Efficiency! 2.96 DMIPS/MHz/mm 2! 4.41 DMIPS/MHz/mm 2! Frequency! >1GHz! >1GHz! Dynamic Power! <0.08 mw/mhz! mw/mhz! PPA reporting conditions! 85% utilization, use Dhrystone for benchmark, frequency/ power at TT 0.9V 25C, all regular VT transistors! 10% higher in DMIPS/MHz, 49% more area-efficient! 7!
8 Hwacha Vector Accelerator" L1 VI$ Vector Issue Unit Bank0 Ctrl Bank0 11W SAM Bank1 Ctrl Bank1 11W SAM Bank7 Ctrl Bank7 11W SAM 64-bit Integer Multiplier SP/DP Floating-Point Units Vector Memory Unit Shared L1 D$ from ocket int int int ead Ports Write Ports 8!
9 Bank Execution Diagram" W W W After a 2-cycle initial startup latency, the banked F is effectively able to read out 2 operands/cycle.! 9!
10 Processor Generators" Express hardware as highly parameterized generators! Helps tune the design under different performance, power, and area constraints! Parameters include:! number of cores! cache sizes, associativity, number of TLB entries, cache-coherence protocol! number of floating-point pipeline stages! width of off-chip I/O, and more! 10!
11 Writing Generators with Chisel" TL generator written in Chisel! HDL embedded in Scala! Full power of Scala for writing generators! object-oriented programming, functional programming! C++ code! C++ Compiler! Chisel Program! Scala/JVM! FPGA Verilog! ASIC Verilog! Software Simulator! FPGA Tools! FPGA Emulation! ASIC Tools! GDS Layout! 11!
12 Physical Design Flow" Chisel Source Code! Chisel! TL Code (Verilog)! Synthesis! Place-and-oute! The core is synthesized and place-and-routed independently, and instantiated twice Gate-level Netlist! Formality! Formal Verification! PrimeTime/StarC! Static Timing Analysis! VCS Post-PN! Gate-level Simulation! Signed-Off Design! 12!
13 Chip esults" Process! Package! Chip Parameters" 45nm SOI CMOS, 11 metal layers! C4 area I/O, flip-chip bonded to PCB! Size! Processor! 2.8mm X 1.1mm! Standard Cells! SAM Bits! 1 Core! 1.37mm X 1.06mm! SAM Array! Processor! 1.1mm X 4mm! 425K (85K flip-flops)! 1 Core! 192K (36K flip-flops)! Processor! 1246K! 1 Core! 621K! Frequency! 1GHz (Nominal), 250MHz-1.3GHz! Voltage! Power! 1V (Nominal), 0.65V-1.2V! 300mW-430mW (Nominal), 40mW-960mW! 13!
14 Measurement Setup" 45nm ISC-V Vector Processor FSB 150MHz Virtex 6 FPGA Board 1Gbps Ethernet Laptop 512MB DAM only used in basic testing mode 14!
15 Shmoo Plot of DP GFLOPS/W" unning Double-Precision Matrix Multiplication on Vector Accelerator! Vdd (V) Not Operational Frequency (MHz) More Efficient" Less Efficient" Nominal" 7.3 GFLOPS/W! Max Frequency" 4.2 GFLOPS/W! Most Efficient" 16.7 GFLOPS/W! VDD at 0.8V" 12.5 GFLOPS/W! 15!
16 Energy Efficiency Frequency (GHz)" 64-bit GFLOPS" Power" (W)" Efficiency (GFLOPS/W)" Blue Gene/Q! 1.60! 204.8! 29.7! 6.9! IBM Cell! 3.20! 108.8! 22.5! 4.8! This Work" 0.55" 1.72" 0.138" 12.5" BG/Q and IBM Cell fabricated in same 45nm SOI! Conservatively assume BG/Q and Cell achieves peak GFLOPS, we achieve 78% of peak GFLOPS! Power numbers only for the core with private caches! Blue Gene/Q: Cores dissipate 54% of total power! IBM Cell: Assume that cores dissipate 50% of total power! Why better energy efficiency than others?! Simpler, but yet more energy-efficient microarchitecture! 16!
17 More on Comparison" But BG/Q is clocked 3X faster and Cell is 6X faster?! If the end goal is to provide better energy efficiency then use simpler microarchitectures and rely on parallelism for performance.! But BG/Q and Cell have big on-chip caches? What about I/O power?! We only count the power dissipated in the core and the private L1 caches.! But BG/Q and Cell have 100X more total GFLOPS!! Sorry, we only had budget for a small test chip.! 17!
18 Conclusions" Processor generators written in high-level languages can produce energy-efficient, high-performance hardware! Our dual-core ISC-V vector processor achieves 16.7 DP GFLOPS/W at 0.65 V and a maximum frequency of 1.3 GHz at 1.2 V! Open-source ISC-V ISA can serve as a competitive base ISA for integrating specialized heterogeneous accelerators! ocket chip generator and software tools open-sourced at 18!
19 Acknowledgment" DAPA award H C-0100! DAPA award H ! Center for Future Architecture esearch, a member of STAnet, a Semiconductor esearch Corporation program sponsored by MACO! NVIDIA graduate fellowship! ASPIE Lab industrial sponsors and affiliates Intel, Google, Nokia, NVIDIA, Oracle, and Samsung! All POEM team members at MIT, UC Berkeley, CU Boulder! 19!
RISC-V Rocket Chip SoC Generator in Chisel. Yunsup Lee UC Berkeley
RISC-V Rocket Chip SoC Generator in Chisel Yunsup Lee UC Berkeley yunsup@eecs.berkeley.edu What is the Rocket Chip SoC Generator?! Parameterized SoC generator written in Chisel! Generates Tiles - (Rocket)
More informationStrober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL
Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL Donggyu Kim, Adam Izraelevitz, Christopher Celio, Hokeun Kim, Brian Zimmer, Yunsup Lee, Jonathan Bachrach, Krste Asanović
More informationRaven: A CHISEL designed 28nm RISC-V Vector Processor with Integrated Switched- Capacitor DC-DC Converters & Adaptive Clocking
Raven: A CHISEL designed 28nm RISC-V Vector Processor with Integrated Switched- Capacitor DC-DC Converters & Adaptive Clocking Yunsup Lee, Brian Zimmer, Andrew Waterman, Alberto Puggelli, Jaehwa Kwak,
More informationA Retrospective on Par Lab Architecture Research
A Retrospective on Par Lab Architecture Research Par Lab SIGARCH Krste Asanovic, Rimas Avizienis, Jonathan Bachrach, Scott Beamer, Sarah Bird, Alex Bishara, Chris Celio, Henry Cook, Ben Keller, Yunsup
More informationHwacha V4: Decoupled Data Parallel Custom Extension. Colin Schmidt, Albert Ou, Krste Asanović UC Berkeley
Hwacha V4: Decoupled Data Parallel Custom Extension Colin Schmidt, Albert Ou, Krste Asanović UC Berkeley Introduction Data-parallel, custom ISA extension to RISC-V with a configurable ASIC-focused implementation
More informationEnergy-Efficient RISC-V Processors in 28nm FDSOI
Energy-Efficient RISC-V Processors in 28nm FDSOI Borivoje Nikolić Department of Electrical Engineering and Computer Sciences University of California, Berkeley bora@eecs.berkeley.edu 26 September 2017
More informationFABRICATION TECHNOLOGIES
FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general
More informationBERKELEY PAR LAB. RAMP Gold Wrap. Krste Asanovic. RAMP Wrap Stanford, CA August 25, 2010
RAMP Gold Wrap Krste Asanovic RAMP Wrap Stanford, CA August 25, 2010 RAMP Gold Team Graduate Students Zhangxi Tan Andrew Waterman Rimas Avizienis Yunsup Lee Henry Cook Sarah Bird Faculty Krste Asanovic
More informationHow to build a Megacore microprocessor. by Andreas Olofsson (MULTIPROG WORKSHOP 2017)
How to build a Megacore microprocessor by Andreas Olofsson (MULTIPROG WORKSHOP 2017) 1 Disclaimers 2 This presentation summarizes work done by Adapteva from 2008-2016. Statements and opinions are my own
More informationRISC-V Updates Krste Asanović krste@berkeley.edu http://www.riscv.org 3 rd RISC-V Workshop Oracle, Redwood Shores, CA January 5, 2016 Agenda UC Berkeley updates RISC-V transition out of Berkeley Outstanding
More informationEvaluation of RISC-V RTL with FPGA-Accelerated Simulation
Evaluation of RISC-V RTL with FPGA-Accelerated Simulation Donggyu Kim, Christopher Celio, David Biancolin, Jonathan Bachrach, Krste Asanovic CARRV 2017 10/14/2017 Evaluation Methodologies For Computer
More informationCustom Silicon for all
Custom Silicon for all Because Moore s Law only ends once Who is SiFive? Best-in-class team with technology depth and breadth Founders & Execs Key Leaders & Team Yunsup Lee CTO Krste Asanovic Chief Architect
More informationDesign of Parallel and High-Performance Computing Fall 2017 Lecture: Cache Coherence & Memory Models
Design of Parallel and High-Performance Computing Fall 2017 Lecture: Cache Coherence & Memory Models Motivational video: https://www.youtube.com/watch?v=zjybff6pqeq Instructor: Torsten Hoefler & Markus
More informationAn Overview of Standard Cell Based Digital VLSI Design
An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,
More informationRaven3: 28nm RISC-V Vector Processor with On-Chip DC/DC Convertors
Raven3: 28nm RISC-V Vector Processor with On-Chip DC/DC Convertors Brian Zimmer 1, Yunsup Lee 1, Alberto Puggelli 1, Jaehwa Kwak 1, Ruzica Jevtic 1, Ben Keller 1, Stevo Bailey 1, Milovan Blagojevic 1,2,
More informationHardware Modeling using Verilog Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Hardware Modeling using Verilog Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture 01 Introduction Welcome to the course on Hardware
More informationPACE: Power-Aware Computing Engines
PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious
More informationA 1.5GHz Third Generation Itanium Processor
A 1.5GHz Third Generation Itanium Processor Jason Stinson, Stefan Rusu Intel Corporation, Santa Clara, CA 1 Outline Processor highlights Process technology details Itanium processor evolution Block diagram
More informationRISC-V. Palmer Dabbelt, SiFive COPYRIGHT 2018 SIFIVE. ALL RIGHTS RESERVED.
RISC-V Palmer Dabbelt, SiFive Why Instruction Set Architecture matters Why can t Intel sell mobile chips? 99%+ of mobile phones/tablets are based on ARM s v7/v8 ISA Why can t ARM partners sell servers?
More informationADVANCED ELECTRONIC SOLUTIONS AVIATION SERVICES COMMUNICATIONS AND CONNECTIVITY MISSION SYSTEMS
The most important thing we build is trust ADVANCED ELECTRONIC SOLUTIONS AVIATION SERVICES COMMUNICATIONS AND CONNECTIVITY MISSION SYSTEMS UT840 LEON Quad Core First Silicon Results Cobham Semiconductor
More informationA Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013
A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company
More informationAn overview of standard cell based digital VLSI design
An overview of standard cell based digital VLSI design Implementation of the first generation AsAP processor Zhiyi Yu and Tinoosh Mohsenin VCL Laboratory UC Davis Outline Overview of standard cellbased
More informationChisel: Constructing Hardware In a Scala Embedded Language
Chisel: Constructing Hardware In a Scala Embedded Language Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman, Rimas Avizienis, John Wawrzynek, Krste Asanovic EECS UC Berkeley May 22,
More informationCreating a Scalable Microprocessor:
Creating a Scalable Microprocessor: A 16-issue Multiple-Program-Counter Microprocessor With Point-to-Point Scalar Operand Network Michael Bedford Taylor J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B.
More informationHardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University
Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis
More informationASIC Design of Shared Vector Accelerators for Multicore Processors
26 th International Symposium on Computer Architecture and High Performance Computing 2014 ASIC Design of Shared Vector Accelerators for Multicore Processors Spiridon F. Beldianu & Sotirios G. Ziavras
More informationRISECREEK: From RISC-V Spec to 22FFL Silicon
RISECREEK: From RISC-V Spec to 22FFL Silicon Vinod Ganesan, Gopinathan Muthuswamy Group CSE Dept RISE Lab IIT Madras Amudhan, Bharath, Alagu - HCL Tech Konala Varma, Ang Boon Chong, Harish Villuri - Intel
More informationEITF35: Introduction to Structured VLSI Design
EITF35: Introduction to Structured VLSI Design Part 1.1.2: Introduction (Digital VLSI Systems) Liang Liu liang.liu@eit.lth.se 1 Outline Why Digital? History & Roadmap Device Technology & Platforms System
More informationExploration of Cache Coherent CPU- FPGA Heterogeneous System
Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based
More informationEach Milliwatt Matters
Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets
More informationLab. Course Goals. Topics. What is VLSI design? What is an integrated circuit? VLSI Design Cycle. VLSI Design Automation
Course Goals Lab Understand key components in VLSI designs Become familiar with design tools (Cadence) Understand design flows Understand behavioral, structural, and physical specifications Be able to
More informationAll About the Cell Processor
All About the Cell H. Peter Hofstee, Ph. D. IBM Systems and Technology Group SCEI/Sony Toshiba IBM Design Center Austin, Texas Acknowledgements Cell is the result of a deep partnership between SCEI/Sony,
More informationMore Course Information
More Course Information Labs and lectures are both important Labs: cover more on hands-on design/tool/flow issues Lectures: important in terms of basic concepts and fundamentals Do well in labs Do well
More informationImplementing Tile-based Chip Multiprocessors with GALS Clocking Styles
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California, Davis, USA Outline Introduction Timing issues
More informationSiFive Freedom SoCs: Industry s First Open-Source RISC-V Chips
SiFive Freedom SoCs: Industry s First Open-Source RISC-V Chips Yunsup Lee Co-Founder and CTO High Upfront Cost Has Killed Innovation Our industry needs a fundamental change Total SoC Development Cost Design
More informationCS 152, Spring 2011 Section 8
CS 152, Spring 2011 Section 8 Christopher Celio University of California, Berkeley Agenda Grades Upcoming Quiz 3 What it covers OOO processors VLIW Branch Prediction Intel Core 2 Duo (Penryn) Vs. NVidia
More informationTrends in the Infrastructure of Computing
Trends in the Infrastructure of Computing CSCE 9: Computing in the Modern World Dr. Jason D. Bakos My Questions How do computer processors work? Why do computer processors get faster over time? How much
More informationSYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS
SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous
More informationTechnology Trends Presentation For Power Symposium
Technology Trends Presentation For Power Symposium 2006 8-23-06 Darryl Solie, Distinguished Engineer, Chief System Architect IBM Systems & Technology Group From Ingenuity to Impact Copyright IBM Corporation
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
More informationGemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc.
Gemini: A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor Sanjiv Kapil Gemini Architect Sun Microsystems, Inc. Design Goals Designed for compute-dense, transaction oriented systems (webservers,
More informationProtoFlex: FPGA Accelerated Full System MP Simulation
ProtoFlex: FPGA Accelerated Full System MP Simulation Eric S. Chung, Eriko Nurvitadhi, James C. Hoe, Babak Falsafi, Ken Mai Computer Architecture Lab at Our work in this area has been supported in part
More informationCS 152 Laboratory Exercise 5 (Version B)
CS 152 Laboratory Exercise 5 (Version B) Professor: Krste Asanovic TA: Yunsup Lee Department of Electrical Engineering & Computer Science University of California, Berkeley April 9, 2012 1 Introduction
More informationARM Processors for Embedded Applications
ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or
More informationFPGA Implementation and Validation of the Asynchronous Array of simple Processors
FPGA Implementation and Validation of the Asynchronous Array of simple Processors Jeremy W. Webb VLSI Computation Laboratory Department of ECE University of California, Davis One Shields Avenue Davis,
More informationAdding SRAMs to Your Accelerator
Adding SRAMs to Your Accelerator CS250 Laboratory 3 (Version 100913) Written by Colin Schmidt Adpated from Ben Keller Overview In this lab, you will use the CAD tools and jackhammer to explore tradeoffs
More informationCS 152 Laboratory Exercise 5 (Version C)
CS 152 Laboratory Exercise 5 (Version C) Professor: Krste Asanovic TA: Howard Mao Department of Electrical Engineering & Computer Science University of California, Berkeley April 9, 2018 1 Introduction
More informationProtoFlex Tutorial: Full-System MP Simulations Using FPGAs
rotoflex Tutorial: Full-System M Simulations Using FGAs Eric S. Chung, Michael apamichael, Eriko Nurvitadhi, James C. Hoe, Babak Falsafi, Ken Mai ROTOFLEX Computer Architecture Lab at Our work in this
More informationSoC FPGAs. Your User-Customizable System on Chip Altera Corporation Public
SoC FPGAs Your User-Customizable System on Chip Embedded Developers Needs Low High Increase system performance Reduce system power Reduce board size Reduce system cost 2 Providing the Best of Both Worlds
More informationCS 152 Laboratory Exercise 5
CS 152 Laboratory Exercise 5 Professor: Krste Asanovic TA: Christopher Celio Department of Electrical Engineering & Computer Science University of California, Berkeley April 11, 2012 1 Introduction and
More informationOverview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips
Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,
More informationA Hardware Accelerator for Computing an Exact Dot Product. Jack Koenig, David Biancolin, Jonathan Bachrach, Krste Asanović
A Hardware Accelerator for Computing an Exact Dot Product Jack Koenig, David Biancolin, Jonathan Bachrach, Krste Asanović 1 Challenges with Floating Point Addition and multiplication are not associative
More informationSorry for missing last week s lecture! To talk about some silliness in MPI-3. A more complete view
Sorry for missing last week s lecture! Design of Parallel and High-Performance Computing Fall 2017 Lecture: Cache Coherence & Memory Models Motivational video: https://www.youtube.com/watch?v=zjybff6pqeq
More informationCS250 VLSI Systems Design Lecture 9: Memory
CS250 VLSI Systems esign Lecture 9: Memory John Wawrzynek, Jonathan Bachrach, with Krste Asanovic, John Lazzaro and Rimas Avizienis (TA) UC Berkeley Fall 2012 CMOS Bistable Flip State 1 0 0 1 Cross-coupled
More informationZynq-7000 All Programmable SoC Product Overview
Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform
More informationKiloCore: A 32 nm 1000-Processor Array
KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation
More informationIntroduction to ICs and Transistor Fundamentals
Introduction to ICs and Transistor Fundamentals A Brief History 1958: First integrated circuit Flip-flop using two transistors Built by Jack Kilby at Texas Instruments 2003 Intel Pentium 4 mprocessor (55
More informationFundamentals of Computer Design
CS359: Computer Architecture Fundamentals of Computer Design Yanyan Shen Department of Computer Science and Engineering 1 Defining Computer Architecture Agenda Introduction Classes of Computers 1.3 Defining
More informationPower, Performance and Area Implementation Analysis.
ARM Cortex -R Series: Power, Performance and Area Implementation Analysis. Authors: Neil Werdmuller and Jatin Mistry, September 2014. Summary: Power, Performance and Area (PPA) implementation analysis
More informationAgile Hardware Design: Building Chips with Small Teams
2017 SiFive. All Rights Reserved. Agile Hardware Design: Building Chips with Small Teams Yunsup Lee ASPIRE Graduate 2016 Co-Founder and CTO 2 2017 SiFive. All Rights Reserved. World s First Single-Chip
More informationNVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield
NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host
More informationItanium 2 Processor Microarchitecture Overview
Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs
More informationECE 471 Embedded Systems Lecture 2
ECE 471 Embedded Systems Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 7 September 2018 Announcements Reminder: The class notes are posted to the website. HW#1 will
More information45-year CPU Evolution: 1 Law -2 Equations
4004 8086 PowerPC 601 Pentium 4 Prescott 1971 1978 1992 45-year CPU Evolution: 1 Law -2 Equations Daniel Etiemble LRI Université Paris Sud 2004 Xeon X7560 Power9 Nvidia Pascal 2010 2017 2016 Are there
More informationFactors which influence in many core processors
Factors which influence in many core processors ABSTRACT: The applications for multicore and manycore microprocessors as RISC-V are currently useful for the advantages of their friendly nature, compared
More informationConfigurable and Extensible Processors Change System Design. Ricardo E. Gonzalez Tensilica, Inc.
Configurable and Extensible Processors Change System Design Ricardo E. Gonzalez Tensilica, Inc. Presentation Overview Yet Another Processor? No, a new way of building systems Puts system designers in the
More informationVLSI Digital Signal Processing
VLSI Digital Signal Processing EEC 28 Lecture Bevan M. Baas Tuesday, January 9, 28 Today Administrative items Syllabus and course overview My background Digital signal processing overview Read Programmable
More informationECE 5745 Complex Digital ASIC Design Course Overview
ECE 5745 Complex Digital ASIC Design Course Overview Christopher Batten School of Electrical and Computer Engineering Cornell University http://www.csl.cornell.edu/courses/ece5745 Application Algorithm
More informationCPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces
CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces Zvonimir Z. Bandic, Sr. Director Robert Golla, Sr. Fellow Dejan Vucinic,
More informationA 1-GHz Configurable Processor Core MeP-h1
A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface
More informationSpiral 2-8. Cell Layout
2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric
More informationVLSI Design Automation
VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,
More informationECE 571 Advanced Microprocessor-Based Design Lecture 24
ECE 571 Advanced Microprocessor-Based Design Lecture 24 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 25 April 2013 Project/HW Reminder Project Presentations. 15-20 minutes.
More informationJack Kang ( 剛至堅 ) VP Product June 2018
Jack Kang ( 剛至堅 ) VP Product June 2018 SiFive RISC-V Core IP Product Offering SiFive RISC-V Core IP Industry leading 32-bit and 64-bit Embedded Cores High performance 64-bit Application Cores High Performance
More informationMulti-Gigahertz Parallel FFTs for FPGA and ASIC Implementation
Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation Doug Johnson, Applications Consultant Chris Eddington, Technical Marketing Synopsys 2013 1 Synopsys, Inc. 700 E. Middlefield Road Mountain
More informationDo we need more chips (ASICs)?
6.375 Complex Digital System Spring 2007 Lecturers: Arvind & Krste Asanović TAs: Myron King & Ajay Joshi Assistant: Sally Lee L01-1 Do we need more chips (ASICs)? ASIC=Application-Specific Integrated Circuit
More informationHistory. PowerPC based micro-architectures. PowerPC ISA. Introduction
PowerPC based micro-architectures Godfrey van der Linden Presentation for COMP9244 Software view of Processor Architectures 2006-05-25 History 1985 IBM started on AMERICA 1986 Development of RS/6000 1990
More informationNew 130nm Itanium 2 Processors for 2003
New 130nm Itanium s for 003 Harry Muljono, Stefan usu, Brian Cherkauer, Jason Stinson Intel Corporation, Santa Clara, CA 1 Outline highlights Itanium processor evolution Block diagram Power dissipation
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationMicroprocessor Trends and Implications for the Future
Microprocessor Trends and Implications for the Future John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 4 1 September 2016 Context Last two classes: from
More informationNatalie Enright Jerger, Jason Anderson, University of Toronto November 5, 2010
Next Generation FPGA Research Natalie Enright Jerger, Jason Anderson, and Ali Sheikholeslami l i University of Toronto November 5, 2010 Outline Part (I): Next Generation FPGA Architectures Asynchronous
More informationGreenDroid: An Architecture for the Dark Silicon Age
GreenDroid: An Architecture for the Dark Silicon Age Nathan Goulding-Hotta, Jack Sampson, Qiaoshi Zheng, Vikram Bhatt, Joe Auricchio, Steven Swanson, Michael Bedford Taylor University of California, San
More informationThe Design of the KiloCore Chip
The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory
More informationProcessor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP
Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor
More informationA 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications
A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System
More informationVLSI Design Automation. Calcolatori Elettronici Ing. Informatica
VLSI Design Automation 1 Outline Technology trends VLSI Design flow (an overview) 2 IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing
More informationIntroduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes
Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel
More informationVector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks
Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor
More informationLecture 41: Introduction to Reconfigurable Computing
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationPicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor
PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor Taeho Kgil, Shaun D Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Steve Reinhardt, Krisztian Flautner,
More informationWelcome to CS250 VLSI Systems Design
Image Courtesy: Intel Welcome to CS250 VLSI Systems Design 9/2/10 Yunsup Lee YUNSUP LEE Email: yunsup@cs.berkeley.edu Please add [CS250] in the subject Will try to get back in a day CS250 Newsgroup Post
More informationIBM's POWER5 Micro Processor Design and Methodology
IBM's POWER5 Micro Processor Design and Methodology Ron Kalla IBM Systems Group Outline POWER5 Overview Design Process Power POWER Server Roadmap 2001 POWER4 2002-3 POWER4+ 2004* POWER5 2005* POWER5+ 2006*
More informationKeyStone II. CorePac Overview
KeyStone II ARM Cortex A15 CorePac Overview ARM A15 CorePac in KeyStone II Standard ARM Cortex A15 MPCore processor Cortex A15 MPCore version r2p2 Quad core, dual core, and single core variants 4096kB
More informationTOPIC : Verilog Synthesis examples. Module 4.3 : Verilog synthesis
TOPIC : Verilog Synthesis examples Module 4.3 : Verilog synthesis Example : 4-bit magnitude comptarator Discuss synthesis of a 4-bit magnitude comparator to understand each step in the synthesis flow.
More informationL évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers
I N S T I T U T D E R E C H E R C H E T E C H N O L O G I Q U E L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers 10/04/2017 Les Rendez-vous de
More informationComputer Architecture
Informatics 3 Computer Architecture Dr. Boris Grot and Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh General Information Instructors: Boris
More informationEITF35: Introduction to Structured VLSI Design
EITF35: Introduction to Structured VLSI Design Introduction to FPGA design Rakesh Gangarajaiah Rakesh.gangarajaiah@eit.lth.se Slides from Chenxin Zhang and Steffan Malkowsky WWW.FPGA What is FPGA? Field
More information6.375 Complex Digital System Spring 2006
6.375 Complex Digital System Spring 2006 Lecturer: TAs: Assistant: Arvind Chris Batten & Mike Pellauer Sally Lee L01-1 Do we need more chips (ASICs)? ASIC=Application Specific IC Some exciting possibilities
More informationUnderstanding Peak Floating-Point Performance Claims
white paper FPGA Understanding Peak ing-point Performance Claims Learn how to calculate and compare the peak floating-point capabilities of digital signal processors (DSPs), graphics processing units (GPUs),
More information