A Scalable Processor Architecture for the Next Generation of Low Power Supercomputer. PRACE Workshop, October 2010 Andreas Olofsson
Transcription
1 A Scalable Processor Architecture for the Next Generation of Low Power Supercomputer PRACE Workshop, October 2010 Andreas Olofsson
2 Company Introduction
Company founded in 2008 with a mission to produce programmable processors with 10x the energy efficiency of existing products while using 1/10th of the development budget.
Veteran processor design team with 10 successful low power products, 100M units shipped, and over $200M in product related revenue.
Adapteva has completed development of a groundbreaking programmable soft accelerator with energy efficiency of up to 50 GFLOPS/Watt.
Now sampling first product to lead customers and strategic partners.
3 Why efficient supercomputing is a critical need!
[Figure: energy-efficiency trend chart (GFLOP/WATT) with series labeled Top500, FPGA/MULTI, and MICRO; one annotation reads "6x in 3 years". Data from James Anderson at Lincoln Labs and Top500.]
4 DARPA UHPC Program
Big Winners: INTEL, NVIDIA, MIT, SANDIA
5 Bad News for Semi. Companies
6 More Bad News..
$10M development cost per chip
$2.36 in 10ku: ARM RISC Core up to 120MHz; USB 2.0, Ethernet, CAN, GPIO, etc; 512KB flash, 65KB SRAM; 12 bit A/D, 10 bit D/A
VS.
$4: Coffee beans, Water..
7 Area vs Max Performance (65nm)
AMD OPTERON: 36 GFLOPS, 283 mm^2
IBM CELL: 230 GFLOP/s, 220 mm^2
VIRTEX5 LX50: 48 GMACS, 146 mm^2
NVIDIA GTX: GFLOPS, 576 mm^2
Labels: Ease of Use, Efficient, CPU, Accelerator, Scale
What if we could borrow only the best features?
8 A History of Embedded Processing
MAC-DSP ERA. Strengths: ENERGY EFFICIENCY. Achilles heel: PROGRAMMING MODEL
VLIW/SIMD ERA. Strengths: ENERGY EFFICIENCY & EASE OF USE. Achilles heel: NOT SCALABLE
FPGA ERA. Strengths: ENERGY EFFICIENCY & FLEXIBILITY & SCALABILITY. Achilles heel: AREA INEFFICIENT
MANYCORE ERA. Strengths: ENERGY EFFICIENCY & SCALABILITY & EASE OF USE. Achilles heel: ??
[Figure: NORMALIZED PERFORMANCE (GFLOP/W) by PERIOD]
The future of embedded processing is programmable massively parallel multicore chips!
9 Epiphany™ Signal Processing Fabric
Processing tile: CPU, MEMORY, ROUTER, DMA
Easy to use programming model
Scalable to 1000's of cores
IEEE Floating Point Compliant
GFLOP/W at 65nm
0.7mm per processing tile
100 GFLOP/W at 28nm
10 Architecture Summary
Memory Model: HW driven caches kill energy efficiency! Cache misses very expensive! So..NO HW caches. Distributed Flat Shared Memory. All memory accessible by all cores. 32KB multi bank SRAM per core.
Network On Chip: Wires are cheap! bit atomic packets. One packet per clock cycle. 1 ns per hop.
CPU: Stability is expensive! Optimized for math. Good enough for control code. 1 GHz operation. 2 FLOPS/cycle. entry register file.
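The CPU figures on this slide imply a simple peak-throughput estimate. The sketch below reads the slide as 1 GHz operation and 2 FLOPS per cycle, and assumes peak throughput is just clock rate times FLOPS per cycle times core count. The function name is hypothetical, and the 16-core count is borrowed from the E16xx product slide.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: int, cores: int) -> float:
    """Theoretical peak in GFLOPS, assuming every core issues its
    maximum number of floating point operations on every cycle."""
    return clock_ghz * flops_per_cycle * cores


# One core at 1 GHz with 2 FLOPS/cycle, then 16 such cores.
single_core = peak_gflops(1.0, 2, 1)
sixteen_cores = peak_gflops(1.0, 2, 16)
print(single_core, sixteen_cores)
```

Under these assumptions the 16-core result matches the 32 GFLOPS maximum quoted for the E16xx; real workloads would land below this ceiling.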
11 A Very Simple Flat Memory Model
Completely unprotected.
Examples of core to core communication:
STR R0, [R1,#42]
LDR R0, [R1,#42]
[Figure: memory map. Local core space up to 0x000FFFFF: MEMORY MAPPED REGISTERS (from 0x000F0000), RESERVED, INTERNAL MEMORY BANK 3 down to BANK 0. Global space: cores CORE_0_1_0_0 through CORE_0_1_3_3, with legible base addresses including 0x04a00000, 0x04b00000, 0x04c00000, 0x05a00000, 0x05b00000 and 0x05c00000, separated by RESERVED regions.]
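The address map suggests that a global address splits into a core selector and a local offset: the per-core space tops out at 0x000FFFFF (20 bits), and legible core bases such as 0x04a00000 and 0x04b00000 sit 0x00100000 apart. The sketch below works under that assumption. The 12/20-bit split and the helper names are inferred for illustration, not stated on the slide.

```python
LOCAL_BITS = 20                      # assumed: local space spans 0x00000..0xFFFFF
LOCAL_MASK = (1 << LOCAL_BITS) - 1


def split_address(addr: int) -> tuple[int, int]:
    """Return (core_id, local_offset) for a 32-bit global address."""
    return addr >> LOCAL_BITS, addr & LOCAL_MASK


def make_address(core_id: int, offset: int) -> int:
    """Build a global address from a core selector and a local offset."""
    if not 0 <= offset <= LOCAL_MASK:
        raise ValueError("offset outside the local address space")
    return (core_id << LOCAL_BITS) | offset


# A store to base 0x04b00000 plus offset 42 would target core selector 0x04b.
core, off = split_address(0x04B00000 + 42)
print(hex(core), off)
```

With a flat map like this, core to core communication needs no special instruction: an ordinary STR or LDR whose address carries another core's selector is enough.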
12 Network On Chip Routing Algorithm
Each processor node has an x,y coordinate.
Destination address is compared to the mesh node address to determine direction.
Reads are write requests.
X first, then y.
[Figure: row of mesh nodes (1,1), (2,1), (3,1) with IF blocks]
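The X-first, then-Y rule can be sketched as a few lines of dimension-ordered routing. This is an illustration of the general technique, not the chip's router logic: the function name and the choice to return the full list of visited nodes are assumptions.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Return the (x, y) nodes visited from src to dst, moving along X
    until the column matches, then along Y until the row matches."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1   # step one node east or west
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1   # step one node along Y
        path.append((x, y))
    return path


print(xy_route((1, 1), (3, 2)))
```

Each node only compares the destination coordinate with its own, so no routing tables are needed, and the hop count equals the Manhattan distance between the two nodes.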
13 Architecture Comparison
Columns: Intel80[2], Tile64[3], Intel48[5], Renesas[4], (This work)
Process: 65nm, 90nm, 45nm, 45nm, 65nm
Frequency: 3.13GHz, 850MHz, 2GHz, 648MHz, 500MHz
Cores:
Area (mm^2): 275 n/a
Transistors (Million): n/a 40
Performance (GFLOPS): 1000, No Native Floating Point, n/a
Power (W):
Area/Core (mm^2): 3.43 n/a
Watt/(GHz*Cores):
14 A Programmer's Architecture
Task Channel Programming Model
ANSI C Programmable using efficient GNU C compiler
Runs floating point C programs out of the box!
No special program constructs needed!
Native single cycle floating point instruction support!
Shared memory architecture
Orthogonal register and instruction set
15 Robust Programming Environment
MULTICORE PROGRAMMING MODEL/LIBS (IN DEVELOPMENT)
ECLIPSE IDE (AVAILABLE)
GNU COMPILER (AVAILABLE)
WYSIWYG SIMULATOR (AVAILABLE)
GNU DEBUGGER (AVAILABLE)
GNU BINUTILS (AVAILABLE)
16 FFT Mapping Example
Approach:
An FFT is spread over 16 processors.
s1, s2, s3, s4 refer to the four FFT stages for combining data with 64 point complex data movements.
Lower # procs transfer W0 to higher # procs.
Lower # proc calculates Wj0 + Wj1 x Cj, higher # proc calculates Wj0 - Wj1 x Cj.
Results:
<3us execution time
High efficiency
Work in progress, still room for improvement
[Figure: 4x4 grid of processors P0 to P15 with arrows labeled s1 to s4 showing the data exchanges in each stage]
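The combining step on this slide is the usual FFT butterfly: one partner forms a sum, the other a difference, with the second operand scaled by a coefficient. The sketch below reads Cj as that coefficient (a twiddle factor), which is an assumption, and the function name is hypothetical.

```python
import cmath


def butterfly(w0: complex, w1: complex, c: complex) -> tuple[complex, complex]:
    """Combine one pair of values.

    Returns (lower, higher): the lower-numbered processor would keep
    w0 + w1 * c and the higher-numbered one w0 - w1 * c.
    """
    t = w1 * c
    return w0 + t, w0 - t


# Example coefficient: a unit-magnitude twiddle factor exp(-2*pi*i*k/N), k=1, N=8.
c = cmath.exp(-2j * cmath.pi * 1 / 8)
lower, higher = butterfly(1 + 2j, 3 - 1j, c)
print(lower, higher)
```

Because both outputs reuse the same product, each pair of processors needs only one complex multiply per exchanged value, which is why the data movement between partners dominates the mapping.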
17 Compromised Computing
What users want: An easy to use C programmable device with infinite energy, zero power consumption, and all interconnect standards.
The compromise:
Microprocessor takes care of the really complex stuff
FPGA driven connectivity
A soft accelerator takes care of the 10% code
[Figure: Soft Floating Point Accelerator, FPGA]
18 A Radar Signal Chain Example
A/D: Multi channel, High Data Rate
FRONT END (ASIC/FPGA): 10,000+ Giga ops, Filtering, Fixed Point, Low Complexity
MIDDLE END (NEW!): ,000 Gigaflop, Adaptive Beamforming, Pulse Compression, Floating Point, Medium Complexity
BACK END (uP): Gigaflop, Clutter, Target Detection, Large Code, High Complexity
Implementing complex floating point algorithms in FPGA is too complex.
Microprocessor based middle end processing is far too power hungry.
The answer is a new device optimized for this type of middle end floating point signal processing!
19 Architecture Application Sweet Spot
What our customers want!
Floating point
Massive performance
Lower power (<3 W) per chip
C programmability and ease of use
Scalability
GREAT FIT APPLICATIONS BASED ON CUSTOMER INTERVIEWS
Portable Ultrasound
Antenna Pre-distortion
Airborne Radar
Data Center/HPC
Automotive Radar
Soft real time compression
Antenna Beam-forming
20 E16xx Product Intro (samples available!)
Features:
16 Independent Processor Cores
4 full duplex FPGA LVDS links
4Mb On chip Shared Memory
65nm IBM Process Platform
Performance:
350mW core power at 500MHz!
32 GFLOPS max performance
8 GB/sec Off chip Bandwidth
512 GB/sec On chip Bandwidth
Availability:
Silicon in lab confirms power advantage
Currently sampling to lead customers
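Two of the E16xx figures can be cross-checked against the per-core numbers from the architecture summary. The sketch below assumes 32KB of SRAM per core and 2 FLOPS per cycle still holding at 500MHz; the efficiency it prints is a theoretical, core-power-only ceiling under those assumptions, not a measured result.

```python
CORES = 16
SRAM_PER_CORE_KB = 32            # per-core SRAM from the architecture summary

# Aggregate on-chip memory, in kilobytes and in megabits.
total_kb = CORES * SRAM_PER_CORE_KB
total_mbit = total_kb * 8 / 1024

CLOCK_GHZ = 0.5                  # the 500MHz operating point on this slide
FLOPS_PER_CYCLE = 2              # assumed to hold at 500MHz as well
CORE_POWER_W = 0.350             # 350mW core power

# Theoretical peak at 500MHz and the matching core-only efficiency.
gflops_at_500mhz = CLOCK_GHZ * FLOPS_PER_CYCLE * CORES
gflops_per_watt = gflops_at_500mhz / CORE_POWER_W

print(total_kb, total_mbit, gflops_at_500mhz, round(gflops_per_watt, 1))
```

The memory total agrees with the 4Mb figure on the slide. The efficiency estimate excludes I/O and off-chip power, so it should be read as an upper bound.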
21 Short Term Product Roadmap
16 CPUs, 1GHz, 32 GFLOPS, <2 Watt, 65nm
64 CPUs, 1.5GHz, 256 GFLOPS, <2.5 Watt, 28nm
1024 CPUs, 1.5GHz, 4 TFLOPS, <40 Watt, 28nm
28nm Roadmap: SCALABLE, >100 GFLOP/WATT
[Figure: roadmap chart, performance over time by quarter]
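Dividing each roadmap point's performance by its power bound gives a quick check of the efficiency claim. The sketch below treats the power figures as upper bounds, so each result is a lower bound on GFLOP/WATT, and it takes 4 TFLOPS as 4000 GFLOPS. The list layout and function name are illustrative choices.

```python
# (label, peak GFLOPS, power upper bound in watts), as read from the roadmap.
roadmap = [
    ("16 CPUs, 65nm", 32.0, 2.0),
    ("64 CPUs, 28nm", 256.0, 2.5),
    ("1024 CPUs, 28nm", 4000.0, 40.0),
]


def min_gflops_per_watt(gflops: float, max_watts: float) -> float:
    """Lower bound on efficiency when power stays under max_watts."""
    return gflops / max_watts


for label, gflops, max_watts in roadmap:
    print(f"{label}: >{min_gflops_per_watt(gflops, max_watts):.1f} GFLOP/WATT")
```

Both 28nm parts come out at or slightly above 100 GFLOP/WATT, in line with the roadmap's headline figure, while the 65nm part's bound is much lower because its power figure is only a loose ceiling.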
22 Development ECO System
State of the art Tool Chain: (available now!)
Optimized GNU C compiler
Multi core debugger
Standard Eclipse IDE
Standard C and Math library
Unique guaranteed 100% accurate WYSIWYG chip simulator
Evaluation Boards: (available now!)
Plugin boards for standard Altera FPGA development kits
FPGA emulation board available for SW/HW co development
Software Libraries: (available November)
Multi core FFT implementation
Multi core programming libraries
23 Adapteva HPC Capabilities
GFLOP FMC (VITA 57) accelerator card available 3/15/11
1 TFLOP 50 Watt accelerator card in the works for Q
Adapteva, Bittware, and Petapath bring together key expertise in chip, board, and software design, enabling fast execution of prototype HPC systems.
Will continue to grow library support based on customer feedback
Can spin double precision floating point chip with very limited funding
24 Adapteva Board Offering
Board in Production Today
Based on Altera Stratix IV Family
Attaches to ATCA carriers or other cards equipped with AMC bays and used in MicroTCA systems
Advanced FPGA framework simplifies system design
<50 Watts with daughter card
Base card available today
Daughter card available 3/15
FPGA handles back plane connectivity
FMC Connector enables up to 128 GFLOPS of soft performance with Adapteva chips
25 A straw man 1 TFLOP, 50 Watt AMC card
1 TFLOP of performance
16 MB of distributed memory
X GB of on board memory
Shows off performance density achievable
Probably not practical for any real applications
[Figure: card layout, 65mm by 180mm, showing Connectors, DRAM, FPGA, DRAM, and Connectivity Gateway]
26 Conclusions
The next generation supercomputers will not be practical without a radically different architecture.
The only way to improve energy efficiency significantly is through compromise.
Adapteva has developed a software programmable floating point accelerator demonstrating 25 GFLOPS/W in silicon and 100 GFLOPS/W on the road map for 2011.
The programming model needs to change..but not drastically.
27 Contact Information
General Contact Information: infoadapteva.com
Address: Adapteva Inc. Massachusetts Ave, Suite Lexington MA
Mobile: (781) (Andreas)
Office: (781)
A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013
A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company
More informationThere s STILL plenty of room at the bottom! Andreas Olofsson
There s STILL plenty of room at the bottom! Andreas Olofsson 1 Richard Feynman s Lecture (1959) There's Plenty of Room at the Bottom An Invitation to Enter a New Field of Physics Why cannot we write the
More informationHow to build a Megacore microprocessor. by Andreas Olofsson (MULTIPROG WORKSHOP 2017)
How to build a Megacore microprocessor by Andreas Olofsson (MULTIPROG WORKSHOP 2017) 1 Disclaimers 2 This presentation summarizes work done by Adapteva from 2008-2016. Statements and opinions are my own
More informationEuropean energy efficient supercomputer project
http://www.montblanc-project.eu European energy efficient supercomputer project Simon McIntosh-Smith University of Bristol (Based on slides from Alex Ramirez, BSC) Disclaimer: Speaking for myself... All
More informationThe Mont-Blanc approach towards Exascale
http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are
More informationAn Alternative to GPU Acceleration For Mobile Platforms
Inventing the Future of Computing An Alternative to GPU Acceleration For Mobile Platforms Andreas Olofsson andreas@adapteva.com 50 th DAC June 5th, Austin, TX Adapteva Achieves 3 World Firsts 1. First
More informationParallella: A $99 Open Hardware Parallel Computing Platform
Inventing the Future of Computing Parallella: A $99 Open Hardware Parallel Computing Platform Andreas Olofsson andreas@adapteva.com IPDPS May 22th, Cambridge, MA Adapteva Achieves 3 World Firsts 1. First
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationSupercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?
Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA
More informationBig Data mais basse consommation L apport des processeurs manycore
Big Data mais basse consommation L apport des processeurs manycore Laurent Julliard - Kalray Le potentiel et les défis du Big Data Séminaire ASPROM 2 et 3 Juillet 2013 Presentation Outline Kalray : the
More informationThe Many Dimensions of SDR Hardware
The Many Dimensions of SDR Hardware Plotting a Course for the Hardware Behind the Software Sept 2017 John Orlando Epiq Solutions LO RFIC Epiq Solutions in a Nutshell Schaumburg, IL EST 2009 N. Virginia
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationMercury Computer Systems & The Cell Broadband Engine
Mercury Computer Systems & The Cell Broadband Engine Georgia Tech Cell Workshop 18-19 June 2007 About Mercury Leading provider of innovative computing solutions for challenging applications R&D centers
More informationWelcome. Altera Technology Roadshow 2013
Welcome Altera Technology Roadshow 2013 Altera at a Glance Founded in Silicon Valley, California in 1983 Industry s first reprogrammable logic semiconductors $1.78 billion in 2012 sales Over 2,900 employees
More informationAltera SDK for OpenCL
Altera SDK for OpenCL A novel SDK that opens up the world of FPGAs to today s developers Altera Technology Roadshow 2013 Today s News Altera today announces its SDK for OpenCL Altera Joins Khronos Group
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationIMPLICIT+EXPLICIT Architecture
IMPLICIT+EXPLICIT Architecture Fortran Carte Programming Environment C Implicitly Controlled Device Dense logic device Typically fixed logic µp, DSP, ASIC, etc. Implicit Device Explicit Device Explicitly
More informationProcessor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP
Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationCatapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud
Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud Doug Burger Director, Hardware, Devices, & Experiences MSR NExT November 15, 2015 The Cloud is a Growing Disruptor for HPC Moore s
More informationSignal Conversion in a Modular Open Standard Form Factor. CASPER Workshop August 2017 Saeed Karamooz, VadaTech
Signal Conversion in a Modular Open Standard Form Factor CASPER Workshop August 2017 Saeed Karamooz, VadaTech At VadaTech we are technology leaders First-to-market silicon Continuous innovation Open systems
More informationAnalyzing the Disruptive Impact of a Silicon Compiler
THE ELECTRONICS RESURGENCE INITIATIVE Analyzing the Disruptive Impact of a Silicon Compiler Andreas Olofsson 1947 Source: Wikipedia, Computer Museum 2017 Source: AMD Defense Advanced Research Project Agency
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationThis Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits
More informationAim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group
Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.
More informationARM Processors for Embedded Applications
ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or
More informationZynq-7000 All Programmable SoC Product Overview
Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform
More informationUnit 11: Putting it All Together: Anatomy of the XBox 360 Game Console
Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture
More informationINTRODUCTION TO FPGA ARCHITECTURE
3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)
More informationExperts in Application Acceleration Synective Labs AB
Experts in Application Acceleration 1 2009 Synective Labs AB Magnus Peterson Synective Labs Synective Labs quick facts Expert company within software acceleration Based in Sweden with offices in Gothenburg
More informationECE 471 Embedded Systems Lecture 2
ECE 471 Embedded Systems Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 7 September 2018 Announcements Reminder: The class notes are posted to the website. HW#1 will
More informationField Programmable Gate Array (FPGA) Devices
Field Programmable Gate Array (FPGA) Devices 1 Contents Altera FPGAs and CPLDs CPLDs FPGAs with embedded processors ACEX FPGAs Cyclone I,II FPGAs APEX FPGAs Stratix FPGAs Stratix II,III FPGAs Xilinx FPGAs
More informationThe Processor That Don't Cost a Thing
The Processor That Don't Cost a Thing Peter Hsu, Ph.D. Peter Hsu Consulting, Inc. http://cs.wisc.edu/~peterhsu DRAM+Processor Commercial demand Heat stiffling industry's growth Heat density limits small
More informationCPU Agnostic Motherboard design with RapidIO Interconnect in Data Center
Agnostic Motherboard design with RapidIO Interconnect in Data Center Devashish Paul Senior Product Manager IDT Chairman RapidIO Trade Association: Marketing Council 2013 RapidIO Trade Association Agenda
More informationNew System Solutions for Laser Printer Applications by Oreste Emanuele Zagano STMicroelectronics
New System Solutions for Laser Printer Applications by Oreste Emanuele Zagano STMicroelectronics Introduction Recently, the laser printer market has started to move away from custom OEM-designed 1 formatter
More informationGrowth outside Cell Phone Applications
ARM Introduction Growth outside Cell Phone Applications ~1B units shipped into non-mobile applications Embedded segment now accounts for 13% of ARM shipments Automotive, microcontroller and smartcards
More informationHW Trends and Architectures
Pavel Tvrdík, Jiří Kašpar (ČVUT FIT) HW Trends and Architectures MI-POA, 2011, Lecture 1 1/29 HW Trends and Architectures prof. Ing. Pavel Tvrdík CSc. Ing. Jiří Kašpar Department of Computer Systems Faculty
More informationThe Epiphany Manycore Architecture Seminar at Halmstad Högskola March 6,
The Epiphany Manycore Architecture Seminar at Halmstad Högskola March 6, 2012 andreas@adapteva.com +1-781-325-6688 Adapteva Company Introduction Team: Veteran low power processor design team from Analog
More informationOCP Engineering Workshop - Telco
OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,
More informationCUDA on ARM Update. Developing Accelerated Applications on ARM. Bas Aarts and Donald Becker
CUDA on ARM Update Developing Accelerated Applications on ARM Bas Aarts and Donald Becker CUDA on ARM: a forward-looking development platform for high performance, energy efficient hybrid computing It
More informationIntegrating FPGAs in High Performance Computing A System, Architecture, and Implementation Perspective
Integrating FPGAs in High Performance Computing A System, Architecture, and Implementation Perspective Nathan Woods XtremeData FPGA 2007 Outline Background Problem Statement Possible Solutions Description
More informationECE 1160/2160 Embedded Systems Design. Midterm Review. Wei Gao. ECE 1160/2160 Embedded Systems Design
ECE 1160/2160 Embedded Systems Design Midterm Review Wei Gao ECE 1160/2160 Embedded Systems Design 1 Midterm Exam When: next Monday (10/16) 4:30-5:45pm Where: Benedum G26 15% of your final grade What about:
More informationTo hear the audio, please be sure to dial in: ID#
Introduction to the HPP-Heterogeneous Processing Platform A combination of Multi-core, GPUs, FPGAs and Many-core accelerators To hear the audio, please be sure to dial in: 1-866-440-4486 ID# 4503739 Yassine
More informationLecture 1: Welcome, why are you here? James C. Hoe Department of ECE Carnegie Mellon University
18 643 Lecture 1: Welcome, why are you here? James C. Hoe Department of ECE Carnegie Mellon University 18 643 F17 L01 S1, James C. Hoe, CMU/ECE/CALCM, 2017 18 643 F17 L01 S2, James C. Hoe, CMU/ECE/CALCM,
More informationEmbedded Computing Platform. Architecture and Instruction Set
Embedded Computing Platform Microprocessor: Architecture and Instruction Set Ingo Sander ingo@kth.se Microprocessor A central part of the embedded platform A platform is the basic hardware and software
More informationIntroduction to the MMAGIX Multithreading Supercomputer
Introduction to the MMAGIX Multithreading Supercomputer A supercomputer is defined as a computer that can run at over a billion instructions per second (BIPS) sustained while executing over a billion floating
More informationFPGA Technology and Industry Experience
FPGA Technology and Industry Experience Guest Lecture at HSLU, Horw (Lucerne) May 24 2012 Oliver Brndler, FPGA Design Center, Enclustra GmbH Silvio Ziegler, FPGA Design Center, Enclustra GmbH Content Enclustra
More informationGodson Processor and its Application in High Performance Computers
Godson Processor and its Application in High Performance Computers Weiwu Hu Institute of Computing Technology, Chinese Academy of Sciences Loongson Technologies Corporation Limited hww@ict.ac.cn 1 Contents
More informationThe Mont-Blanc Project
http://www.montblanc-project.eu The Mont-Blanc Project Daniele Tafani Leibniz Supercomputing Centre 1 Ter@tec Forum 26 th June 2013 This project and the research leading to these results has received funding
More informationOctopus: A Multi-core implementation
Octopus: A Multi-core implementation Kalpesh Sheth HPEC 2007, MIT, Lincoln Lab Export of this products is subject to U.S. export controls. Licenses may be required. This material provides up-to-date general
More informationEmbedded Systems: Architecture
Embedded Systems: Architecture Jinkyu Jeong (Jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ICE3028: Embedded Systems Design, Fall 2018, Jinkyu Jeong (jinkyu@skku.edu)
More informationComponents of a MicroTCA System
Micro TCA Overview0 Platform, chassis, backplane, and shelf manager specification, being developed through PICMG Allows AMC modules to plug directly into a backplane Fills the performance/cost gap between
More informationThe extreme Adaptive DSP Solution to Sensor Data Processing
The extreme Adaptive DSP Solution to Sensor Data Processing Abstract Martin Vorbach PACT XPP Technologies Leo Mirkin Sky Computers, Inc. The new ISR mobile autonomous sensor platforms present a difficult
More informationFABRICATION TECHNOLOGIES
FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general
More informationReminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline
Reminder Course project team forming deadline Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Course project ideas If you have difficulty in finding team mates, send your
More informationHigh Speed Multi-User ASIC/SoC Prototyping system
High Speed Multi-User ASIC/SoC Prototyping system Technical Resource Document Date: August 23, 2010 About GiDEL GiDEL has become one of the market leaders as a company that continuously provides cuttingedge
More informationMarkets Demanding More Performance
The Tile Processor Architecture: Embedded Multicore for Networking and Digital Multimedia Tilera Corporation August 20 th 2007 Hotchips 2007 Markets Demanding More Performance Networking market - Demand
More informationDesign Techniques for Implementing an 800MHz ARM v5 Core for Foundry-Based SoC Integration. Faraday Technology Corp.
Design Techniques for Implementing an 800MHz ARM v5 Core for Foundry-Based SoC Integration Faraday Technology Corp. Table of Contents 1 2 3 4 Faraday & FA626TE Overview Why We Need an 800MHz ARM v5 Core
More informationGPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC
GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of
More informationSimplifying FPGA Design for SDR with a Network on Chip Architecture
Simplifying FPGA Design for SDR with a Network on Chip Architecture Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 RF NoC 3 Status and Conclusions USRP FPGA Capability Gen
More informationDr. Yassine Hariri CMC Microsystems
Dr. Yassine Hariri Hariri@cmc.ca CMC Microsystems 03-26-2013 Agenda MCES Workshop Agenda and Topics Canada s National Design Network and CMC Microsystems Processor Eras: Background and History Single core
More informationFlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com
FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers
More informationNVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)
NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive) NVDLA NVIDIA DEEP LEARNING ACCELERATOR IP Core for deep learning part of NVIDIA s Xavier
More informationL2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA
L2: FPGA HARDWARE 18-545: ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA 18-545: FALL 2014 2 Admin stuff Project Proposals happen on Monday Be prepared to give an in-class presentation Lab 1 is
More information3D Graphics in Future Mobile Devices. Steve Steele, ARM
3D Graphics in Future Mobile Devices Steve Steele, ARM Market Trends Mobile Computing Market Growth Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and
More informationSupport for Programming Reconfigurable Supercomputers
Support for Programming Reconfigurable Supercomputers Miriam Leeser Nicholas Moore, Albert Conti Dept. of Electrical and Computer Engineering Northeastern University Boston, MA Laurie Smith King Dept.
More informationFrom Majorca with love
From Majorca with love IEEE Photonics Society - Winter Topicals 2010 Photonics for Routing and Interconnects January 11, 2010 Organizers: H. Dorren (Technical University of Eindhoven) L. Kimerling (MIT)
More informationThunderbolt. VME Multiprocessor Boards and Systems. Best Price/Performance of High Performance Embedded C o m p u t e r s
Thunderbolt VME Multiprocessor Boards and Systems For nearly 25 years SKY Computers has been building some of the world s fastest and most reliable embedded computers. We supply more than half of the computers
More informationLeveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD
Leveraging OpenSPARC ESA Round Table 2006 on Next Generation Microprocessors for Space Applications G.Furano, L.Messina TEC- OpenSPARC T1 The T1 is a new-from-the-ground-up SPARC microprocessor implementation
More informationAn Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection
An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,
More informationMassively Parallel Processor Breadboarding (MPPB)
Massively Parallel Processor Breadboarding (MPPB) 28 August 2012 Final Presentation TRP study 21986 Gerard Rauwerda CTO, Recore Systems Gerard.Rauwerda@RecoreSystems.com Recore Systems BV P.O. Box 77,
More informationBasic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices
3 Digital Systems Implementation Programmable Logic Devices Basic FPGA Architectures Why Programmable Logic Devices (PLDs)? Low cost, low risk way of implementing digital circuits as application specific
More informationRZ Embedded Microprocessors
Next Generation HMI Solutions RZ Embedded Microprocessors www.renesas.eu 2013.11 The RZ Family Embedded Microprocessors The RZ is a new family of embedded microprocessors that retains the ease-of-use of
More informationUnderstanding Peak Floating-Point Performance Claims
white paper FPGA Understanding Peak ing-point Performance Claims Learn how to calculate and compare the peak floating-point capabilities of digital signal processors (DSPs), graphics processing units (GPUs),
More informationKiloCore: A 32 nm 1000-Processor Array
KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation
More informationBuilding supercomputers from commodity embedded chips
http://www.montblanc-project.eu Building supercomputers from commodity embedded chips Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results
Ettus Research Update
Ettus Research Update Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 Recent New Products 3 Third Generation Introduction Who am I? Core GNU Radio contributor since 2001 Designed
Multi-Core Microprocessor Chips: Motivation & Challenges
Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005
Memory Systems IRAM. Principle of IRAM
Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several
COSMOS Architecture and Key Technologies. June 1st, 2018 COSMOS Team
COSMOS Architecture and Key Technologies June 1 st, 2018 COSMOS Team COSMOS: System Architecture (2) System design based on three levels of SDR radio node (S,M,L) with M,L connected via fiber to optical
Building supercomputers from embedded technologies
http://www.montblanc-project.eu Building supercomputers from embedded technologies Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results
SoC Platforms and CPU Cores
SoC Platforms and CPU Cores COE838: Systems on Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University
Trends in HPC (hardware complexity and software challenges)
Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18
Outline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
High Performance Computing: Blue-Gene and Road Runner. Ravi Patel
High Performance Computing: Blue-Gene and Road Runner Ravi Patel 1 HPC General Information 2 HPC Considerations Criterion Performance Speed Power Scalability Number of nodes Latency bottlenecks Reliability
"On the Capability and Achievable Performance of FPGAs for HPC Applications"
"On the Capability and Achievable Performance of FPGAs for HPC Applications" Wim Vanderbauwhede School of Computing Science, University of Glasgow, UK Or in other words "How Fast Can Those FPGA Thingies
Designing for Low Power with Programmable System Solutions Dr. Yankin Tanurhan, Vice President, System Solutions and Advanced Applications
Designing for Low Power with Programmable System Solutions Dr. Yankin Tanurhan, Vice President, System Solutions and Advanced Applications Overview Why is power a problem? What can FPGAs do? Are we safe
ECE 4514 Digital Design II. Spring Lecture 22: Design Economics: FPGAs, ASICs, Full Custom
ECE 4514 Digital Design II Lecture 22: Design Economics: FPGAs, ASICs, Full Custom A Tools/Methods Lecture Overview Wows and Woes of scaling The case of the Microprocessor How efficiently does a microprocessor
EITF35: Introduction to Structured VLSI Design
EITF35: Introduction to Structured VLSI Design Introduction to FPGA design Rakesh Gangarajaiah Rakesh.gangarajaiah@eit.lth.se Slides from Chenxin Zhang and Steffan Malkowsky WWW.FPGA What is FPGA? Field
Chapter 1: Introduction to Parallel Computing
Parallel and Distributed Computing Chapter 1: Introduction to Parallel Computing Jun Zhang Laboratory for High Performance Computing & Computer Simulation Department of Computer Science University of Kentucky
Differentiation in Flexible Data Center Interconnect
Differentiation in Flexible Data Center Interconnect Industry Leadership Momentum Founded 1984 20,000 customers $238B FY15 revenue 3,500 employees world wide >55% Market segment share 3500+ Patents 60
L'évolution des architectures et des technologies d'intégration des circuits intégrés dans les Data centers
INSTITUT DE RECHERCHE TECHNOLOGIQUE L'évolution des architectures et des technologies d'intégration des circuits intégrés dans les Data centers 10/04/2017 Les Rendez-vous de
Computer Architecture's Changing Definition
Computer Architecture's Changing Definition 1950s Computer Architecture Computer Arithmetic 1960s Operating system support, especially memory management 1970s to mid 1980s Computer Architecture Instruction
Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation
Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation Doug Johnson, Applications Consultant Chris Eddington, Technical Marketing Synopsys 2013 1 Synopsys, Inc. 700 E. Middlefield Road Mountain
ECE 574 Cluster Computing Lecture 18
ECE 574 Cluster Computing Lecture 18 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 2 April 2019 HW#8 was posted Announcements 1 Project Topic Notes I responded to everyone's
Programming Multicore Systems
Programming Multicore Systems Enabling real time applications for multicore with the XMOS development tools 5th September 2011 Matt Fyles XMOS Company Overview Established Fabless Semiconductor Company
Industry Collaboration and Innovation
Industry Collaboration and Innovation Industry Landscape Key changes occurring in our industry Historical microprocessor technology continues to deliver far less than the historical rate of cost/performance
History. PowerPC based micro-architectures. PowerPC ISA. Introduction
PowerPC based micro-architectures Godfrey van der Linden Presentation for COMP9244 Software view of Processor Architectures 2006-05-25 History 1985 IBM started on AMERICA 1986 Development of RS/6000 1990
H100 Series FPGA Application Accelerators
2 H100 Series FPGA Application Accelerators Products in the H100 Series PCI-X Mainstream IBM EBlade H101-PCIXM» HPC solution for optimal price/performance» PCI-X form factor» Single Xilinx Virtex 4 FPGA
ECE 571 Advanced Microprocessor-Based Design Lecture 18
ECE 571 Advanced Microprocessor-Based Design Lecture 18 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 11 November 2014 Homework #4 comments Project/HW Reminder 1 Stuff from Last