PULP: an open source hardware-software platform for near-sensor analytics. Luca Benini IIS-ETHZ & DEI-UNIBO

1 PULP: an open source hardware-software platform for near-sensor analytics Luca Benini IIS-ETHZ & DEI-UNIBO

2 An IoT System View
Sense: MEMS IMU, MEMS microphone, ULP imager, EMG/ECG/EIT
Analyze: µcontroller (e.g. Cortex-M), L2 memory, IOs
Transmit: short range, BW; low-rate (periodic) data; SW update, commands; long range, low BW
Figures on the slide: MOPS, 1 10 mW; 100 µW 2 mW; cm²
Harvesting powered: mW (MaxP) + µW (AvgP)
Idle: ~1 µW. Active: ~10 mW.
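The idle and active figures above imply a duty-cycled node. A minimal sketch of that arithmetic, assuming a simple two-state model (the node is either fully active or fully idle) and a hypothetical 100 µW average budget that does not come from the slide:

```python
# Two-state duty-cycle model. Defaults are the slide's rough figures
# (active ~10 mW, idle ~1 uW); the budget used below is hypothetical.

def average_power_w(duty_cycle, active_w=10e-3, idle_w=1e-6):
    """Average power when the node is active for a fraction `duty_cycle` of the time."""
    return duty_cycle * active_w + (1.0 - duty_cycle) * idle_w


def max_duty_cycle(budget_w, active_w=10e-3, idle_w=1e-6):
    """Largest active fraction that keeps average power within `budget_w`."""
    if budget_w <= idle_w:
        return 0.0
    return min(1.0, (budget_w - idle_w) / (active_w - idle_w))


budget_w = 100e-6  # hypothetical harvesting budget, not a value from the slide
duty = max_duty_cycle(budget_w)
print(f"max duty cycle: {duty:.4%}")
print(f"average power:  {average_power_w(duty) * 1e6:.1f} uW")
```

Under these assumptions the node can be active for only about 1% of the time, which is why both a low active power and a very low idle power matter.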

3 The Computing Bottleneck: Microcontroller Landscape (*not exhaustive). High-performance MCUs, low-power MCUs, our target.

4 Reaching pJ/op

5 Content Understanding: a general pattern
1. Extract descriptors from raw data. 2D: corners, blobs. 1D: LPC coefficients. Usually highly parallel.
2. Use descriptors to classify data among family representatives. Machine learning, Bayesian. Also highly parallel.
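The two-step pattern can be sketched as follows. This is an illustration only: it swaps in two simple 1D descriptors (mean energy and zero-crossing rate) for the LPC coefficients named on the slide, and a nearest-representative rule for the classifier. The class names and representative values are hypothetical.

```python
import math


def extract_descriptor(frame):
    """Step 1: reduce a raw 1D frame to a small descriptor (energy, zero-crossing rate)."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(frame) - 1))


def classify(descriptor, representatives):
    """Step 2: pick the family whose representative is closest to the descriptor."""
    return min(representatives,
               key=lambda label: math.dist(descriptor, representatives[label]))


# Hypothetical family representatives in descriptor space.
representatives = {"quiet": (0.0, 0.0), "tone": (0.5, 0.5)}

frames = [[0.0] * 8, [1.0, -1.0] * 4]
# Each frame is processed independently, so both steps parallelize across frames.
labels = [classify(extract_descriptor(f), representatives) for f in frames]
print(labels)
```

Because no frame depends on another, the list comprehension could be split across cores without changing the result, which is the property the slide calls "highly parallel".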

6 Minimum energy operation
[Figure: Energy/Cycle (nJ) versus Logic Vcc / Memory Vcc (V), showing total, leakage, and dynamic energy; nm CMOS, 25 °C. Source: Vivek De, INTEL, Date.]
Near-Threshold Computing (NTC):
1. Don't waste energy pushing devices into strong inversion.
2. Recover performance with parallel execution.
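The shape of the energy-per-cycle curve can be reproduced with a toy model. The sketch below assumes a textbook alpha-power delay law and a constant leakage current; every constant is hypothetical and chosen only to make the minimum visible, not taken from the Intel data on the slide.

```python
# Toy model: total energy per cycle = dynamic + leakage.
# All constants below are hypothetical illustration values.
C_EFF = 1e-9     # effective switched capacitance per cycle (F)
I_LEAK = 0.05    # leakage current (A), held constant for simplicity
V_T = 0.3        # threshold voltage (V)
ALPHA = 1.5      # alpha-power-law exponent
K_DELAY = 1e-9   # delay scaling constant


def cycle_time(v):
    """Cycle time grows sharply as the supply approaches the threshold voltage."""
    return K_DELAY * v / (v - V_T) ** ALPHA


def dynamic_energy(v):
    return C_EFF * v * v


def leakage_energy(v):
    # Leakage power integrated over one (longer and longer) cycle.
    return I_LEAK * v * cycle_time(v)


def total_energy(v):
    return dynamic_energy(v) + leakage_energy(v)


voltages = [0.35 + 0.005 * i for i in range(131)]  # 0.35 V to 1.0 V
v_min = min(voltages, key=total_energy)
print(f"minimum-energy supply in this toy model: {v_min:.3f} V")
```

In this model dynamic energy falls quadratically with the supply while leakage energy per cycle rises because each cycle takes longer, so the minimum sits somewhat above the threshold voltage. The lost throughput at that operating point is what point 2 proposes to recover with parallel execution.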

7 The best processor [AziziISCA10]: Single-issue in-order is the most energy efficient. Put more than one + shared memory to fill the cluster area. Departement Informationstechnologie und Elektrotechnik

8 Near-Threshold Multiprocessing
[Block diagram labels: PE to PE N-1, each with IL0; I$B 0 to I$B k; LOGARITHMIC INTERCONNECT; L1 TCDM with MB 0 to MB M; DMA.]
SIMD/MIMD/SEQ
Shared L1 I$ + configurable broadcasting
Up to 16 simple cores
Private loop/prefetch buffer
Tightly coupled DMA
Shared L1 data memory + configurable interleaving
NT but parallel: maximum energy efficiency when active, plus strong PM for (partial) idleness.
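Interleaving matters for a shared multi-banked L1 because it decides which accesses collide. The sketch below is an assumption-laden illustration: the slide only says "configurable interleaving", so the word-level mapping, the 32-bit word size, and the one-request-per-bank-per-cycle rule are all hypothetical.

```python
from collections import Counter

WORD_BYTES = 4  # assumption: 32-bit words


def bank_of(byte_addr, n_banks):
    """Word-level interleaving: consecutive words land in consecutive banks."""
    return (byte_addr // WORD_BYTES) % n_banks


def extra_cycles(byte_addrs, n_banks):
    """Stall cycles for one batch of requests if each bank serves one request per cycle."""
    per_bank = Counter(bank_of(addr, n_banks) for addr in byte_addrs)
    return max(per_bank.values()) - 1


n_banks = 8
# Eight cores reading consecutive words: each request hits a different bank.
unit_stride = [core * WORD_BYTES for core in range(8)]
# Eight cores reading words spaced n_banks apart: every request hits bank 0.
bank_stride = [core * WORD_BYTES * n_banks for core in range(8)]

print(extra_cycles(unit_stride, n_banks), extra_cycles(bank_stride, n_banks))
```

With this mapping the common case of cores walking through adjacent array elements proceeds without stalls, while a stride equal to the bank count serializes all eight requests.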

9 Near-threshold FDSOI technology. Body bias: highly effective knob for power management!

10 Silicon Results
Technology: UTBB FD-SOI 28nm
Transistors: Flip well, L = 24 nm
Cluster area: 1.3 mm²
VDD range (memories): 0.32V V ( V)
BB range: 0V V
SRAM macros: 8 x 32 kbit (TCDM)
SCM macros: 16 x 4 kbit (TCDM), 4 x 2 x 4 kbit (I$)
Gates: 200K
Frequency range: NO BB: MHz; MAX FBB: MHz
Power range: NO FBB: mW; MAX FBB: mW
Hot Chips 15 (V1), Cool Chips 16 (V2): ~5 pJ/OP

11 Cluster Energy Efficiency
Operating points on the plot: MHz, 0.46 V, 0 V FBB, 840 µW; 1 GOPS, 100 MOPS/mW, 0.66 V, 0.5 V FBB.
Full FBB heavily degrades energy efficiency at low voltage due to high leakage!
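The deck mixes several efficiency units (MOPS/mW, GOPS/W, pJ/op). They are related by plain unit arithmetic, sketched below with helper names that are purely illustrative: 1 MOPS/mW equals 1 GOPS/W, and energy per operation in pJ is 1000 divided by GOPS/W.

```python
def mops_per_mw_to_gops_per_watt(mops_per_mw):
    """1e6 ops/s per 1e-3 W equals 1e9 ops/s per W, so the number is unchanged."""
    return mops_per_mw


def gops_per_watt_to_pj_per_op(gops_per_watt):
    """Energy per operation: 1 W / (G * 1e9 ops/s) = (1000 / G) pJ."""
    return 1000.0 / gops_per_watt


def pj_per_op_to_gops_per_watt(pj_per_op):
    return 1000.0 / pj_per_op


# 100 MOPS/mW, as quoted on this slide.
print(gops_per_watt_to_pj_per_op(mops_per_mw_to_gops_per_watt(100.0)), "pJ/op")
# ~5 pJ/OP, as quoted on the silicon results slide.
print(pj_per_op_to_gops_per_watt(5.0), "GOPS/W")
```

The same conversion shows that 1 GOPS/mW corresponds to 1 pJ/op, so moving from pJ/op to fJ/op means a thousandfold gain in GOPS/W.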

12 PULP Boards
Board size: 86.5 mm x 57 mm. Battery supply.
PULP interfaces: JTAG, I2S, UART, SPI, I2C, LEDs.
128 Mbit Flash.
Apollo M4 interfaces: SWD, SPI, button, I2C, LEDs, GPIO.
Daughterboard expansions.
[Photos: PCB front, PCB back.]

13 Open Source Parallel ULP computing for the IoT
(Sub)-pJ/op computing platform: let's make it open!
Processor & hardware IPs, compiler infrastructure, virtualization layer, programming model.

14 What has been released
RISC-V compatible 32-bit efficient microprocessor core with AXI/AMBA peripherals.
Taped out in UMC 65 nm, 400 MHz. Confirmed by silicon measurement.
RV32I, RV32C supported. Most of RV32M (full support soon).
Custom extensions (needs our compiler extensions): hardware loops, post-increment load, ALU/MAC instructions.
Hundreds of Git forks.

15 Why Open Hardware?
Community building: we want PULP to be used, and for that we need a community.
Cooperation with academic partners: allows us to exchange ideas and projects freely; we find more partners we can work with.
Cooperation with and support for industry: lowers costs for an SME entering the IC business; creates jobs and opportunities (for our students and others): IP, consulting, customization.
Funding: funded projects, volume chip production.
Integrated Systems Laboratory

16 Towards fJ/op

17 Maximizing Silicon Efficiency
[Figure labels: GOPS/W, > 100; SW, Mixed, HW; General-purpose Computing (CPU), Throughput Computing (GPGPU), HW IP; 1 GOPS/mW; Accelerator Gap.]
Closing the accelerator efficiency gap with agile customization.

18 Fractal Heterogeneity: Fixed-function accelerators have limited reuse. How to limit proliferation?

19 Learn to Accelerate
Brain-inspired systems are high performers in many tasks over many domains. [Honglak Lee]
Image recognition [e.g., Krizhevsky et al., 2012]
Speech recognition [e.g., Heigold et al., 2013]
NLP [e.g., Socher et al., ICML 2011; Collobert & Weston, ICML 2008]

20 PULP CNN Accelerator

21 How do we fare?
Spiking-based: IBM TrueNorth [Merolla et al.], mW, spiking ops/s/W
SIMD-like convolution ISA extension: Convolution Engine [Qadeer et al.], GOPS/W
Deep network ASIC: DianNao [Chen et al.], GOPS/W
ConvNet FPGA / ASIC: NeuFlow/nn-X [Gokhale et al.], up to 230 GOPS/W
PULP + HWCE: 0.4V: GOPS/W; 0.8V: GOPS/W
Ample margins for further improvements.


ARROW ARIS EDGE Board User s Guide 27/09/2017 ARROW ARIS EDGE Board User s Guide All information contained in these materials, including products and product specifications, represents information on the product at the time of publication and is subject

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Embedded Computing Platform. Architecture and Instruction Set

Embedded Computing Platform. Architecture and Instruction Set Embedded Computing Platform Microprocessor: Architecture and Instruction Set Ingo Sander ingo@kth.se Microprocessor A central part of the embedded platform A platform is the basic hardware and software

More information

Low-Power Neural Processor for Embedded Human and Face detection

Low-Power Neural Processor for Embedded Human and Face detection Low-Power Neural Processor for Embedded Human and Face detection Olivier Brousse 1, Olivier Boisard 1, Michel Paindavoine 1,2, Jean-Marc Philippe, Alexandre Carbon (1) GlobalSensing Technologies (GST)

More information

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1 MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content

More information

SpiNNaker a Neuromorphic Supercomputer. Steve Temple University of Manchester, UK SOS21-21 Mar 2017

SpiNNaker a Neuromorphic Supercomputer. Steve Temple University of Manchester, UK SOS21-21 Mar 2017 SpiNNaker a Neuromorphic Supercomputer Steve Temple University of Manchester, UK SOS21-21 Mar 2017 Outline of talk Introduction Modelling neurons Architecture and technology Principles of operation Summary

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information

EE241 - Spring 2004 Advanced Digital Integrated Circuits

EE241 - Spring 2004 Advanced Digital Integrated Circuits EE24 - Spring 2004 Advanced Digital Integrated Circuits Borivoje Nikolić Lecture 2 Impact of Scaling Class Material Last lecture Class scope, organization Today s lecture Impact of scaling 2 Major Roadblocks.

More information

High performance, power-efficient DSPs based on the TI C64x

High performance, power-efficient DSPs based on the TI C64x High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research

More information

SiFive Freedom SoCs: Industry s First Open-Source RISC-V Chips

SiFive Freedom SoCs: Industry s First Open-Source RISC-V Chips SiFive Freedom SoCs: Industry s First Open-Source RISC-V Chips Yunsup Lee Co-Founder and CTO High Upfront Cost Has Killed Innovation Our industry needs a fundamental change Total SoC Development Cost Design

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers

L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers I N S T I T U T D E R E C H E R C H E T E C H N O L O G I Q U E L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers 10/04/2017 Les Rendez-vous de

More information

New Silicon Frontiers: Physically Flexible System-on-a-Chip

New Silicon Frontiers: Physically Flexible System-on-a-Chip New Silicon Frontiers: Physically Flexible System-on-a-Chip Richard L. Chaney, Douglas R. Hackler, Kelly J. DeGregorio, Dale G. Wilson This work sponsored in part by the Rapid Response Technology Office

More information

Xynergy It really makes the difference!

Xynergy It really makes the difference! Xynergy It really makes the difference! STM32F217 meets XILINX Spartan-6 Why Xynergy? Very easy: There is a clear Synergy achieved by combining the last generation of the most popular ARM Cortex-M3 implementation

More information

COL862 - Low Power Computing

COL862 - Low Power Computing COL862 - Low Power Computing Power Measurements using performance counters and studying the low power computing techniques in IoT development board (PSoC 4 BLE Pioneer Kit) and Arduino Mega 2560 Submitted

More information

Product Technical Brief S3C2440X Series Rev 2.0, Oct. 2003

Product Technical Brief S3C2440X Series Rev 2.0, Oct. 2003 Product Technical Brief S3C2440X Series Rev 2.0, Oct. 2003 S3C2440X is a derivative product of Samsung s S3C24XXX family of microprocessors for mobile communication market. The S3C2440X s main enhancement

More information

Low-Power Processor Solutions for Always-on Devices

Low-Power Processor Solutions for Always-on Devices Low-Power Processor Solutions for Always-on Devices Pieter van der Wolf MPSoC 2014 July 7 11, 2014 2014 Synopsys, Inc. All rights reserved. 1 Always-on Mobile Devices Mobile devices on the move Mobile

More information

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University Abbas El Gamal Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program Stanford University Chip stacking Vertical interconnect density < 20/mm Wafer Stacking

More information

CMP Conference 20 th January Director of Business Development EMEA

CMP Conference 20 th January Director of Business Development EMEA CMP Conference 20 th January 2011 eric.lalardie@arm.com Director of Business Development EMEA +33 6 07 83 09 60 1 1 Unparalleled Applicability ARM Cortex Advanced Processors Architectural innovation, compatibility

More information

Maximize energy efficiency in a normally-off system using NVRAM. Stéphane Gros Yeter Akgul

Maximize energy efficiency in a normally-off system using NVRAM. Stéphane Gros Yeter Akgul Maximize energy efficiency in a normally-off system using NVRAM Stéphane Gros Yeter Akgul Summary THE COMPANY THE CONTEXT THE TECHNOLOGY THE SYSTEM THE CO-DEVELOPMENT CONCLUSION May 31, 2017 2 Summary

More information

Embedded System Design

Embedded System Design Embedded System Design Stephen A. Edwards Columbia University Spring 2015 Spot the Computer Cars These Days... Embedded Systems: Ubiquitous Computers iphone Laser Keyboard Nikon D300 Video Watch GPS Playstation

More information