Parallella: A $99 Open Hardware Parallel Computing Platform

Similar documents
An Alternative to GPU Acceleration For Mobile Platforms

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

There s STILL plenty of room at the bottom! Andreas Olofsson

How to build a Megacore microprocessor. by Andreas Olofsson (MULTIPROG WORKSHOP 2017)

Trends and Challenges in Multicore Programming

GPUs and Emerging Architectures

The Epiphany Manycore Architecture Seminar at Halmstad Högskola March 6,

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

A Scalable Processor Architecture for the Next Generation of Low Power Supercomputer. PRACE Workshop, October 2010 Andreas Olofsson

epython An Implementation of Python Enabling Accessible Programming of Micro-Core Architectures Nick Brown, EPCC

Instruction Set Architecture ( ISA ) 1 / 28

Ten (or so) Small Computers

Tutorial. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers

Things I learned while designing the Epiphany & Parallella. Presentation at Chalmers University of Technology Feb 2, 2015

Parallella Technical Conference Tokyo May 30, 2015

Kalray MPPA Manycore Challenges for the Next Generation of Professional Applications Benoît Dupont de Dinechin MPSoC 2013

A General Discussion on! Parallelism!

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection

Parallel Hybrid Computing Stéphane Bihan, CAPS

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

COMPUTING ELEMENT EVOLUTION AND ITS IMPACT ON SIMULATION CODES

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model

Introduction to Xeon Phi. Bill Barth January 11, 2013

What does Heterogeneity bring?

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

EXASCALE COMPUTING ROADMAP IMPACT ON LEGACY CODES MARCH 17 TH, MIC Workshop PAGE 1. MIC workshop Guillaume Colin de Verdière

Experiences Using Tegra K1 and X1 for Highly Energy Efficient Computing

Intel Xeon Phi Coprocessors

Trends in HPC (hardware complexity and software challenges)

Heterogeneous Computing and OpenCL

A Multicore Processor Designed For PetaFLOPS Computation

Real Parallel Computers

March 10 11, 2015 San Jose

Simplify System Complexity

2 Introduction to parallel computing. Chip Multiprocessors (ACS MPhil) Robert Mullins

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Overview. 2 Introduction to parallel computing. The control structure. Parallel computers

Ettus Research Update

Simplify System Complexity

11/2/17. Agenda. Example: CPU with Two Cores. Parallel Computer Architectures. Improving Performance. CS 61C: Great Ideas in Computer Architecture

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

Dr. Yassine Hariri CMC Microsystems

Threaded Programming. Lecture 9: Alternatives to OpenMP

ECE 574 Cluster Computing Lecture 18

Computer Architecture and Structured Parallel Programming James Reinders, Intel

PORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune

Parallel Programming Libraries and implementations

n N c CIni.o ewsrg.au

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Tile Processor (TILEPro64)

Integrating DMA capabilities into BLIS for on-chip data movement. Devangi Parikh Ilya Polkovnichenko Francisco Igual Peña Murtaza Ali

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Ian Buck, GM GPU Computing Software

COMPLEX EMBEDDED SYSTEMS

An innovative compilation tool-chain for embedded multi-core architectures M. Torquati, Computer Science Departmente, Univ.

CS 110 Computer Architecture

Technology for a better society. hetcomp.com

Big Data mais basse consommation L apport des processeurs manycore

Experts in Application Acceleration Synective Labs AB

The Era of Heterogeneous Computing

Complexity and Advanced Algorithms. Introduction to Parallel Algorithms

To hear the audio, please be sure to dial in: ID#

arxiv: v1 [cs.dc] 18 Aug 2016

Parallelism. Master 1 International. Andrea G. B. Tettamanzi. Université de Nice Sophia Antipolis Département Informatique

Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster

A 101 Guide to Heterogeneous, Accelerated, Data Centric Computing Architectures

Godson Processor and its Application in High Performance Computers

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

The Stampede is Coming: A New Petascale Resource for the Open Science Community

Accelerating sequential computer vision algorithms using commodity parallel hardware

Dealing with Heterogeneous Multicores

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec

Bulk-synchronous pseudo-streaming for many-core accelerators

Practical Parallella expansion board design

Introduction to GPU hardware and to CUDA

Concurrency: what, why, how

UMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming

Achieving High Performance. Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013

Simplifying FPGA Design for SDR with a Network on Chip Architecture

Parallel Programming on Ranger and Stampede

The Heterogeneous Programming Jungle. Service d Expérimentation et de développement Centre Inria Bordeaux Sud-Ouest

Introduction to the Intel Xeon Phi on Stampede

Parallel and High Performance Computing CSE 745

Many-core Processor Programming for beginners. Hongsuk Yi ( 李泓錫 ) KISTI (Korea Institute of Science and Technology Information)

Programming for the Intel Many Integrated Core Architecture By James Reinders. The Architecture for Discovery. PowerPoint Title

Real Parallel Computers

Architecture, Programming and Performance of MIC Phi Coprocessor

Chapter 1 Computer System Overview

THE LEADER IN VISUAL COMPUTING

Efficiency and Programmability: Enablers for ExaScale. Bill Dally Chief Scientist and SVP, Research NVIDIA Professor (Research), EE&CS, Stanford

Introduction to GPU computing

CPU-GPU Heterogeneous Computing

Outline Marquette University

Chapter 1 Computer System Overview

Heterogeneous Architecture. Luca Benini

Constructing a low cost and low powered cluster with Parallella boards

Tracing embedded heterogeneous systems

ECE 571 Advanced Microprocessor-Based Design Lecture 20

High performance Computing and O&G Challenges

Parallel Languages: Past, Present and Future

Transcription:

Inventing the Future of Computing Parallella: A $99 Open Hardware Parallel Computing Platform Andreas Olofsson andreas@adapteva.com IPDPS May 22th, Cambridge, MA

Adapteva Achieves 3 World Firsts 1. First commercial processor to reach 50 GFLOPS/W 3. First semiconductor company to successfully crowd source project 2. First mobile processor with an open source OpenCL TM SDK 2

Adapteva s Goals in 2008 A C/C++ programmable multicore processor Scalable to 1000 s of cores on a chip Native IEEE floating point support Easy to Use 50 GFLOPS/Watt in 65nm 3

Our Inspiration Transputer Inmos (1984) RAW MIT (1997) Tile Tilera (2006) Teraflop Intel (2007) http://www.adapteva.com/white papers/thesiren song of parallel computing/ 4

Our guiding light Parallel Efficient Heterogeneous Robust 5

Any Reason to Think the Future of Computing is NOT Parallel? No Computing Parallel Computing No Electronic Computing 1943 Von Neumann Age Serial Computing 1943 2013? Parallel Computing 2013?? 6

A Practical Start: True Heterogeneous Computing SYSTEM ON CHIP BIG BIG BIG O/S Application BIG The CPU Joker FPGA CPU Graphics CPU Weird Math GPU Analog CPU 1000 s of small RISC CPUs Math 7

The Epiphany Coprocessor 32 128KB Local Memory 1.6 GFLOPS Per Core @ ~25mW Packet Based Network On Chip With 100GB/s Bisection BW Coprocessor for ARM/x86 Host <20pJ / FLOP! MIMD/Task-Parallel Accelerator 8

Pragmatic Architecture Tradeoffs IN Dual issue RISC processors 64 entry register file Shared memory architecture 32-128KB per core memory Multi-banked local memory Packet based Mesh NOC 32 Bit IEEE float/int arithmetic Memory protection Timers, Interrupts, DMAs OUT Any special purpose instructions Hardware caching SIMD Optimized remote read accesses Memory management unit Strict memory order model 9

How the $#@% Do We Program This Thing? 10

The Current State of Parallel Programming Source: Github How To Make Every Programmer a Parallel Programmer? 11

Parallel Programming Frameworks Erlang SystemC Intel TBB Co Fortran Lisp Janus Scala Haskell Pragmas Fortress Hadoop Linda Smalltalk CUDA Clojure UPC PVM Alef Julia OpenCL Go X10 Posix XC Occam OpenHMPP ParaSail APL Simulink Charm++ Occam pi OpenMP Ada Labview Ptolemy StreamIt Verilog OpenACC C++ Amp Rust Sisal Star P VHDL Cilk Chapel MPI MCAPI????????? 12

Stupid Hurdles That Get in the Way of Progress Proprietary SDKs and programming frameworks Lack of datasheets/documents Closed source drivers Expensive lock-in hardware NDA requirements Exlcusive access 13

The Parallella Project Guidelines A $99 single board parallel computer that runs Linux Open source (SDK, board files, drivers) (github.com/parallella) Open documentation (adapteva.com/all-documents) Open to all (forums.parallella.org) 14

The Parallella Board Gigabit Ethernet uusb 5V DC Zynq dual core ARM- A9 (with FPGA Logic) 16-core Epiphany Coprocessor uusb uhdmi usd 1GB SDRAM 15

The Parallella Backside (optional) FPGA Logic Connector 6 GB/s! BW Epiphany North Connector Instrumentation Connector Epiphany South Connector 16

Parallella Kickstarter Campaign 5,000 customers 6,300 boards pre-sold in 4 weeks 67 countries, all 50 US states 50-75% of backers are developers 11,000 more signups since Jan 1st Backer Application Interest: Software Defined Radio Ray tracing/rendering Image processing Robotics Gaming Cryptography Parallel computing research Distributed Computing Machine Learning HPC 17

The Parallella 1% Academic Program Adapteva will donate (at least) 1 board for every 100 boards sold! Open to all academic institutions active in parallel computing research & education Program starts June 1st, 2013! 18