Heterogeneous SoCs. May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1

Transcription

1 COSCOⅣ Heterogeneous SoCs M HASEGAWA TORU M IDONUMA TOSHIICHI May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1

2 Contents: Background; Heterogeneous technology

3 Background: Moore's Law today

4 Background: System-on-a-Chip. A computer system is realized on a single chip, made possible by improvements in integration technology. Merits: small area, high performance, low power. Demerits: complex design, decreased yield.

5 Background: ARM SoC Block Diagram (Reference: Wikipedia)

6 Background: Roadmap of SoC

7 Background: Inflections in Processor Design

8 Background: Homogeneous. Uses processors of the same kind. This has been the main architecture in recent years. Major processors include Core i7, Xeon, SPARC64 VII, etc. Fig. Intel Xeon Phi 5110P

9 Background: Heterogeneous. The next-generation architecture, using processors of different kinds. Differences of architecture: fast CPU + slow CPU, CPU + GPU, ARM + FPGA, etc. Purpose: optimize the efficiency of the CPU cores; keep single-thread performance while extracting high multi-thread performance.

10 Heterogeneous Technology: HETEROGENEOUS SYSTEM ARCHITECTURE (HSA)

11 What is HSA? The HSA Foundation launched in mid-2012. HSA is a new, open architecture specification comprising the HSAIL virtual (parallel) instruction set, the HSA memory model, and the HSA dispatcher and run-time. It provides an optimized platform architecture for heterogeneous programming models such as OpenCL, C++ AMP, et al.

12 What is HSA? A processor design that makes it easy to harness the entire computing power of an APU for faster and more power-efficient devices, tablets, smartphones, and cloud servers. [Diagram labels: GPU (parallel workloads), hUMA (memory), HSA, CPU (serial workloads), APU (Accelerated Processing Unit)]

13 State-of-the-art Heterogeneous Processor

14 HSA Running Model

15 HSA Software Stack. Make the GPU easily accessible: support mainstream languages; expandable to domain-specific languages. Make compute offload efficient: eliminate memory copying; low-latency dispatch. Make it ubiquitous: drive standards through the HSA Foundation; open-source key components. Optimized compiler technology: leverage the LLVM framework; HSAIL as a new IR for heterogeneous computing.

16 Key HSA Points: HETEROGENEOUS UNIFORM MEMORY ACCESS (hUMA); HETEROGENEOUS QUEUING (hQ)

17 Understanding UMA. The original meaning of UMA is Uniform Memory Access. It refers to how processing cores in a system view and access memory. All processing cores in a true UMA system share a single memory address space. The introduction of GPU compute created Non-Uniform Memory Access (NUMA), which requires data to be managed across multiple heaps with different address spaces and adds programming complexity due to frequent copies, synchronization, and address translation. HSA restores the GPU to Uniform Memory Access. Heterogeneous computing replaces GPU computing.
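The copy overhead described on this slide can be illustrated with a toy model. The following is only a sketch under stated assumptions: the two function names are hypothetical, the "kernel" is a plain Python loop, and a second bytearray stands in for a separate GPU heap. It does not use any HSA API; it only counts the bytes that have to be copied in each model.

```python
# Toy model (not an HSA API): count bytes copied when a "GPU kernel"
# doubles every element of a buffer.

def run_with_separate_heaps(data):
    """Separate address spaces: data is copied to the device heap and back."""
    bytes_copied = 0
    gpu_heap = bytearray(data)            # host -> device copy
    bytes_copied += len(data)
    for i in range(len(gpu_heap)):        # "kernel" runs on the device copy
        gpu_heap[i] = (gpu_heap[i] * 2) % 256
    result = bytearray(gpu_heap)          # device -> host copy
    bytes_copied += len(gpu_heap)
    return result, bytes_copied

def run_with_shared_memory(data):
    """Single address space: the "kernel" writes through a view, no copies."""
    view = memoryview(data)               # a reference to the same buffer
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256
    return data, 0

host = bytearray([1, 2, 3, 4])
copied_result, copied_bytes = run_with_separate_heaps(host)
shared_result, shared_bytes = run_with_shared_memory(host)
```

In the separate-heap version the payload crosses the boundary twice, so the copy cost grows with the data size; in the shared version the result is the original buffer itself.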

18 Introducing hUMA. [Diagram: UMA: several CPUs sharing one memory. NUMA (APU): CPUs with CPU memory and GPUs with separate GPU memory. hUMA (APU with HSA): CPUs and GPUs sharing one memory.]

19 hUMA Key Features. Coherent memory: ensures that CPU and GPU caches both see an up-to-date view of data. Page-able memory: the GPU can seamlessly access virtual memory addresses that are not (yet) present in physical memory. Entire memory space: both CPU and GPU can access and allocate any location in the system's virtual memory space. [Diagram labels: CPU cache, physical memory, virtual memory, GPU cache]

20 hUMA Features: access to entire memory space; page-able memory; bi-directional coherency; fast GPU access to system memory; dynamic memory allocation

21 What is hQ? hUMA defines how processors inside an APU access memory. hQ defines how processors inside an APU interact with each other to handle computational tasks. With hQ, the relationship between the CPU and GPU changes.

22 The Trouble With the Old Way of Queuing. CPU->CPU dispatch is fast and direct. CPU->GPU dispatch goes through an OS service, which adds large latency due to kernel-mode transitions and scheduling overhead. The kernel driver translates task packets to a vendor-specific format. The GPU can only consume tasks and cannot dispatch new tasks to itself or to the CPU. [Diagram labels: application, application task queues, CPU, OS service, kernel driver, OS-managed task queue, GPU]

23 From Master/Slave to All-Processors-Equal. Application task queues enable reduced dispatch latency to the GPU. A powerful GPU dispatch model adds the flexibility to create new work for the GPU and/or CPU. Applications include ray tracing, graph traversal, and recursive algorithms. [Diagram labels: application, application task queues, CPU, GPU]
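The idea that a consumer of tasks can also create new work is what makes graph traversal a natural fit. A minimal sketch, assuming a single in-process queue stands in for an application task queue (the `traverse` function and the example graph are hypothetical and unrelated to any HSA runtime call):

```python
from collections import deque

def traverse(graph, root):
    """Breadth-first traversal where each processed task enqueues follow-up tasks."""
    queue = deque([root])                  # stands in for an application task queue
    seen = {root}
    visited = []
    while queue:
        node = queue.popleft()             # the worker consumes one task...
        visited.append(node)
        for child in graph.get(node, []):  # ...and dispatches new work itself
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return visited

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = traverse(graph, "a")
```

Under the older model on the previous slide, each newly discovered node would have to travel back to the CPU before it could be submitted as a new task.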

24 hQ Provides Direct Access to Hardware. The application codes directly to the hardware queue: user-mode queuing, hardware scheduling, low dispatch times, multiple queues. No kernel-mode drivers, no kernel-mode transitions, no kernel overhead! [Diagram: Applications A, B, and C feed hardware queues consumed by the GPU]
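User-mode queuing can be pictured as a fixed-size ring buffer that the application writes and the device reads, with no system call in between. The class below is a simplified sketch under that assumption; the name `UserModeQueue` and its methods are hypothetical, and real hardware queues add details (packet formats, doorbell signals, memory ordering) that are omitted here.

```python
class UserModeQueue:
    """Fixed-size ring buffer shared by a producer (application) and a consumer (device)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.write_index = 0   # advanced only by the producer
        self.read_index = 0    # advanced only by the consumer

    def enqueue(self, packet):
        """Producer side: place a packet in the next free slot, or report a full queue."""
        if self.write_index - self.read_index == self.size:
            return False
        self.slots[self.write_index % self.size] = packet
        self.write_index += 1  # publishing the new index makes the packet visible
        return True

    def consume(self):
        """Consumer side: take the oldest pending packet, or None if the queue is empty."""
        if self.read_index == self.write_index:
            return None
        packet = self.slots[self.read_index % self.size]
        self.read_index += 1
        return packet

queue = UserModeQueue(2)
accepted = [queue.enqueue("k1"), queue.enqueue("k2"), queue.enqueue("k3")]
first = queue.consume()
retry = queue.enqueue("k3")
rest = [queue.consume(), queue.consume(), queue.consume()]
```

Because each index has a single writer, the producer and consumer never need a lock in this model, which is one reason a design of this shape keeps dispatch times low.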

25 HSA Solution Stack. HSA Intermediate Language (HSAIL). The HSA design allows multiple hardware solutions to be exposed to software through a common, standard low-level interface.

26 Heterogeneous Technology: SILICON PHOTONICS TECHNOLOGY

27 Silicon Photonics Technology (1/2). Photonics: emitting, detecting, sending, and controlling light using electronics. Silicon: a semiconductor material. Silicon photonics: an approach to transferring large amounts of data over optical fiber. Compared with copper cable, it lowers power consumption and shortens transfer time.

28 Silicon Photonics Technology (2/2). Light is converted to a digital signal, using a phase shifter. The light travels through waveguides. Infrared (IR) light can pass through silicon, which is transparent to it. The components are waveguides, a modulator, and a detector. Intel researches this technology and produces MXC. MXC uses silicon photonics technology (SPT) and the new ClearCurve optical fiber, and offers the possibility of a 1.6 Tbps transfer rate and miniaturization.
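To put the 1.6 Tbps figure in perspective, here is a small worked calculation. The breakdown into 64 fibers at 25 Gb/s each is an assumption that does not appear on the slide, and the function names are hypothetical; the transfer time ignores protocol overhead.

```python
# Assumed configuration (not stated on the slide): 64 fibers at 25 Gb/s each.
FIBERS = 64
GBPS_PER_FIBER = 25

def aggregate_tbps(fibers, gbps_per_fiber):
    """Aggregate link rate in Tb/s for a bundle of parallel fibers."""
    return fibers * gbps_per_fiber / 1000

def transfer_seconds(size_gigabytes, link_tbps):
    """Ideal time to move a payload over the link, ignoring protocol overhead."""
    size_terabits = size_gigabytes * 8 / 1000
    return size_terabits / link_tbps

total_tbps = aggregate_tbps(FIBERS, GBPS_PER_FIBER)
seconds_for_100_gb = transfer_seconds(100, total_tbps)
```

Under these assumptions the bundle reaches the quoted aggregate rate, and a 100 GB payload would take about half a second in the ideal case.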

29 Silicon Photonics Components. Fig. Intel Silicon Photonics

30 Overview of Silicon Photonics. Fig. Demonstration of a High Speed 4-Channel Integrated Silicon Photonics WDM Link with Hybrid Silicon Lasers

31 Transmitter. Fig. Demonstration of a High Speed 4-Channel Integrated Silicon Photonics WDM Link with Hybrid Silicon Lasers

32 Receiver. Fig. Demonstration of a High Speed 4-Channel Integrated Silicon Photonics WDM Link with Hybrid Silicon Lasers

33 Tx & Rx Packaged Chips. Fig. Tx and Rx chips developed by Intel

34 Estimated Data Rates. Fig. Demonstration of a High Speed 4-Channel Integrated Silicon Photonics WDM Link with Hybrid Silicon Lasers

35 Photonics + CMOS Circuit. Fig. Photonics Research Group, integrating a Tb/s optical interconnect layer into CMOS systems

36 Thank you


More information

GPUs and Emerging Architectures

GPUs and Emerging Architectures GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs

More information

An FPGA-Based Optical IOH Architecture for Embedded System

An FPGA-Based Optical IOH Architecture for Embedded System An FPGA-Based Optical IOH Architecture for Embedded System Saravana.S Assistant Professor, Bharath University, Chennai 600073, India Abstract Data traffic has tremendously increased and is still increasing

More information

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors

More information

OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI

OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI CMPE 655- MULTIPLE PROCESSOR SYSTEMS OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI What is MULTI PROCESSING?? Multiprocessing is the coordinated processing

More information

PSMC Roadmap For Integrated Photonics Manufacturing

PSMC Roadmap For Integrated Photonics Manufacturing PSMC Roadmap For Integrated Photonics Manufacturing Richard Otte Promex Industries Inc. Santa Clara California For the Photonics Systems Manufacturing Consortium April 21, 2016 Meeting the Grand Challenges

More information

and Parallel Algorithms Programming with CUDA, WS09 Waqar Saleem, Jens Müller

and Parallel Algorithms Programming with CUDA, WS09 Waqar Saleem, Jens Müller Programming with CUDA and Parallel Algorithms Waqar Saleem Jens Müller Organization People Waqar Saleem, waqar.saleem@uni-jena.de Jens Mueller, jkm@informatik.uni-jena.de Room 3335, Ernst-Abbe-Platz 2

More information

CS516 Programming Languages and Compilers II

CS516 Programming Languages and Compilers II CS516 Programming Languages and Compilers II Zheng Zhang Spring 2015 Jan 22 Overview and GPU Programming I Rutgers University CS516 Course Information Staff Instructor: zheng zhang (eddy.zhengzhang@cs.rutgers.edu)

More information

RapidIO.org Update.

RapidIO.org Update. RapidIO.org Update rickoco@rapidio.org June 2015 2015 RapidIO.org 1 Outline RapidIO Overview Benefits Interconnect Comparison Ecosystem System Challenges RapidIO Markets Data Center & HPC Communications

More information

Linux multi-core scalability

Linux multi-core scalability Linux multi-core scalability Oct 2009 Andi Kleen Intel Corporation andi@firstfloor.org Overview Scalability theory Linux history Some common scalability trouble-spots Application workarounds Motivation

More information

Hypervisor support for emerging scale-out / scale-up architectures. Julian Chesterfield Chief Scientific Officer, OnApp Ltd

Hypervisor support for emerging scale-out / scale-up architectures. Julian Chesterfield Chief Scientific Officer, OnApp Ltd Hypervisor support for emerging scale-out / scale-up architectures Julian Chesterfield Chief Scientific Officer, OnApp Ltd julian@onapp.com Brief Intro to OnApp Company founded 2010 Spun out of a major

More information

Intel Xeon Phi архитектура, модели программирования, оптимизация.

Intel Xeon Phi архитектура, модели программирования, оптимизация. Нижний Новгород, 2017 Intel Xeon Phi архитектура, модели программирования, оптимизация. Дмитрий Прохоров, Дмитрий Рябцев, Intel Agenda What and Why Intel Xeon Phi Top 500 insights, roadmap, architecture

More information

Beyond Hardware IP An overview of Arm development solutions

Beyond Hardware IP An overview of Arm development solutions Beyond Hardware IP An overview of Arm development solutions 2018 Arm Limited Arm Technical Symposia 2018 Advanced first design cost (US$ million) IC design complexity and cost aren t slowing down 542.2

More information

SAVE: Towards efficient resource management in heterogeneous system architectures

SAVE: Towards efficient resource management in heterogeneous system architectures SAVE: Towards efficient resource management in heterogeneous system architectures G. Durelli 1, M. Coppola 2, K. Djafarian 3, G. Kornaros 4, A. Miele 1, M. Paolino 5, O. Pell 6, C. Plessl 7, M.D. Santambrogio

More information

AMD Disaggregates the Server, Defines New Hyperscale Building Block

AMD Disaggregates the Server, Defines New Hyperscale Building Block AMD Disaggregates the Server, Defines New Hyperscale Building Block Fabric Based Architecture Enables Next Generation Data Center Optimization Executive Summary AMD SeaMicro s disaggregated server enables

More information

Solutions for Scalable HPC

Solutions for Scalable HPC Solutions for Scalable HPC Scot Schultz, Director HPC/Technical Computing HPC Advisory Council Stanford Conference Feb 2014 Leading Supplier of End-to-End Interconnect Solutions Comprehensive End-to-End

More information

Lecture 5: Process Description and Control Multithreading Basics in Interprocess communication Introduction to multiprocessors

Lecture 5: Process Description and Control Multithreading Basics in Interprocess communication Introduction to multiprocessors Lecture 5: Process Description and Control Multithreading Basics in Interprocess communication Introduction to multiprocessors 1 Process:the concept Process = a program in execution Example processes:

More information

Heterogeneous Processing Systems. Heterogeneous Multiset of Homogeneous Arrays (Multi-multi-core)

Heterogeneous Processing Systems. Heterogeneous Multiset of Homogeneous Arrays (Multi-multi-core) Heterogeneous Processing Systems Heterogeneous Multiset of Homogeneous Arrays (Multi-multi-core) Processing Heterogeneity CPU (x86, SPARC, PowerPC) GPU (AMD/ATI, NVIDIA) DSP (TI, ADI) Vector processors

More information

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency MediaTek CorePilot 2.0 Heterogeneous Computing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on a chip

More information

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Interconnect Challenges in a Many Core Compute Environment Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Agenda Microprocessor general trends Implications Tradeoffs Summary

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Linux Kernel Driver Support to Heterogeneous System Architecture

Linux Kernel Driver Support to Heterogeneous System Architecture to Heterogeneous System Architecture 1 2 E-mail: zhangwenbo@bjut.edu.cn Chong Chen Fei Liu Zhenshan Bao E-mail: baozhenshan@bjut.edu.cn Jianli Liu E-mail: liujianl@bjut.edu.cn With the development of CPU-GPU

More information

Next Generation Visual Computing

Next Generation Visual Computing Next Generation Visual Computing (Making GPU Computing a Reality with Mali ) Taipei, 18 June 2013 Roberto Mijat ARM Addressing Computational Challenges Trends Growing display sizes and resolutions Increasing

More information

IsoStack Highly Efficient Network Processing on Dedicated Cores

IsoStack Highly Efficient Network Processing on Dedicated Cores IsoStack Highly Efficient Network Processing on Dedicated Cores Leah Shalev Eran Borovik, Julian Satran, Muli Ben-Yehuda Outline Motivation IsoStack architecture Prototype TCP/IP over 10GE on a single

More information

High Performance Computing (HPC) Introduction

High Performance Computing (HPC) Introduction High Performance Computing (HPC) Introduction Ontario Summer School on High Performance Computing Scott Northrup SciNet HPC Consortium Compute Canada June 25th, 2012 Outline 1 HPC Overview 2 Parallel Computing

More information

PowerVR GPU IP from Wearables to Servers. Kristof Beets Director of Business Development May 2015

PowerVR GPU IP from Wearables to Servers. Kristof Beets Director of Business Development May 2015 PowerVR GPU IP from Wearables to Servers Kristof Beets Director of Business Development May 2015 www.imgtec.com Expanding embedded GPU market opportunities Huge range of market opportunities equates to

More information

When MPPDB Meets GPU:

When MPPDB Meets GPU: When MPPDB Meets GPU: An Extendible Framework for Acceleration Laura Chen, Le Cai, Yongyan Wang Background: Heterogeneous Computing Hardware Trend stops growing with Moore s Law Fast development of GPU

More information

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Who am I? Education Master of Technology, NTNU, 2007 PhD, NTNU, 2010. Title: «Managing Shared Resources in Chip Multiprocessor Memory

More information