Saman Amarasinghe and Rodric Rabbah Massachusetts Institute of Technology

Similar documents
CS758: Multicore Programming

Blue-Steel Ray Tracer

Parallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Parallel Architecture. Hwansoo Han

Shared Memory Parallel Programming. Shared Memory Systems Introduction to OpenMP

Outline. Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities

IBM Cell Processor. Gilbert Hendry Mark Kretschmann

INF5063: Programming heterogeneous multi-core processors Introduction

Lecture 8. The StreamIt Language IAP 2007 MIT

Application-Platform Mapping in Multiprocessor Systems-on-Chip

Multicore Hardware and Parallelism

6.189 IAP Lecture 3. Introduction to Parallel Architectures. Prof. Saman Amarasinghe, MIT IAP 2007 MIT

WHY PARALLEL PROCESSING? (CE-401)

Parallel Computing. Hwansoo Han (SKKU)

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore

MIT OpenCourseWare Multicore Programming Primer, January (IAP) Please use the following citation format:

Parallelism in Hardware

CS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics

CS4961 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/30/11. Administrative UPDATE. Mary Hall August 30, 2011

Computer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley

Fundamentals of Computers Design

Multicore and Parallel Processing

Microarchitecture Overview. Performance

Fundamentals of Computer Design

How to Write Fast Code , spring th Lecture, Mar. 31 st

CS4961 Parallel Programming. Lecture 7: Introduction to SIMD 09/14/2010. Homework 2, Due Friday, Sept. 10, 11:59 PM. Mary Hall September 14, 2010

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University

Computer Architecture!

Introduction to Parallel Computing

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6

Computer Architecture!

Complexity and Advanced Algorithms. Introduction to Parallel Algorithms

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

HW Trends and Architectures

COSC 6385 Computer Architecture - Data Level Parallelism (III) The Intel Larrabee, Intel Xeon Phi and IBM Cell processors

ECE 588/688 Advanced Computer Architecture II

Intel Enterprise Processors Technology

What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others

All About the Cell Processor

Microarchitecture Overview. Performance

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

Multimedia in Mobile Phones. Architectures and Trends Lund

THREAD LEVEL PARALLELISM

The Shift to Multicore Architectures

ECE 588/688 Advanced Computer Architecture II

Computer Architecture

MIT OpenCourseWare Multicore Programming Primer, January (IAP) Please use the following citation format:

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Online Course Evaluation. What we will do in the last week?

ECE 2162 Intro & Trends. Jun Yang Fall 2009

Multi-core Architectures. Dr. Yingwu Zhu

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

General introduction: GPUs and the realm of parallel architectures

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

Spring 2011 Prof. Hyesoon Kim

Computer Architecture Spring 2016

Calendar Description

CSC501 Operating Systems Principles. OS Structure

CSE 392/CS 378: High-performance Computing - Principles and Practice

Experts in Application Acceleration Synective Labs AB

CS 152 Computer Architecture and Engineering

Advanced Processor Architecture

Administrivia. Administrivia. Administrivia. CIS 565: GPU Programming and Architecture. Meeting

Cell Broadband Engine. Spencer Dennis Nicholas Barlow

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Introduction to Computing and Systems Architecture

Introduction to the MMAGIX Multithreading Supercomputer

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University

Intel released new technology call P6P

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Concurrent Object Oriented Languages

Exercise 1 Due 02.November 2010, 12:15pm

UC Berkeley CS61C : Machine Structures

Outline. Lecture 40 Hardware Parallel Computing Thanks to John Lazarro for his CS152 slides inst.eecs.berkeley.

Why Parallel Architecture

Computer Architecture

How many cores are too many cores? Dr. Avi Mendelson, Intel - Mobile Processors Architecture group

Compilation for Heterogeneous Platforms

CSC 447: Parallel Programming for Multi- Core and Cluster Systems. Lectures TTh, 11:00-12:15 from January 16, 2018 until 25, 2018 Prerequisites

POLITECNICO DI MILANO. Advanced Topics on Heterogeneous System Architectures! Multiprocessors

HPC code modernization with Intel development tools

Multi-core Architectures. Dr. Yingwu Zhu

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

Computer Architecture!

The Art of Parallel Processing

CS420/CSE 402/ECE 492. Introduction to Parallel Programming for Scientists and Engineers. Spring 2006

Multiple Cores, Multiple Pipes, Multiple Threads Do we have more Parallelism than we can handle? David B. Kirk

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Parallel Computing: Parallel Architectures Jin, Hai

Convergence of Parallel Architecture

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Multi-Core Microprocessor Chips: Motivation & Challenges

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

CISC 879 Software Support for Multicore Architectures Spring Student Presentation 6: April 8. Presenter: Pujan Kafle, Deephan Mohan

Multithreading: Exploiting Thread-Level Parallelism within a Processor

MINIMUM HARDWARE AND OS SPECIFICATIONS File Stream Document Management Software - System Requirements for V4.2

Introduction to CELL B.E. and GPU Programming. Agenda

Analyzing Cache Bandwidth on the Intel Core 2 Architecture

Many-cores: Supercomputer-on-chip How many? And how? (how not to?)

Transcription:

Saman Amarasinghe and Rodric Rabbah Massachusetts Institute of Technology http://cag.csail.mit.edu/ps3 6.189-chair@mit.edu

A new processor design pattern emerges: The Arrival of Multicores MIT Raw 16 Cores Since 2002 Intel Montecito 1.7 Billion transistors Dual Core IA/64 Intel Tanglewood Dual Core IA/64 Intel Pentium D (Smithfield) Intel Dempsey Dual Core Xeon Cancelled Intel Tejas & Jayhawk Unicore (4GHz P4) Intel Pentium Extreme 3.2GHz Dual Core Intel Yonah Dual Core Mobile AMD Opteron Dual Core Sun Olympus and Niagara 8 Processor Cores IBM Cell Scalable Multicore IBM Power 4 and 5 Dual Cores Since 2001 IBM Power 6 Dual Core 2H 2004 1H 2005 2H 2005 1H 2006 2H 2006

What is Multicore? Multiple, externally visible processors on a single die where the processors have independent control-flow, separate internal state and no critical resource sharing. Multicores have many names Parallel Processors on a Chip Chip Multiprocessor (CMP) Tiled Processor

Why Move to Multicores? Many issues with scaling a unicore Power Efficiency Complexity Wire Delay Diminishing returns from optimizing a single instruction stream

Multicores Of The Future # of cores 512 256 128 64 32 16 8 4 2 1 4004 Agarwal s Corollary (to Moore s Law): number of cores on a chip will double every ~year! 8008 8080 8086 286 386 486 Pentium P2 P3 Raw Xbox360 PA-8800 Opteron Power4 Athlon Niagara Boardcom 1480 Picochip PC102 Cisco CSR-1 Intel Tflops Raza XLR PExtreme Power6 Yonah Itanium P4 Itanium 2 Ambric AM2045 Cavium Octeon Cell Opteron 4P Xeon MP Tanglewood 1970 1975 1980 1985 1990 1995 2000 2005 20??

The PS3: A Multicore Architecture Cell Inside 9 processing units 8 cores are the same (SPE) 1 Power PC core (PPE)

Synergistic Processing Element (SPE) Short vector processor SIMD-only 16 bytes register/memory access 128 x 128-bit registers 256KB local store Aligned accesses only 16 byte for load/store 128 byte for IFETCH/DMA Dedicated DMA engine Explicitly move data in and out of the local store 16 outstanding requests Dual issue (under certain constraints) No H/W branch prediction Compare/select predication Branch hint instruction

Programming Multicores Programming multicores is different from traditional programming (for uniprocessors) Uniprocessors Single flow of control Single memory image Sequential Programming: no explicit need for orchestration of computation or communication Multicores Multiple flows of control Multiple local memories Parallel Programming: orchestration of computation and communication is a sure necessity Parallel programming traditionally in exclusive domains NASA, Scientific codes, and the like

What s New in Parallel Programming Parallel programming for the masses Parallel programming in many domains Parallel programming can be fun

The New Parallel Programming Phenomenon

What Is This Course About Learn about the emergence of multicores Why is parallel hardware finally becoming mainstream Learn about parallel programming patterns Classical and new work in the field Learn about challenges in parallel programming with hands on experience Programming languages, tools for debugging, tools for performance monitoring Get your hands on a PS3 to make it all fun Group projects will run on PS3 hardware

Project Ideas New feature rich games or applications Exploit available resources for new effects or features Multimedia feature extraction and indexing Simulation of molecular dynamics For drug discovery, protein folding Security applications Feature detection (face recognition), pattern matching (network intrusion detection, gene discovery) Monte Carlo simulations Medical imaging to recognize abnormal tissues Oil field analysis to find oil rich wells Models for financial markets to maximize profits Algorithms that exploit SIMD properties of SPEs

More Project Ideas Linear Algebra Library Multi-pattern string matching Black Scholes PDE Solver JPEG or MPEG encoding Viterbi Algorithm applied to bioinformatics

Examples of Project Scope Can take existing algorithms and re-implement them in a parallel or SIMD equivalent Pattern matching for security applications Can take existing applications and modify them for the PS3 adding new capabilities Add new features to the Quake 3 game engine Design and implement a new project form scratch that harnesses the power of the Cell and PS3

Course Organization Lectures three times a week Monday, Wednesday, and Friday 2 1 hour lectures with small break between lectures 10 am 12:15pm Labs and recitation two times a week Tuesday and Thursday Group project Small teams (no more than 4-5 per group) Project presentation on Thursday January 29 Each group will have dedicated access to PS3 and tools Awards and Reception on Friday January 30 Course will be available on OCW

Course Enrollment Interested students should discuss project ideas first by Friday December 15 Choose from existing ideas Or come up with your own project idea We ll help you Enrollment limited and by invitation Preference to interesting projects

Project Support Direct access to PS3 hardware Direct access to Tools from Sony and IBM Compilers, tutorials, example codes Projects can be implemented in C using threads and Cell intrinsic instructions for direct access to the bare metal StreamIt and StreamIt virtual machine which hides the bare metal and provides a rich programming interface Other approaches are also possible (e.g., OpenMP or MPI)

Project Resources Go to the course Wiki Browse list of projects idea Submit your project idea Sign up for a project Find team members Follow link from course web page http://cag.csail.mit.edu/ps3

Who to Contact 6.189-chair@mit.edu Email to ask general questions about the course Or to discuss (feasibility of) project ideas 6189@lists.csail.mit.edu Students and instructors Send project ideas Find team members

Acknowledgements Sony IBM Toshiba