Cell Broadband Engine. Spencer Dennis Nicholas Barlow


The Cell Processor
Objective: "[to bring] supercomputer power to everyday life"
Bridge the gap between conventional CPUs and high-performance GPUs.

History
Original patent application in 2002.
Process generations: 90 nm (2005), 65 nm (2007), 45 nm (2009).
The PowerXCell 8i, a 65 nm variant, followed in 2008.

Design
Cost: $400 million to develop.
Team of 400 engineers at the STI Design Center (Sony, Toshiba, IBM).

PS3
Employed as the PlayStation 3's CPU, clocked at 3.2 GHz.
Theoretical peak single-precision performance of 230.4 GFLOPS for the full chip (PPE plus eight SPEs).
Used alongside the NVIDIA RSX "Reality Synthesizer" GPU, complementing its graphics performance.
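The 230.4 GFLOPS figure can be reproduced with simple arithmetic. The sketch below assumes that each of the nine cores (the PPE's VMX unit plus eight SPEs) completes one 4-wide single-precision fused multiply-add (2 FLOPs per lane) per cycle at 3.2 GHz; the function name `peak_gflops` is our own, and sustained throughput on real code is lower.

```python
# Hypothetical helper: theoretical peak of a 3.2 GHz Cell, assuming every core
# retires one 4-wide single-precision fused multiply-add per cycle.
CLOCK_HZ = 3.2e9
LANES = 4            # 128-bit register / 32-bit float
FLOPS_PER_FMA = 2    # a fused multiply-add counts as a multiply plus an add


def peak_gflops(cores: int, clock_hz: float = CLOCK_HZ) -> float:
    """Peak single-precision GFLOPS for `cores` cores at `clock_hz`."""
    per_core = clock_hz * LANES * FLOPS_PER_FMA  # FLOP/s for one core
    return cores * per_core / 1e9


full_chip = peak_gflops(9)   # PPE (VMX) + 8 SPEs
ps3_games = peak_gflops(6)   # the 6 SPEs available to PS3 games
```

Under the same assumptions each core contributes 25.6 GFLOPS, so the 6 SPEs available to PS3 games account for 153.6 GFLOPS.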

Architecture Overview
8 Synergistic Processing Elements (SPEs)
A single dual-issue Power Processor Element (PPE)
Element Interconnect Bus (EIB)
Memory Interface Controller (MIC)
Bus Interface Controller (BIC)

SPU/SPE: Synergistic Processing Unit/Element
SXU: Synergistic Execution Unit
LS: Local Store
SMF: Synergistic Memory Flow Controller
EIB: Element Interconnect Bus
PPE: Power Processor Element
MIC: Memory Interface Controller
BIC: Bus Interface Controller

Synergistic Processing Element (SPE)
128-bit, dual-issue SIMD (Single Instruction, Multiple Data) dataflow design.
Optimized for data-level parallelism.
Designed for vectorized floating-point calculations.
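To make the SIMD idea concrete, here is a minimal Python model, assuming a 128-bit register is represented as a list of four 32-bit floats. `vec_madd` stands in for a single vector multiply-add instruction (on a real SPE this would be an intrinsic such as `spu_madd` from the SPU language extensions); `saxpy` is a hypothetical kernel that processes four elements per step and handles leftovers with scalar code.

```python
from typing import List, Sequence

LANES = 4  # a 128-bit SPE register holds four 32-bit floats


def vec_madd(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Model of one SIMD multiply-add: every lane computes a*b + c at once."""
    assert len(a) == len(b) == len(c) == LANES
    return [x * y + z for x, y, z in zip(a, b, c)]


def saxpy(alpha: float, x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Compute alpha*x + y four elements at a time, with a scalar loop for the tail."""
    n = len(x)
    out: List[float] = []
    splat = [alpha] * LANES          # broadcast alpha into every lane
    main = n - n % LANES             # elements that fill whole vectors
    for i in range(0, main, LANES):
        out.extend(vec_madd(splat, x[i:i + LANES], y[i:i + LANES]))
    for i in range(main, n):         # leftover elements that do not fill a vector
        out.append(alpha * x[i] + y[i])
    return out
```

For five elements, the first four are handled by one vector operation and the fifth by the scalar tail loop.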

SPE (continued)
The workhorses of the processor: SPEs handle most of the computational workload.
Each SPE has its own 256 KB Local Store, an embedded SRAM that holds both its instructions and its data.
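Because the Local Store is small and explicitly managed, SPE code typically streams large arrays through it in fixed-size tiles. This sketch computes a tile size under stated assumptions: the space reserved for code and stack (`reserved_bytes`) and the number of buffers are hypothetical inputs, and each buffer is rounded down to a multiple of 16 bytes to stay quadword-aligned.

```python
from typing import Iterator, Tuple

LOCAL_STORE_BYTES = 256 * 1024  # per-SPE Local Store
QUADWORD = 16                   # keep buffers 16-byte aligned


def max_tile_elems(elem_bytes: int, buffers: int, reserved_bytes: int) -> int:
    """Largest tile (in elements) so that `buffers` tiles plus `reserved_bytes`
    (hypothetical space for code, stack, and other data) fit in the Local Store."""
    free = LOCAL_STORE_BYTES - reserved_bytes
    if free <= 0 or buffers <= 0:
        raise ValueError("no Local Store space left for data buffers")
    per_buffer = (free // buffers) // QUADWORD * QUADWORD  # round down to 16 bytes
    return per_buffer // elem_bytes


def tiles(n: int, tile: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, length) pairs that cover n elements in tile-sized pieces."""
    for start in range(0, n, tile):
        yield start, min(tile, n - start)
```

For example, with 64 KB reserved and four buffers (double-buffered input and output), `max_tile_elems(4, 4, 64 * 1024)` allows 12,288 single-precision floats per tile.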

Power Processor Element (PPE)
Responsible for governing the SPEs, which act as extensions of the PPE.
Shares main memory with the SPEs and can initiate memory accesses on behalf of the SPE cores.
Implements the Power Architecture; a hypervisor can run multiple operating systems concurrently.
Memory (1st generation): split L1 caches (32 KB instruction, 32 KB data) and a unified 512 KB L2 cache.

Element Interconnect Bus (EIB)
High-bandwidth internal bus; 1st generation peak: 96 bytes/cycle.
Four 16-byte-wide data rings, each able to carry up to 3 simultaneous data transfers.
12 on/off ramps: one for each of the 8 SPEs, the PPE, and the memory controller, plus 2 off-chip I/O interfaces.
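The 96 bytes/cycle figure follows from the ring parameters above. The sketch assumes that the bus is clocked at half the 3.2 GHz core clock (the value commonly reported for the first-generation chip), so one bus cycle spans two core cycles; the result is a theoretical peak, not a sustained rate.

```python
RINGS = 4
TRANSFERS_PER_RING = 3
BYTES_PER_TRANSFER = 16      # ring width: bytes moved per transfer per bus cycle
CORE_CLOCK_HZ = 3.2e9
BUS_CLOCK_DIVISOR = 2        # assumption: EIB runs at half the core clock

bytes_per_bus_cycle = RINGS * TRANSFERS_PER_RING * BYTES_PER_TRANSFER  # 192
bytes_per_core_cycle = bytes_per_bus_cycle / BUS_CLOCK_DIVISOR         # 96.0
peak_gb_per_s = bytes_per_core_cycle * CORE_CLOCK_HZ / 1e9             # 307.2
```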

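Memory Flow Controller (MFC)
Each SPE has its own MFC, an asynchronous DMA engine that moves data between main memory and the SPE's Local Store, so transfers can overlap with computation.
The Memory Interface Controller (MIC) connects the chip to two Rambus XDR memory channels.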
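MFC transfers must obey size and alignment rules. The simplified model below assumes the commonly documented constraints: a single DMA command moves 1, 2, 4, or 8 bytes, or a multiple of 16 bytes up to 16 KB; larger transfers use 16-byte-aligned addresses, while small ones must be naturally aligned with the same offset within a quadword in both address spaces. The helpers `is_legal_dma` and `split_transfer` are hypothetical; in the SDK each resulting command would be issued with a call such as `mfc_get`.

```python
from typing import List, Tuple

MAX_DMA_BYTES = 16 * 1024  # largest single MFC DMA transfer
QUADWORD = 16


def is_legal_dma(size: int, ls_addr: int, ea_addr: int) -> bool:
    """Simplified size/alignment check for one MFC DMA command."""
    if size in (1, 2, 4, 8):
        # Small transfers: naturally aligned, same offset within a quadword.
        return (ls_addr % size == 0 and ea_addr % size == 0
                and ls_addr % QUADWORD == ea_addr % QUADWORD)
    return (0 < size <= MAX_DMA_BYTES and size % QUADWORD == 0
            and ls_addr % QUADWORD == 0 and ea_addr % QUADWORD == 0)


def split_transfer(size: int, ls_addr: int, ea_addr: int) -> List[Tuple[int, int, int]]:
    """Break one large copy into (size, ls_addr, ea_addr) commands of at most 16 KB."""
    commands: List[Tuple[int, int, int]] = []
    offset = 0
    while offset < size:
        chunk = min(MAX_DMA_BYTES, size - offset)
        cmd = (chunk, ls_addr + offset, ea_addr + offset)
        if not is_legal_dma(*cmd):
            raise ValueError(f"illegal DMA command: {cmd}")
        commands.append(cmd)
        offset += chunk
    return commands
```

A 40,000-byte copy, for instance, becomes two 16 KB commands followed by one 7,232-byte command.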

Bus Interface Controller (BIC)
Provides an asynchronous interface between the EIB and the I/O interfaces.
Two flexible I/O interfaces connect to the rest of the system; one can be reconfigured to provide a Symmetric Multiprocessing (SMP) interface.
Pervasive unit: provides test, debug, and monitoring functionality, chip-level error checking, and clock generation and distribution control.
Power On Reset (POR) unit: responsible for unit initialization.
Performance monitoring.
Power Management Unit (PMU): allows software-controlled power reduction.
Thermal Management Unit (TMU).

Developing for Cell
Octopiler: takes sequential code written in high-level languages and parallelizes it for a multiprocessor system. It divides the code nine ways: 8 sets of instructions are written for the SPEs, and the final set is written for the PowerPC-based PPE.
GCC: IBM contributed toolchain support for PPU and SPU development on Cell.
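The nine-way split can be illustrated with a simple static partition of a loop's iteration space. This is only a sketch of the data-partitioning step, assuming contiguous blocks of nearly equal size; it is not the Octopiler's actual algorithm, and mapping blocks 0 through 7 to the SPEs and block 8 to the PPE is an assumption.

```python
from typing import List, Tuple


def partition(n_items: int, n_parts: int = 9) -> List[Tuple[int, int]]:
    """Split range(n_items) into n_parts contiguous (start, stop) blocks
    whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_parts)
    blocks: List[Tuple[int, int]] = []
    start = 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)  # first `extra` blocks get one more item
        blocks.append((start, start + size))
        start += size
    return blocks


# Assumed mapping: blocks 0..7 -> SPEs, block 8 -> PPE.
```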

SPU ISA
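The SPU ISA operates on 128 registers, each 128 bits wide. One instruction that shows its byte-level flexibility is shuffle bytes (`shufb`): each result byte is either copied from one of the 32 bytes of its two source registers or set to a constant, as selected by the corresponding control byte. The Python model below follows our reading of the documented semantics and is an illustration, not a reference implementation.

```python
def shufb(ra: bytes, rb: bytes, rc: bytes) -> bytes:
    """Model of the SPU shuffle-bytes instruction on 16-byte registers."""
    if not (len(ra) == len(rb) == len(rc) == 16):
        raise ValueError("SPU registers are 16 bytes wide")
    src = ra + rb                  # 32 candidate bytes, byte 0 = leftmost
    out = bytearray(16)
    for i, c in enumerate(rc):
        if c >= 0xE0:              # control 111xxxxx -> constant 0x80
            out[i] = 0x80
        elif c >= 0xC0:            # control 110xxxxx -> constant 0xFF
            out[i] = 0xFF
        elif c >= 0x80:            # control 10xxxxxx -> constant 0x00
            out[i] = 0x00
        else:                      # otherwise select byte (c & 0x1F) of ra || rb
            out[i] = src[c & 0x1F]
    return bytes(out)
```

A descending control pattern reverses the bytes of the first operand, and control bytes 16 through 31 pick bytes from the second operand.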

Applications (In Depth)
Console gaming: in the PS3, the PPE delegates tasks to 6 SPEs. Of the remaining 2 SPEs, 1 is reserved for the operating system and 1 is disabled to improve manufacturing yield.
Supercomputing: IBM BladeCenter QS series blade servers, which scale easily by adding blades.
Password cracking: the high degree of parallelism gives strong brute-force throughput.
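One way a PPE-side dispatcher could balance uneven tasks across the 6 game-accessible SPEs is greedy list scheduling: each task goes to the SPE expected to become free first. This is a hypothetical sketch with made-up task costs, not how any particular PS3 runtime works.

```python
import heapq
from typing import Dict, List, Sequence


def schedule(task_costs: Sequence[float], n_spes: int = 6) -> Dict[int, List[int]]:
    """Greedy list scheduling: give each task, in order, to the SPE that is free soonest.
    Returns a mapping from SPE index to the indices of the tasks it runs."""
    heap = [(0.0, spe) for spe in range(n_spes)]  # (time the SPE becomes free, SPE id)
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {spe: [] for spe in range(n_spes)}
    for task, cost in enumerate(task_costs):
        free_at, spe = heapq.heappop(heap)         # earliest-available SPE
        assignment[spe].append(task)
        heapq.heappush(heap, (free_at + cost, spe))
    return assignment
```

With one long task and several short ones, the short tasks flow to the SPEs that finish early instead of queuing behind the long one.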

Conclusion
IBM halted development of new Cell processors in 2009.
Difficult development environment: programmer-managed SPE memory, explicit parallelism, and two separate ISAs (PPE and SPU).
The idea still lives on in general-purpose GPU (GPGPU) computing, Intel's Larrabee architecture, the Intel Many Integrated Core architecture, AMD FireStream, and NVIDIA Tesla.
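Programmer-managed SPE memory usually means double buffering: while the SPU computes on one Local Store buffer, the MFC fills the other. The timing model below is a sketch under simplifying assumptions (one DMA stream at a time, fixed per-chunk costs, no DMA setup latency or bus contention); the function names are hypothetical.

```python
def sequential_time(n: int, t_dma: float, t_compute: float) -> float:
    """Single buffer: each chunk must arrive before it is processed, with no overlap."""
    return n * (t_dma + t_compute)


def double_buffered_time(n: int, t_dma: float, t_compute: float) -> float:
    """Two Local Store buffers used alternately: the next chunk streams in
    while the SPU computes on the current one."""
    dma_done = [0.0] * n
    compute_done = [0.0] * n
    for i in range(n):
        prev_dma = dma_done[i - 1] if i >= 1 else 0.0         # one DMA stream at a time
        buffer_free = compute_done[i - 2] if i >= 2 else 0.0  # buffer i % 2 last held chunk i - 2
        dma_done[i] = max(prev_dma, buffer_free) + t_dma
        prev_compute = compute_done[i - 1] if i >= 1 else 0.0
        compute_done[i] = max(dma_done[i], prev_compute) + t_compute
    return compute_done[-1] if n else 0.0
```

With equal DMA and compute costs of 1 time unit per chunk, 4 chunks take 8 units sequentially but 5 units double-buffered, since only the first transfer is left exposed.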
