Cell Broadband Engine
Spencer Dennis, Nicholas Barlow
The Cell Processor
Objective: [to bring] supercomputer power to everyday life
Bridge the gap between conventional CPUs and high-performance GPUs
History
Original patent application in 2002
Generations:
90 nm - 2005
65 nm - 2007 (PowerXCell 8i)
45 nm - 2009
Cost
$400 million to develop
Team of 400 engineers
STI Design Center: Sony, Toshiba, IBM

Design
PS3
Employed as the CPU
Clocked at 3.2 GHz
Theoretical maximum single-precision performance of 230.4 GFLOPS (full chip: PPE + 8 SPEs)
Used alongside the NVIDIA RSX 'Reality Synthesizer' GPU
Complemented its graphical performance
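The commonly quoted single-precision peak for a 3.2 GHz Cell can be reconstructed as below. This assumes each SPE, and the PPE's VMX unit, completes one 4-wide single-precision fused multiply-add (2 FLOPs per lane) every cycle; double-precision throughput on first-generation parts is much lower. For a PS3, where one of the eight SPEs is disabled, the same arithmetic gives 7 × 25.6 + 25.6 = 204.8 GFLOPS.

```latex
\begin{aligned}
\text{per SPE} &= 3.2\ \text{GHz} \times 4\ \text{lanes} \times 2\ \tfrac{\text{FLOP}}{\text{FMA}} = 25.6\ \text{GFLOPS}\\
\text{PPE (VMX)} &= 3.2\ \text{GHz} \times 4 \times 2 = 25.6\ \text{GFLOPS}\\
\text{chip peak} &= 8 \times 25.6 + 25.6 = 230.4\ \text{GFLOPS}
\end{aligned}
```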
Architecture Overview
8 Synergistic Processing Elements (SPEs)
1 dual-issue Power Processing Element (PPE)
Element Interconnect Bus (EIB)
Memory Interface Controller (MIC)
Bus Interface Controller (BIC)
SPU/SPE - Synergistic Processing Unit/Element
SXU - Synergistic Execution Unit
LS - Local Store
SMF - Synergistic Memory Flow Controller
EIB - Element Interconnect Bus
PPE - Power Processing Element
MIC - Memory Interface Controller
BIC - Bus Interface Controller
Synergistic Processing Element (SPE)
128-bit dual-issue SIMD dataflow
SIMD: Single Instruction, Multiple Data
Optimized for data-level parallelism
Designed for vectorized floating-point calculations
SPE (continued)
Workhorses of the processor
Handle most of the computational workload
Each contains its own instruction + data memory:
Local Store (256 KB of embedded SRAM)
Power Processor Element (PPE)
Responsible for governing the SPEs, which act as extensions of the PPE
Shares main memory with the SPEs; can initiate memory accesses on behalf of SPE cores
Power Architecture
Implements the Power Architecture
Hypervisor support: can run multiple operating systems concurrently
Memory (1st generation)
Split L1: 32 KB instruction cache + 32 KB data cache
Unified 512 KB L2 cache
Element Interconnect Bus (EIB)
High-bandwidth internal bus
1st generation: 96 bytes/cycle
Four 16-byte-wide rings, each able to handle up to 3 simultaneous data transfers
12 on/off ramps: one for each of the 8 SPEs, the PPE, the memory controller, and the 2 off-chip I/O interfaces
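The 96 bytes/cycle figure can be reconciled with the ring layout as follows, assuming the EIB is clocked at half the core frequency and every ring carries its maximum of three concurrent 16-byte transfers. This is a theoretical instantaneous peak; sustained bandwidth depends on the traffic pattern.

```latex
\begin{aligned}
\text{per bus cycle} &= 4\ \text{rings} \times 3\ \text{transfers} \times 16\ \text{B} = 192\ \text{B}\\
\text{per core cycle} &= 192\ \text{B} / 2 = 96\ \text{B}\\
\text{at } 3.2\ \text{GHz} &: 96\ \text{B} \times 3.2 \times 10^{9}\ \text{s}^{-1} = 307.2\ \text{GB/s}
\end{aligned}
```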
Memory Flow Controller (MFC)
Asynchronous memory controller
Retrieves data from main memory into an SPE's local store and the PPE's cache
Supports two Rambus XDR memory banks
Bus Interface Controller (BIC)
Provides an asynchronous interface between the EIB and the I/O interfaces
Two flexible I/O interfaces to the rest of the system
One interface can be reconfigured to provide a Symmetric Multiprocessing (SMP) interface
Contains a pervasive unit that provides test, debug, and monitoring functionality:
Chip-level error checking
Clock generation & distribution control
Power on Reset (POR) unit: responsible for unit initialization
Performance monitoring
Power Management Unit (PMU): allows software-controlled power reduction
Thermal Management Unit (TMU)
Developing for Cell
Octopiler
Takes high-level sequential code and parallelizes it for a multiprocessor system
High-level languages
Divides code nine ways:
8 sets of instructions are written for the SPEs
The final set is written for the PowerPC PPE
GCC
IBM-sourced plugins for Cell PPU/SPU development
SPU ISA
SPU ISA (cont'd)
Applications (In Depth)
Console gaming: PS3
PPE controls 6 SPEs, delegating tasks to them
1 SPE is reserved for the OS; 1 SPE is redundant
Supercomputing: IBM BladeCenter QS series
Easy scalability
Password cracking
High parallelism allows for high floating-point brute-force performance
Conclusion
Discontinued in 2009
Difficult development environment:
Programmer-managed SPE memory
Explicit parallelism
Two separate ISAs
The idea still lives on:
General-purpose GPUs
Intel Larrabee architecture
Intel Many Integrated Core architecture
AMD FireStream
Nvidia Tesla