Bruno Pereira Evangelista


Outline
  Introduction
  The multi-core era
  PlayStation 3 architecture
  Cell Broadband Engine processor
  Cell architecture
  How games are using SPUs
  Cell SDK
  RSX graphics processor
  PSGL
  Cg
  COLLADA
  PlayStation Edge

Developing games for consoles
  Restricted to professional, certified developers
  Development kits are expensive
    Nintendo Wii: ~US$ 2,000
    PlayStation 3: ~US$ 30,000
  Development kits are necessary
    Development kits contain software and hardware
    You need the hardware to deploy and test your games

In this lecture we will focus on
  The SDKs, APIs and tools used by professional developers to create games for the PlayStation 3
  But almost all the SDKs, APIs and tools used on the PlayStation 3 are based on open standards
    Cell processor, OpenGL ES, Cg, COLLADA
  Everything is also available to you!

Microprocessors are approaching the physical limits of semiconductors
  Only small gains in processor performance from frequency scaling
One possible solution
  Increase the number of cores
We are in the multi-core era!
  Intel Core 2 Duo, AMD X2, IBM Cell
  Quad cores are coming
  Single-core processors are vanishing

PlayStation 3: 9 cores (Cell processor)
Xbox 360: 3 cores (PowerPC-based)
In the next generation, all consoles should be multi-core!

CPU: Cell processor
  PowerPC-based core @ 3.2 GHz
  6 x accessible SPEs @ 3.2 GHz
  1 SPE runs in a special mode (OS)
  1 of 8 SPEs disabled to improve production yields
GPU: RSX @ 550 MHz (based on the GeForce 7 series)
  Full HD (up to 1080p) x 2 channels
  Multi-way programmable parallel floating-point shader pipelines
Memory
  256 MB XDR main RAM @ 3.2 GHz
  256 MB GDDR3 VRAM @ 700 MHz
System floating-point performance: 2 TFLOPS
Sound: Dolby 5.1ch, DTS, LPCM, etc.
Communications: Ethernet, Wi-Fi, Bluetooth
Storage: detachable HDD slot
Disc media: CD/DVD/Blu-ray
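As a rough sanity check on the per-core numbers above, the peak single-precision throughput of the SPEs can be estimated from the clock rate alone. This is a back-of-the-envelope sketch, not a figure from the slides: it assumes each SPE can issue one 4-wide fused multiply-add per cycle (4 lanes x 2 operations), and the function name is illustrative. It does not attempt to reproduce the 2 TFLOPS system figure.

```python
def peak_gflops(clock_ghz: float, simd_lanes: int = 4, ops_per_lane: int = 2) -> float:
    """Peak GFLOPS of one core, assuming one SIMD multiply-add per cycle.

    simd_lanes=4 and ops_per_lane=2 are assumptions (4 x 32-bit floats
    in a 128-bit register, multiply + add counted as two operations).
    """
    return clock_ghz * simd_lanes * ops_per_lane


SPE_CLOCK_GHZ = 3.2      # from the specification slide
ACCESSIBLE_SPES = 6      # SPEs available to games on the PS3

per_spe = peak_gflops(SPE_CLOCK_GHZ)
all_spes = per_spe * ACCESSIBLE_SPES

print(f"Peak per SPE: {per_spe:.1f} GFLOPS")
print(f"Peak for {ACCESSIBLE_SPES} SPEs: {all_spes:.1f} GFLOPS")
```

These are theoretical peaks; real code is usually limited by DMA traffic and scheduling long before it reaches them.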

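[Block diagram: the Cell (3.2 GHz) is connected to 256 MB of XDRAM (25.6 GB/s), to the RSX (20 GB/s and 15 GB/s) and to the I/O bridge (2.5 GB/s in each direction). The RSX is connected to 256 MB of GDDR3 (22.4 GB/s) and to the HD/SD AV out. The I/O bridge connects the BD/DVD/CD ROM drive (54 GB), the Bluetooth controller, USB 2.0 x 6, Gbit Ethernet/Wi-Fi and removable storage (Memory Stick, SD, CF).]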

The CBE (Cell Broadband Engine) processor is the result of a collaboration between Sony, Toshiba and IBM
  Alliance formed in 2000 and design center opened in 2001
  First implementation in 2004
  Investments approaching US$ 400 million

Heterogeneous single-chip multiprocessor
  Nine processor elements operating on a shared, coherent memory
  Designed to support a very broad range of applications
Overcomes three important limitations of contemporary microprocessors
  Power use, memory use and clock frequency

Power use
  Non-homogeneous coherent multiprocessor
  Improves power efficiency at approximately the same rate as the performance increase
Memory use
  Asynchronous DMA transfers
  3-level SPE memory structure (main storage, local stores and large register files)
Clock frequency
  Specializes the PPE for control-intensive tasks and the SPEs for compute-intensive tasks
  Runs at high frequencies without excessive overhead

Heterogeneous single-chip multiprocessor
  1x PPE (PowerPC Processor Element)
  8x SPE (Synergistic Processor Element)
"It's not a collection of different processors, but a synergistic whole" (Michael Perrone, IBM)

PPE (PowerPC Processor Element)
  64-bit PowerPC Architecture RISC core
  General-purpose processor
  Dual thread
    Two-way multiprocessor with shared dataflow
  32 x 128-bit registers
  2 x 32 KB L1 caches (instruction/data)
  512 KB L2 cache (instruction and data)
  VMX (Vector/SIMD multimedia extensions)

SPE (Synergistic Processor Element)
  128-bit RISC core
  Executes a new SIMD instruction set
  Specialized for data-rich, compute-intensive SIMD and scalar applications
  128 x 128-bit registers
  256 KB local store (instructions and data)
  Coherent with main storage
  The SPU can only access its own local store

SPE (Synergistic Processor Element)
  MFC (Memory Flow Controller)
    DMA controller that moves instructions and data between its local store (LS) and main storage
    DMA transfer sizes of 1, 2, 4, 8 or 16 bytes, and multiples of 16 bytes up to 16 KB
    Up to 16 in-flight DMA transfers
  The PS3 has 7 SPUs, but only 6 are available to use
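Because a single DMA transfer is capped at 16 KB, any larger buffer has to be moved as a sequence of smaller transfers. The sketch below shows only the bookkeeping, in plain Python rather than on real hardware. The function name is hypothetical, and the requirement that the total size be a multiple of 16 bytes is an assumption made to keep the example simple.

```python
from typing import List, Tuple

MAX_DMA_BYTES = 16 * 1024   # per-transfer limit quoted on the slide
DMA_ALIGN = 16              # assumed granularity for large transfers


def split_transfer(total_bytes: int) -> List[Tuple[int, int]]:
    """Split one large copy into (offset, size) pieces of at most 16 KB."""
    if total_bytes <= 0 or total_bytes % DMA_ALIGN != 0:
        raise ValueError("size must be a positive multiple of 16 bytes")
    chunks = []
    offset = 0
    while offset < total_bytes:
        size = min(MAX_DMA_BYTES, total_bytes - offset)
        chunks.append((offset, size))
        offset += size
    return chunks


# A 40 KB buffer needs three transfers: 16 KB + 16 KB + 8 KB.
for offset, size in split_transfer(40 * 1024):
    print(f"offset={offset:6d} size={size:6d}")
```

With at most 16 transfers in flight, a scheme like this could keep up to 256 KB of traffic queued at once, which is already the size of the whole local store.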

Element Interconnect Bus (EIB)
  Communication path for commands and data between all processors
  Four 16-byte-wide data rings
Memory Interface Controller (MIC)
  Provides the interface between the EIB and the physical memory
Cell Broadband Engine Interface Unit (BEI)
  Provides a wide connection to external devices
  Supports two Rambus FlexIO interfaces

Different programs run on the PPU and the SPUs
  PPU: general-purpose programs
  SPU: computation-intensive programs
  Both cooperate to carry out computations
On the SPU
  All instructions are SIMD
  The SPU can only access its local store
  Access to main memory is done through asynchronous DMA
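Since main memory is reached only through asynchronous DMA, SPU code is commonly structured so that the next block of data is being fetched while the current one is processed (double buffering). The following is a minimal sketch of that pattern in ordinary Python, not SPU code: a one-thread executor stands in for the DMA engine, and `fetch` and `process_double_buffered` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(main_memory, start, size):
    """Stand-in for an asynchronous DMA 'get' into the local store."""
    return main_memory[start:start + size]


def process_double_buffered(main_memory, chunk_size, compute):
    """Apply compute() to every element, overlapping fetch and compute."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        # Start the first transfer before entering the loop.
        pending = dma.submit(fetch, main_memory, 0, chunk_size)
        for start in range(0, len(main_memory), chunk_size):
            current = pending.result()          # wait for the in-flight chunk
            next_start = start + chunk_size
            if next_start < len(main_memory):
                # Kick off the next transfer, then compute on the current chunk.
                pending = dma.submit(fetch, main_memory, next_start, chunk_size)
            results.extend(compute(x) for x in current)
    return results


squares = process_double_buffered(list(range(10)), 4, lambda x: x * x)
print(squares)
```

On real hardware the two buffers would both live in the 256 KB local store, so the chunk size has to leave room for code and stack as well.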

[Video: simulating 12,000 boids at 60 fps]

Goal
  Simulate large groups of autonomous characters
Running on the PlayStation 3
  Makes use of the PPU, SPUs and RSX
  All the simulation runs on the PPU and SPUs
  Simulates up to 15,000 boids in real time
  Individuals are sorted by position into buckets
  Each SPU is used to update one bucket
  SPUs are idle more than half of each frame!
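The bucketing step can be illustrated with a uniform grid: boids that fall in the same cell end up in the same bucket, and buckets are then handed out to workers. This is a sketch under assumptions, not the demo's actual code: the slide does not say how buckets are formed or distributed, so the grid cell size, the round-robin assignment and both function names are illustrative.

```python
from collections import defaultdict


def bucket_by_position(boids, cell_size):
    """Group boid indices by the grid cell that contains their position."""
    buckets = defaultdict(list)
    for index, (x, y, z) in enumerate(boids):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        buckets[key].append(index)
    return dict(buckets)


def assign_buckets(buckets, num_workers=6):
    """Deal buckets out to workers round-robin (an assumed policy)."""
    assignment = [[] for _ in range(num_workers)]
    for i, key in enumerate(sorted(buckets)):
        assignment[i % num_workers].append(key)
    return assignment


boids = [(0.5, 0.5, 0.5), (1.5, 0.2, 0.1), (0.9, 0.1, 0.3), (-0.5, 0.0, 0.0)]
buckets = bucket_by_position(boids, cell_size=1.0)
workers = assign_buckets(buckets, num_workers=2)
print(buckets)
print(workers)
```

Grouping by position means each worker mostly needs the neighbours stored in its own bucket, which keeps the data it must bring into local memory small.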

[Video: MotorStorm]

MotorStorm SPU tasks
  Havok physics
  Determination of object visibility
  Concatenation of hierarchies
  Billboard object culling and vertex buffer creation
  Updating of particles and vertex buffer creation
  Updating of vehicle dynamics
  Audio (MultiStream)
  Video decoding
Only uses 15% to 20% of the available SPU resources

[Video: Lair]

Lair SPU tasks
  Physics
  Skinning models
  Culling triangles
  Fluid dynamics
  Others

The SPUs are the key strength of the PS3
  Ideal for offloading work from the PPU and RSX
  Can be used for many different tasks
  Many studios are trying to offload as much work as possible to the SPUs
How to use the SPUs?
  Directly create threads on the SPU and run your code
  Run a kernel and a job manager on each SPU
    Send jobs and tasks to each SPU
    Sony has developed the SSW job manager for this purpose
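The job-manager approach can be pictured as a shared queue that a fixed set of workers drain, one worker per SPU. The sketch below uses ordinary Python threads to show the idea. It is not the SSW job manager or any Sony API, and `run_jobs` and its job format are hypothetical.

```python
import queue
import threading


def run_jobs(jobs, num_workers=6):
    """Run (function, args) jobs on a pool of workers and return the
    results in submission order. Six workers mirror the six SPUs
    available to games."""
    job_queue = queue.Queue()
    for job_id, (func, args) in enumerate(jobs):
        job_queue.put((job_id, func, args))
    results = [None] * len(jobs)

    def worker():
        # Each worker keeps pulling jobs until the queue is empty.
        while True:
            try:
                job_id, func, args = job_queue.get_nowait()
            except queue.Empty:
                return
            results[job_id] = func(*args)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


job_results = run_jobs([(pow, (2, 10)), (sum, ([1, 2, 3],)), (max, (4, 7))])
print(job_results)
```

Small, self-contained jobs suit this model because a worker that finishes early simply takes the next job, which helps reduce the idle time seen in the examples above.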

Cell SDK
  Complete Cell Broadband Engine development environment
  Documentation, libraries, samples, tools, IDE and a full system simulator for PC
  Compatible with the Fedora Core distribution
  You don't need a Cell processor to program for the IBM Cell

Documentation
  Programming Handbook
  SPE Runtime Management Library
  PPU & SPU language extensions
  Tutorials
Libraries
  SPE Runtime Management Library
  SPE libraries: FFT, gmath, matrix, surface, sync, vector
Samples
  Many SPU samples
  Samples on optimizing SPU code (Euler)

Tools
  IBM XL C/C++ compiler
  GNU-based C/C++ compiler
  GNU GDB
  GNU-based binutils (assembler, linker, others)
IDE
  Eclipse 3.1.1
  CDT (C/C++) plugin
  IBM Cell System Simulator plugin

System Simulator
  Full system simulator (emulates the behavior of a Cell processor)
  Provides functional-only and performance simulation modes
    Fast mode / simple mode / pipeline mode

Since 2000, Sony has been promoting Linux on the PS2
There are some distributions available for the PS3
  Fedora
  Yellow Dog
  Ubuntu
  Gentoo

RSX
  Based on the NVIDIA G70 architecture @ 550 MHz
  Fully programmable pipeline
    Supports shader model 3.0
    Independent pixel/vertex shader architecture
    Multi-way programmable parallel floating-point shader pipelines
  256 MB GDDR3 dedicated video memory @ 650 MHz
  High definition: 720p/1080p
  Sony implemented a hypervisor to restrict RSX access on Linux =(

PSGL
  High-level graphics library for the PlayStation 3
  Based on OpenGL ES 1.0
    Officially passed the ES 1.0 conformance test
    OpenGL ES 2.0 was not ready yet
  Adds a programmable pipeline to OpenGL ES 1.0

Why OpenGL ES?
  Embrace an industry standard
  Excellent specifications
  Well-defined behavior
  Industry collaboration
  Conformance tests for quality
  Expertise available

Supports many extensions
  OpenGL ES 1.1 extensions
  Programmable pipeline with Cg
  Primitive/rendering extensions
    Instancing, primitive restart, queries, conditional rendering
  Texture extensions
    Floating point, DXT, 3D, non-power-of-2, anisotropic, depth, vertex textures
  Synchronization extensions
    Synchronize with the PPU, SPU or another GPU
    Fences, events
  Others

Cg
  High-level shading language created by NVIDIA
    Very similar to Microsoft's HLSL
  RSX supports Cg 1.5
    Has a specific compiler for the PS3
  Great tools for developers
    FX Composer 2.0
    NVIDIA ShaderPerf

No file format covered all the next-gen features
  Multiple texture sets and values per vertex
  Polygons, triangles, tri strips and fans
  Curves (splines)
  Animation, skinning, blending, morphing
  Shaders, effects
  Physics
COLLADA was designed to solve this

COLLADA
  Intermediate digital asset exchange format
    Defines an open standard XML schema for exchanging digital assets
  COLLADA is an industry standard
    Originally created by Sony Computer Entertainment
    Adopted as an industry standard by The Khronos Group
  COLLADA 1.4.1 specification released in June 2006
    298 pages (English/Japanese)
  Supported by many DCC tools
    3D Studio Max, Maya, Softimage XSI, Blender

Binary files
  Must be specifically optimized for the target platform/API
  Difficult to debug
  Expensive to create
XML files
  Very easy to debug / human readable
  Can use schemas to validate the models
  Changes in the format are easy to handle
  No need to worry about optimizations
    Binary files can be generated targeting specific platforms

<library type="geometry">
  <geometry name="box">
    <mesh>
      <source id="box-pos">
        <array id="box-position-array" type="float" count="24">
          -0.5 0.5 0.5... (vertex data)
        </array>
        <technique profile="common">
          <accessor source="#box-position-array" count="8" stride="3">
            <param name="x" type="float" />
            <param name="y" type="float" />
            <param name="z" type="float" />
          </accessor>
        </technique>
      </source>
      <polygons>... </polygons>
    </mesh>
  </geometry>
</library>
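One practical benefit of the XML form is that a few lines of standard-library code are enough to read the geometry back. The sketch below parses a fragment with the same structure as the example above and uses the accessor's `count` and `stride` to rebuild the vertex positions. The triangle data and the `read_positions` helper are made up for illustration, and the code handles only this simplified layout, not the full COLLADA schema.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: same element layout as the slide, with made-up
# data for a single triangle (3 vertices x 3 floats = 9 values).
SAMPLE = """
<library type="geometry">
  <geometry name="triangle">
    <mesh>
      <source id="tri-pos">
        <array id="tri-position-array" type="float" count="9">
          0.0 0.0 0.0  1.0 0.0 0.0  0.0 1.0 0.0
        </array>
        <technique profile="common">
          <accessor source="#tri-position-array" count="3" stride="3">
            <param name="x" type="float" />
            <param name="y" type="float" />
            <param name="z" type="float" />
          </accessor>
        </technique>
      </source>
    </mesh>
  </geometry>
</library>
"""


def read_positions(xml_text):
    """Return {source id: [(x, y, z), ...]} for each <source> element."""
    root = ET.fromstring(xml_text.strip())
    positions = {}
    for source in root.iter("source"):
        accessor = source.find("technique/accessor")
        array_id = accessor.get("source").lstrip("#")
        array = next(a for a in source.iter("array") if a.get("id") == array_id)
        values = [float(v) for v in array.text.split()]
        stride = int(accessor.get("stride"))
        count = int(accessor.get("count"))
        positions[source.get("id")] = [
            tuple(values[i * stride:(i + 1) * stride]) for i in range(count)
        ]
    return positions


vertices = read_positions(SAMPLE)
print(vertices)
```

A build step could run a reader like this once and write out a binary file tuned for the target platform, matching the workflow described on the previous slide.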

COLLADA FX
  First cross-platform standard shader and effects definition written in XML
  Next-generation lighting, shading and texturing
  High-level effects and shaders
  Support for all shader models
COLLADA Physics
  Enables data interchange between Ageia (PhysX), Havok, Bullet, ODE and others
  Rigid bodies, dynamics, rag dolls, constraints, collision volumes

Unlike previous PlayStation SDKs, the PS3 SDK uses many open standards
  Cell SDK
  PSGL (PlayStation Graphics Library)
  Cg (C for graphics)
  COLLADA
Only available to professional, certified developers

PlayStation Edge
  New development tools for the PlayStation 3
  "First party tech teams will be transferring technology to the general PlayStation 3 development public" (Mark Cerny)
  SPU systems
    Animation engine (many SPU systems)
    Geometry system
      Skinning
      Triangle culling
      Blend shapes
    Data compression (zlib-based)
  GCM Replay
    Powerful RSX analysis, debugging and profiling tool
    Allows speculative performance analysis

Bruno P. Evangelista
bpevangelista@gmail.com
Home page: www.brunoevangelista.com
"For what is a man profited, if he shall gain the whole world, and lose his own soul? or what shall a man give in exchange for his soul?" Matthew 16:26