Coming to a Pixel Near You: Mobile 3D Graphics on the GoForce WMP. Chris Wynn NVIDIA Corporation

Similar documents
GoForce 3D: Coming to a Pixel Near You

Feeding the Beast: How to Satiate Your GoForce While Differentiating Your Game

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Dynamic Ambient Occlusion and Indirect Lighting. Michael Bunnell NVIDIA Corporation

developer.nvidia.com The Source for GPU Programming

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

Pipeline Integration with FX Composer. Chris Maughan NVIDIA Corporation

Cloth Simulation on GPU

Evolution of GPUs Chris Seitz

2D/3D Graphics Accelerator for Mobile Multimedia Applications. Ramchan Woo, Sohn, Seong-Jun Song, Young-Don

Graphics Processing Unit Architecture (GPU Arch)

Profiling and Debugging Games on Mobile Platforms

Programming Graphics Hardware

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS427 Multicore Architecture and Parallel Computing

Bringing it all together: The challenge in delivering a complete graphics system architecture. Chris Porthouse

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV

Overview. Technology Details. D/AVE NX Preliminary Product Brief

Dave Shreiner, ARM March 2009

Programming Graphics Hardware. GPU Applications. Randy Fernando, Cyril Zeller

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Mattan Erez. The University of Texas at Austin

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Spring 2009 Prof. Hyesoon Kim

Enabling immersive gaming experiences Intro to Ray Tracing

Module Introduction. Content 15 pages 2 questions. Learning Time 25 minutes

Spring 2011 Prof. Hyesoon Kim

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

Introduction to Shaders.

GeForce3 OpenGL Performance. John Spitzer

Practical Performance Analysis Koji Ashida NVIDIA Developer Technology Group

Optimisation. CS7GV3 Real-time Rendering

ARM Multimedia IP: working together to drive down system power and bandwidth

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

CS 4620 Program 3: Pipeline

The Application Stage. The Game Loop, Resource Management and Renderer Design

Module 13C: Using The 3D Graphics APIs OpenGL ES

Mobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair

Adding Advanced Shader Features and Handling Fragmentation

Working with Metal Overview

What was removed? (1) OpenGL ES vs. OpenGL

Today s Agenda. DirectX 9 Features Sim Dietrich, nvidia - Multisample antialising Jason Mitchell, ATI - Shader models and coding tips

OpenGL SUPERBIBLE. Fifth Edition. Comprehensive Tutorial and Reference. Richard S. Wright, Jr. Nicholas Haemel Graham Sellers Benjamin Lipchak

PROFESSIONAL. WebGL Programming DEVELOPING 3D GRAPHICS FOR THE WEB. Andreas Anyuru WILEY. John Wiley & Sons, Ltd.

Lecture 2. Shaders, GLSL and GPGPU

Pump Up Your Pipeline

ARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 3.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555C (ID102813)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Course Recap + 3D Graphics on Mobile GPUs

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

PowerVR Hardware. Architecture Overview for Developers

Save the Nanosecond! PC Graphics Performance for the next 3 years. Richard Huddy European Developer Relations Manager ATI Technologies, Inc.

Monday Morning. Graphics Hardware

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

GeForce4. John Montrym Henry Moreton

Saving the Planet Designing Low-Power, Low-Bandwidth GPUs

CS 498 VR. Lecture 19-4/9/18. go.illinois.edu/vrlect19

Windowing System on a 3D Pipeline. February 2005

CS GAME PROGRAMMING Question bank

POWERVR MBX & SGX OpenVG Support and Resources

Sign up for crits! Announcments

Performance Analysis and Culling Algorithms

POWERVR. 3D Application Development Recommendations

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

Mobile 3D Devices. -- They re not little PCs! Stephen Wilkinson Graphics Software Technical Lead Texas Instruments CSSD/OMAP

CS451Real-time Rendering Pipeline

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

Threading Hardware in G80

Graphics Hardware. Instructor Stephen J. Guy

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

Free Downloads OpenGL ES 3.0 Programming Guide

The Rasterization Pipeline

ARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 2.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555B (ID051413)

CS179: GPU Programming

Baback Elmieh, Software Lead James Ritts, Profiler Lead Qualcomm Incorporated Advanced Content Group

Programming Guide. Aaftab Munshi Dan Ginsburg Dave Shreiner. TT r^addison-wesley

Distributed Virtual Reality Computation

Understanding M3G 2.0 and its Effect on Producing Exceptional 3D Java-Based Graphics. Sean Ellis Consultant Graphics Engineer ARM, Maidenhead

Ciril Bohak. - INTRODUCTION TO WEBGL

Multimedia in Mobile Phones. Architectures and Trends Lund

Textures. Texture coordinates. Introduce one more component to geometry

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

PowerVR Series5. Architecture Guide for Developers

Streaming Massive Environments From Zero to 200MPH

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university

Voxel Cone Tracing and Sparse Voxel Octree for Real-time Global Illumination. Cyril Crassin NVIDIA Research

The Rasterization Pipeline

Real-Time Volumetric Smoke using D3D10. Sarah Tariq and Ignacio Llamas NVIDIA Developer Technology

Transcription:

Coming to a Pixel Near You: Mobile 3D Graphics on the GoForce WMP Chris Wynn NVIDIA Corporation

What is GoForce 3D? Licensable 3D Core for Mobile Devices Discrete Solutions: GoForce 3D 4500/4800 OpenGL ES / Direct3Dm compliant Low Power Integrated SRAM Up to VGA resolution Modern Feature Set

GoForce 3D 4800 Features Geometry Engine 16-bit color w/ 16-bit Z (40-bit color internal) Multi-texturing texturing w/ up to 4 textures Bilinear / Trilinear Filtering Flexible Texture Formats (DXT1/4-bit/8 bit/8-bit) bit) Fully Perspective Correct Sub-Pixel Accuracy Per-Pixel Pixel Fog, Alpha Blending, Alpha-Test

Traditional Architecture Setup/Raster Texture Addr Texture Fog AlphaTest DepthTest AlphaBlend Mem Write Deep pipeline Always have to go through all stages Pipelines always clocking Fast, but too much power consumption ~750mW per 100M pixel/sec (~200 pipe stages)

GoForce 3D: New Low-Power Architecture Flexible Fragment ALU Transform/Setup Raster Texture Fragment ALU Data Write (~50 pipe stages) Raster fragment generation and loop management Pipelines only trigger on activity Low Power < 50 mw per 100M pixel/sec During actual gameplay Very scalable architecture

Developing for GoForce 3D Java Applications Native Programming Model Carrier / Middleware Models Sherman (The Astonishing Tribe) Bubble (NVIDIA Tech Demo) World War II Airplane Demo (Futuremark Corporation)

Java Programming Model App JSR 184 JSR 239 OpenGL-ES EGL GoForce 3D Hardware

Native Programming Model App OpenGL-ES EGL D3Dm GoForce 3D Hardware

Middleware Programming Model App Middleware OpenGL-ES GoForce 3D Hardware EGL Audio Networking Other

GoForce 3D Development Kit

What s s Included? Handset Hardware Environment Freescale i.mx21 (ARM926EJ-S core) GoForce 3D 4500 / 4800 2.2 or 3.5 QVGA (240x320) LCD Joypad Input Device Audio, SD, USB Multi-OS capable (Linux now, WinCE soon) Software (drivers, tools, libraries, sample code) Documentation (2D/3D) 3 rd party support (Java VM w/ JSR184 available)

What to Expect? Host Development PC GoForce DevKit Null Modem Cable boot sequence Crossover Ethernet Cable console / remote debug GNU Tools embedded Visual C++ 4.0 Tools

From PC to Mobile: Things to Be Aware Of CPU Usage No floating point or integer division avoid these Simplify physics, collision, visibility Small caches avoid non-coherent algorithms Avoid CPU-assisted driver paths (lighting) System Memory Bandwidth Cell Phones and PDAs low system memory bandwidth Monitor/Conserve System BW

Alternative Lighting Strategies Fake Phong highlights using Multitexture Pre-Computed Vertex Lighting Stuntcar Extreme (Images Courtesy of Fathammer)

Conserving Bandwidth: Geometry Use only the amount of geometry required (LODs( LODs) Minimize amount of data per vertex Use packed ARGB (not floating point ARGB) Don t t specify Z or W if you don t t need it Use multi-texturing texturing instead of multi-pass rendering Example: Applying 2 textures single-pass w/ multi-texture: texture: pass 1 (xyzs 1 t 1 s 2 t 2 ) two rendering passes: pass 1 (xyzs 1 t 1 ) pass 2 (xyzs 2 t 2 ) single-pass up to 30% reduction in data

Vertex Cache Optimization What is a Vertex Cache? Place for the WMP to store Vertex Data If a vertex is already in the cache, WMP uses the cached copy How to utilize this? Use triangle strips or fans Or Use Indexed Triangle Lists Optimize for 32- or 16-entry LRU Vertex Cache

Conserving Bandwidth: Texture GoForce 3D 4800 has 1280K SRAM Fast but scarce resource On-chip texture memory is 1280K Frame Buffer Ex. QVGA (320x240) 16-bit color double buffered w/ Z: 450K Frame buffer 830K Texture memory available

Working within a Memory Budget Fit Working Set of Textures in Vidmem(!) Use compact texture formats EXT_texture_compression_dxt1 (4-bit/texel) Use smallest resolutions possible Avoid allocating full-screen buffers If budget exceeded, sort by texture

The Benefits of DXT1 Bubble: Each scene uses 8 textures 2 256x256 textures (mip( mip-mapped) mapped) 6 256x256 textures (non-mipmapped mipmapped) RGB565 (16-bits/texel) = 786432 + 349524 = 1.08Mb DXT1 (4-bits/texel) = 196608 + 87381 = 0.27Mb DXT1 is High-Quality and 25% the cost of RGB565

Leveraging Multi-Texture Four Full Res (256 x 256) 4-bit 4 = 128kb Full Res Base + four 1/4 res lightmaps = 40kb Store diffuse maps in lower resolution and use multitexture to save memory

OpenGL ES 1.0 Primer Roughly OpenGL 1.3 Removes Display List glbegin/glend glend Texgen Environment Maps Evaluators Adds Fixed Point type/entry points Byte type more universal

Direct3Dm Primer Largely based on Direct3D8 No Vertex or Pixel Shaders Float and Fixed-Point (16.16) types Texture Coordinate Generation Up to 4-way 4 multi-texture texture Cascading texture blending like OpenGL ES 1.0 Additional Caps Bits Profiles collection of caps bits required for a given profile

Interested in Developing for GoForce 3D 4800? Hardware DevKits available now! Register for NVIDIA Handheld Developer Program http://developer.nvidia.com Email handset-dev@nvidia.com dev@nvidia.com

GPU Gems 2 Programming Techniques for High-Performance Graphics and General-Purpose Computation 880 full-color pages, 330 figures, hard cover $59.99 Experts from universities and industry The topics covered in GPU Gems 2 are critical to the next generation of game engines. Gary McTaggart, Software Engineer at Valve, Creators of Half-Life and Counter-Strike GPU Gems 2 isn t meant to simply adorn your bookshelf it s required reading for anyone trying to keep pace with the rapid evolution of programmable graphics. If you re serious about graphics, this book will take you to the edge of what the GPU can do. Rémi Arnaud, Graphics Architect at Sony Computer Entertainment