GRAPHIC RENDERING APPLICATION PROFILING ON A SHARED MEMORY MPSOC ARCHITECTURE. Matthieu Texier, Raphaël David, Karim Ben Chehida

Matthieu Texier, Raphaël David, Karim Ben Chehida
CEA, LIST, Embedded Computing Lab, PC 94, F Gif-sur-Yvette Cedex
firstname.name@cea.fr

Olivier Sentieys
University of Rennes 1, IRISA/INRIA, 6, rue de Krampont, BP Lannion
sentieys@irisa.fr

ABSTRACT

This paper describes the implementation of a graphic rendering pipeline on an MPSoC architecture devoted to the dynamic management of static task graphs. It exhibits the highly non-stationary workloads of this application domain and provides initial feedback motivating the design of innovative embedded architectures that have to handle heterogeneous computation domains such as graphics and telecommunications. In particular, these experiments stress the need for data-dependent resource allocation strategies.

Index Terms: Multi-core, simulator, graphic rendering, load balancing.

1. INTRODUCTION

Embedded devices have to handle an increasing number of applications. Each has different computing requirements and is dedicated to a specific domain. To handle this variety efficiently, embedded systems usually rely on different hardware accelerators. Current embedded devices are based on systems on chip made of several cores [1], such as general purpose processors, multimedia processors (video and audio encoding and decoding) and other IPs that can be dedicated to imaging or telecoms, for example. More and more mobile systems also embed a Graphic Processing Unit (GPU). Well known in desktop computers, GPUs are increasingly used in embedded devices such as mobile phones and tablets for applications such as games and user interfaces. The latest generation of embedded GPUs is able to render millions of vertices per second and is becoming programmable. As in desktop computers, GPUs are becoming General Purpose Graphic Processing Units (GPGPUs), and embedded GPUs are expected to support new application classes such as multimedia.
In this paper we look at a different approach, which consists in extending MPSoCs to support graphic applications. We thus study the ability of standard execution models devoted to multi-domain multi-core architectures to sustain performance for graphic applications. The next section describes the graphic pipeline and Section 3 presents a brief state of the art of graphic architectures. Section 4 describes several implementations of the graphic pipeline on a multi-core architecture modeled in an approximately timed TLM (Transaction-Level Model) simulator, and the results are presented in Section 5. The last section gives the main conclusions of the performance analysis and outlines future work towards a multi-purpose embedded parallel architecture.

2. THE GRAPHIC RENDERING PIPELINE

The main job of a graphic processing unit is to render a three-dimensional scene to a two-dimensional screen [2]. The input data is a set of points (vertices) defined in a three-dimensional space. These points define triangles, and groups of triangles form shapes such as a sphere or a cube. A complex mix of triangles can draw all the shapes shown in games. This rendering process is done in three main stages: the geometry, the setup and the fragment stages. Each stage handles a different kind of data.

Figure 1: The graphic rendering pipeline stages.

As shown in Figure 1, the geometry stage modifies the incoming data (a set of vertices) according to the transformations defined by the user (translation, rotation, etc.). It also computes the coordinates related to the different points of view (origins) required for the next computations and calculates the impact of the lights on each incoming vertex (shading). Vertices are finally grouped by three to build triangles.

The setup stage verifies whether each vertex is visible from the camera (clipping). The triangles that show their back-faces to the camera are also removed (culling). It calculates the texture mapping on each triangle and cuts the triangles into tiles. This requires interpolating the vertex parameters (color, position, etc.) for each tile corner and checking whether the tile fully covers the triangle.

The fragment stage works on tiles. It computes the final color of each pixel in the tile according to the texture, the light and the material colors. It can blend the incoming pixel color with the current value in the framebuffer according to user-defined parameters (for example adding the colors or individual color components), which makes effects such as transparency possible. The depth of the incoming pixel can also be compared with that of the pixel in the framebuffer (depth test) so that the chosen pixel replaces the old one.

Each stage is configurable, and each transformation can be modified, enabled or disabled by the user through a set of APIs such as those defined in OpenGL ES [3]. Depending on the scene and the expected transformations, the user can also modify the scene through small programs called shaders, for instance by creating or destroying vertices or by producing effects such as mechanical simulation or elasticity. This leads to very large variations in how the computing requirements are balanced between the stages and over the complete rendering. In the next section, we describe how these stages are handled by current state-of-the-art graphic architectures.

3. EMBEDDED GRAPHIC RENDERING ARCHITECTURES

Due to the high computation demand of graphic applications [4], a wide diversity of embedded GPUs has been designed, from ASIPs (Application Specific Instruction Set Processors) to fully programmable multi-core architectures. For example, the Mobile Unified Shader [5] is an ASIP dedicated to graphic rendering. Its programmable core is a SIMD (Single Instruction Multiple Data) processor that computes vertices, shaders, lighting and texturing.
A set of configurable accelerators computes the other pipeline stages. The ARM Mali [6] is a GPU that scales from one to four programmable cores. There are in fact two kinds of cores, the vertex core and the fragment core, each specific to one kind of shader. The vertex shader typically supports geometry transformations and lighting through dedicated hardware SIMD instructions and dedicated lighting acceleration. The fragment shader focuses on pixel generation: it modifies the color components of each pixel and applies the texture colors, using specific instructions and dedicated hardware, for texture loading for example. The PowerVR [7] architecture is a multi-core architecture with multithreaded computing cores. The number of cores is variable, and every core is able to compute both vertices and fragments. This is called a unified architecture because the same processing elements run vertex and fragment shaders.

The latest generation of embedded GPUs has become a highly optimized multi-core and multi-threaded architecture. Like desktop GPUs, embedded GPUs are beginning to support general purpose computation, and the latest generation introduces support for GPGPU programming languages such as OpenCL [8], [9].

There is thus a wide diversity of embedded GPUs, which are becoming more and more programmable. This programmability can be used to accelerate other applications [10]. Their execution models are however very constrained, for example in terms of the position of data in the memory hierarchy, because of their initial purpose, which is graphic rendering. They also use dedicated computing elements that are optimized for graphic rendering. This makes these architectures poorly suited to accelerating applications whose data sets are organized differently (e.g. augmented reality, content understanding, etc.), especially when the data size can vary dynamically [11].
In the next section, the results of implementing the graphic pipeline on an MPSoC architecture initially designed for complex image and vision applications are analyzed, with the aim of extending its execution model to support graphic rendering efficiently.

4. RENDERING PIPELINE PARALLELIZATION

This section presents the software implementation of a rendering pipeline on a multi-core architecture. First the targeted architecture and its associated simulator are presented. Then a first implementation of the pipeline in a dataflow mode is shown, followed by a second implementation that uses more parallelism.

4.1 The SCMP architecture

Figure 2: SCMP architecture.

The SCMP architecture [12] depicted in Figure 2 is a compute-intensive resource that is seen by a host processor as a coprocessor. It uses a central scheduler that can be based on a dedicated hardware IP or on a programmable processor with hardware acceleration through coprocessors.

Figure 3: The Control Data Flow Graph of a labeling application.

The scheduler dynamically determines the list of eligible tasks to be executed, based on control and data dependencies. The Memory Configuration and Management Unit (MCMU) allocates the memory for the tasks and loads the instruction code. It also manages memory allocations and the exclusive sharing of a physically distributed and logically shared memory space. SCMP uses heterogeneous computing resources such as SPARC or MIPS processors. The scheduler uses a Control Data Flow Graph (CDFG), stored in a local memory, to describe the control and data dependencies between the tasks of an application (see Figure 3 for an example). In this example, task 0 initiates eight DMA tasks (tasks 1 to 8) that are followed by computing tasks (tasks 9 to 24). The work is distributed so as to obtain the maximum possible acceleration. Tasks 25, 26 and 28 are merge tasks. Finally, task 33 is a DMA task that sends back the results. The SCMP architecture targets applications with dynamic behavior, such as vision systems or content understanding.

4.2 Dataflow Implementation

Starting from a monolithic application code executed on a single processor, we profiled the code in order to partition the application. The profiling results and the pipeline organization led us to a set of five tasks communicating through FIFOs (First-In, First-Out) in a dataflow mode that complies with the SCMP programming model (Figure 4).

The first stage (A) can be seen as the user interface. It intercepts the OpenGL API calls that draw the scene. All the OpenGL commands are encoded and sent with the vertices (v0, v1) to the second stage of the pipeline through a FIFO.

Figure 4: Dataflow implementation of the rendering pipeline.

The second stage (G) is the geometry stage. It receives the commands from the application stage and executes those related to geometry; the other commands are forwarded to the next stage.
The geometry stage first computes the coordinates related to the different origins. Then it performs shading, culling and clipping. Finally the triangles (t0, t1) are sent to the rasterizer stage.

The rasterizer (or setup) stage (R) prepares the rendering of the triangles. Its main job is to divide the triangles into blocks of pixels (tiles). This stage also computes the texture mapping on the tiles. Then the tiles (b0, b1) are sent to the fragment stage.

The fragment stage (F) computes the final color of each pixel by interpolating the corner parameters and applying the texture. For tiles that cross the triangle borders, it checks whether each pixel is inside the triangle before rendering it. Finally it sends the computed tiles to the display stage.

The display stage (D) is in charge of the blend and depth test steps. It is also the only stage that has access to the framebuffer to write the pixel color and depth.

The rendering pipeline has thus been implemented as a five-stage pipeline. Each stage has its own parameters, which can be modified dynamically. The implementation has been done in the SESAM simulator [13] using FIFOs holding thirty-two data items between stages, one MIPS (Microprocessor without Interlocked Pipeline Stages) processor for each stage and one DMA (Direct Memory Access) to load textures.

4.3 The Parallel and Dataflow Implementation

In order to leverage all the processing resources in SCMP, the application has been parallelized by duplicating the entire pipeline or only parts of it, for example the geometry or the fragment stages. The application thus uses two types of parallelism: thread parallelism and data parallelism. To parallelize a stage, the data have to be fairly dispatched between the duplicated stages, and the data must be merged after the computation. A key point is that the data need to be kept in order. Moreover, a stage can generate a variable amount of output data for each processed input.
The input data are written in the FIFOs in a round-robin manner. Each data item is followed by a tag that specifies which FIFO holds the next item, as in a linked list. This allows the merge stage to reorder the generated data by following the tags and fetching the data from the appropriate FIFO.
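The dispatch and merge scheme described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the SCMP or SESAM implementation: the names `dispatch`, `run_stage` and `merge`, the `("data", ...)` / `("tag", ...)` encoding and the use of `collections.deque` as a FIFO are all hypothetical choices made for the example.

```python
from collections import deque


def dispatch(items, n_fifos):
    """Write items round-robin; each item is followed by a tag naming the next FIFO."""
    fifos = [deque() for _ in range(n_fifos)]
    for i, item in enumerate(items):
        target = i % n_fifos
        fifos[target].append(("data", item))
        fifos[target].append(("tag", (target + 1) % n_fifos))
    return fifos


def run_stage(fifo_in, work):
    """Model one duplicated stage: `work` may return zero, one or many outputs
    per input; tags are forwarded unchanged so the order information survives."""
    fifo_out = deque()
    for kind, payload in fifo_in:
        if kind == "data":
            for result in work(payload):
                fifo_out.append(("data", result))
        else:
            fifo_out.append((kind, payload))
    return fifo_out


def merge(fifos):
    """Read from one FIFO until a tag is found, then jump to the FIFO it names.
    A real merge task would block on an empty FIFO; this sketch simply stops."""
    current, merged = 0, []
    while fifos[current]:
        kind, payload = fifos[current].popleft()
        if kind == "tag":
            current = payload
        else:
            merged.append(payload)
    return merged


# Each input n produces n outputs, so the output count varies per input.
outputs = [run_stage(f, lambda n: [n] * n) for f in dispatch([2, 0, 1, 3], 2)]
print(merge(outputs))
```

Because the tags travel through the duplicated stages together with the data, the merge step recovers the original order even when an input produces no output at all.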

Figure 5: Implementation of the rendering pipeline with two parallel fragment stages.

Different pipelines with distinct parallelism levels have been implemented. Figure 5 shows a pipeline with two parallel fragment tasks. The rasterizer task dispatches the data to the fragment tasks as described before. The display task merges the data by following the tags inserted by the rasterizer task.

Figure 6: Implementation of the rendering pipeline with two parallel rasterizer and fragment stages.

Figure 6 shows a pipeline with two parallel rasterizer tasks and fragment tasks. The geometry task dispatches the triangles to the two rasterizers. The rasterizer tasks generate a variable amount of data that is sent to the fragment tasks. They also broadcast the tags in order to keep the information about the order of the data for the display task. The next section presents the results of the implementation of the rendering pipeline on the SESAM simulator.

Figure 7: Time to compute each data for the Geometry and the Display stages (for the first hundred data of the cube scene).

5. RESULTS

The SCMP architecture is modeled within the SESAM [13] framework to allow a fast exploration of the execution model with accurate results. The SESAM framework allows the exploration of asymmetric multi-core architectures at the TLM (Transaction-Level Modeling) level. The components can be parameterized to define the memory map, the number of processors, the cache parameters, the number of memories and their sizes, etc. SESAM produces a set of simulation statistics such as cache miss rates, memory allocation history, processor occupation rate, number of preemptions and network bandwidth. The implemented applications have been executed on the simulator in order to measure the impact of the input scenes on the pipeline complexity. In the first subsection these scenes are described.
The second subsection details the execution profiling results, which are analyzed in the next two subsections.

Figure 8: Time to compute each data for the rasterizer and fragment stages (for the first hundred data of the cube scene).

5.1 Scenes Descriptions Used as Scenarios

Three different scenes have been chosen as execution scenarios. The first one is a rotating cube made of twelve textured triangles. The cube rotates by 45 degrees about the x and y axes between frames. The second one is a sphere made of one hundred triangles that translates vertically from the bottom to the top. In the first and last frames, the sphere is completely out of the camera's field of view. Between these extremes, the sphere is partially or completely visible. The third scenario is a set of appearing entities, each made of five triangles, in increasing numbers and with random positions and sizes. This example also uses the blend stage in order to allow transparency between entities.

5.2 Profiling

To identify the slowest part of the application, the time to compute each scene was measured for each stage and for each data item. To do so, the processing of each data item has been annotated so that it gives precise information about the varying computing requirements according to the input data set (the scene).

Figure 9: Amount of data per stage for the different scenes.

Figure 7 and Figure 8 show the time needed to compute the first hundred data items of the cube scene. The geometry and display stages are less compute intensive than the other stages. They need from 5000 to cycles, while the rasterizer and fragment stages need from a few to more than cycles. Moreover, the amount of data between the stages varies depending on the scene. Figure 9 shows how the distribution of the data between the stages differs from one scene to another. For the cube scene, the rasterizer stage computes six triangles and the fragment stage ninety-five blocks. For the sphere, the rasterizer stage computes one hundred and fourteen triangles and the fragment stage three hundred and thirteen blocks. These profiling results show that the time required to compute each data item and the amount of data vary a lot between the stages, and that both also depend on the rendered scene. The communication time is not displayed here.

Figure 10: Cycles required to render frames for each stage on the cube scene.

Figure 11: Cycles required to render frames for each stage on the sphere scene.

5.3 Dataflow Mode

The scenes have also been benchmarked at the frame level in order to show the computation disparities between the different frames of the same scene. Figure 10 shows the number of cycles per stage required to render each frame, without taking the communication time into account. It shows that the fragment stage is the most compute intensive. This stage takes from four million cycles to more than nine million cycles per frame, whereas the other stages require up to one and a half million cycles. This is because the cube is composed of twelve triangles and each triangle covers a large number of pixels.
The rasterizer stage therefore cuts each triangle into up to fifteen tiles, which produces a large number of tiles to be computed by the fragment stage. Furthermore each triangle is textured, which also adds a lot of work to the fragment stage. The disparity in the number of cycles required to render each frame (see the difference between the rendering times of frames 7, 12 and 13) is due to the variation in the number of triangles visible from the camera's point of view as the cube rotates.

Figure 11 shows the same results for the sphere scene. The number of cycles required for the geometry stage varies slightly, from seven hundred thousand to one million cycles. This is due to the variation in the number of triangles visible to the user and to the computation needed to render the triangles that cross the borders of the camera's view. The load of the rasterizer and fragment stages also varies according to the number of triangles visible to the user. The numbers of cycles required for these two stages are close because the triangles are very small and there are approximately one or two blocks per triangle. This means that these two stages have almost the same amount of data to compute. The number of cycles for the display stage varies according to the amount of data sent by the fragment stages.

Figure 12: Cycles required to render frames for each stage on the entities scene.

Figure 13: Cycles required to render frames for each stage on the cube scene using six fragment tasks.

Figure 14: Cycles required to render frames for each stage on the sphere scene using two rasterizer and four fragment tasks.

Figure 12 shows the results for the entities scene. As the number of entities rendered for each frame increases, we can see that the number of cycles required increases linearly with the number of entities on the screen. The fragment stage complexity depends on the number of entities, and this stage is the most compute intensive. The display stage needs up to three million cycles for one frame because the entities scene needs to activate the blending stage to render transparency.

The simulation results show that the load of each stage depends on the rendered scene and that, even within the same scene, the load requirements can evolve from one frame to another. For example, up to 90 % of the time needed to render the cube scene is spent in the fragment stage. In the case of the sphere scene, the fragment stage uses almost 40 % of the global time. However, the variations are more predictable between the different frames of a given scene than between the different data items.

To balance the computation efficiently along the different stages of the pipeline, the slowest stages have to be accelerated. This can be done by adding more computing tasks for one stage. For the cube scene, the fragment stage is clearly the slowest. It therefore needs to be accelerated by using two or more tasks for the fragment stage in order to spread the work over different computing resources. The optimal number of tasks for each pipeline stage depends on the amount of work, and thus on the rendered scene. This is the subject of the next subsection.

Figure 15: Cycles required to render frames for each stage on the entities scene with four fragment stages.
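As a rough illustration of how the number of tasks per stage could follow from profiling data, the sketch below divides each stage's per-frame cycle count by a cycle budget and rounds up. It is a simplified model, not the method used in the paper: the function name `replicas_per_stage`, the `fixed` parameter and the cycle values are hypothetical, the values are only loosely inspired by the orders of magnitude quoted for the cube scene, and the model assumes ideal scaling with no communication cost.

```python
def replicas_per_stage(cycles_per_frame, budget, fixed=()):
    """Estimate how many parallel tasks each stage needs to fit within `budget`.

    cycles_per_frame: mapping from stage name to cycles per frame (profiling data)
    budget:           target number of cycles per frame for every task
    fixed:            stages that must stay as a single task
    """
    plan = {}
    for stage, cycles in cycles_per_frame.items():
        if stage in fixed:
            plan[stage] = 1
        else:
            # Ceiling division on integers, with at least one task per stage.
            plan[stage] = max(1, -(-cycles // budget))
    return plan


# Illustrative (assumed) per-frame cycle counts for a fragment-bound scene.
cube = {
    "geometry": 500_000,
    "rasterizer": 1_500_000,
    "fragment": 9_000_000,
    "display": 1_000_000,
}

# Use the slowest non-fragment stage as the budget; keep the display task single.
budget = max(c for stage, c in cube.items() if stage != "fragment")
print(replicas_per_stage(cube, budget, fixed=("display",)))
```

With these assumed numbers the fragment stage ends up with six tasks and every other stage with one. Since the cycle counts change with the scene and even with the frame, the resulting plan changes as well.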
5.4 Parallel and Dataflow Modes

The simulation results have shown that the optimal parallelization of the pipeline stages differs significantly between scenes. To balance the computation efficiently along the pipeline, different versions of the rendering pipeline have been implemented (e.g. the example shown in Figure 6) based on the profiling results. For the cube scene, the fragment stage is clearly the most compute intensive stage, so a new version with six fragment tasks has been implemented in the same manner as the one shown in Figure 5. The sphere scene is more compute intensive for the rasterizer and the fragment stages. These stages therefore need to be duplicated as in Figure 6. The sphere scene needs two rasterizer tasks and four fragment tasks. The entities scene is even more compute intensive for the fragment stage, and it needs four fragment stages.

The results for the cube scene with six fragment stages are shown in Figure 13. The work is evenly spread across the fragment tasks and across the other pipeline stages. The fragment tasks take up to two million cycles, whereas the fragment stage took up to ten million cycles in the dataflow implementation. The fragment and rasterizer stages of the pipeline are now balanced. The display task takes from two hundred thousand cycles to one million, and the geometry task takes one hundred thousand cycles and does not vary.

Figure 14 shows the results for the sphere scene using two rasterizer and four fragment stages. As before, the computation is spread across the rasterizer and fragment tasks. The second rasterizer task is more compute intensive than the first one, since its load depends on the incoming data from the geometry task. The geometry and display stages remain unchanged.

The results for the entities scene are shown in Figure 15. The tasks of the parallelized stages are now balanced with the geometry and display stages. The display stage is in charge of merging data and cannot be duplicated.

To balance the pipeline efficiently for the cube scene, six fragment stages are needed. The sphere scene needs two rasterizer stages and four fragment stages. Finally the entities scene needs two rasterizer stages and four fragment stages. Mapping the rendering application with a single static parallelism that supports every scene efficiently is thus clearly infeasible, and static data parallelism is not the solution. Moreover, this mapping is also unpredictable, because it is data dependent.

The balanced versions of the pipeline improve the overall performance. Taking the communication times into account, the parallelized cube version renders at 148fps (frames per second) compared to 91fps for the dataflow-only version. The parallelized sphere version renders at 142fps compared to 71fps for the dataflow version. Finally the parallelized version of the entities scene renders at only 75fps, compared to 69fps for the dataflow version.
This is due to FIFO contention problems caused by the data dispatch methodology.

6. CONCLUSIONS AND PERSPECTIVES

This paper presents profiling figures for a graphic rendering application implemented on a multi-core architecture. It provides insights into the highly non-stationary workloads of the different pipeline stages through experiments on three distinct rendering scenes. It points out the need for a dynamic adaptation of the pipeline stage parallelism to improve performance. The idea is to link the resource allocation process to the amount of data to be computed in each stage, by monitoring buffer usage for example. At the scene level, the load requirements can be predicted from the preceding frames in order to balance the pipeline for the next frame.

The more global objective of this work is to define an architecture able to support a variety of mobile applications, each having different performance requirements that can evolve dynamically. These mobile applications may require specific accelerators, so the architecture needs to support the mapping of tasks and the dynamic adjustment of parallelism on different computing resources under low power constraints. The SCMP architecture uses a static graph which describes the application task dependencies. To support dynamic load balancing, this graph has to be modified to describe the possible adaptation stages. Future work involves the definition of an efficient load balancing algorithm that takes into account the cost of creating and removing tasks and the data transfers in the context of heterogeneous computing resources.

7. REFERENCES

[1] C.H. Van Berkel, Multi-Core for Mobile Phones, Proceedings of the IEEE/ACM Conference on Design, Automation and Test in Europe DATE 09, pp ,
[2] Tomas Akenine-Möller, Eric Haines, and Naty Hoffman, Real-Time Rendering, Third Edition, A K Peters/CRC Press, July
[3] OpenGL ES low level API.
[4] Bren C. Mochocki, Kanishka Lahiri, Srihari Cadambi, X. Sharon Hu, Signature-Based Workload Estimation for Mobile 3D Graphics, Proceedings of the 43rd IEEE/ACM Design Automation Conference DAC, pp ,
[5] Jeong-Ho Woo, Ju-Ho Sohn, Hyejung Kim, Hoi-Jun Yoo, A 195 mW, 9.1 Mvertices/s Fully Programmable 3-D Graphics Processor for Low-Power Mobile Devices, IEEE Journal of Solid State Circuits, vol. 43, no. 11, pp ,
[6] ARM Mali
[7] PowerVR SGX Series 5
[8] Imagination submits POWERVR SGX cores for OpenCL conformance
[9] ARM Mali-T604
[10] W. Plishker, G. Zaki, S. S. Bhattacharyya, C. Clancy, and J. Kuykendall, Applying graphics processor acceleration in a software defined radio prototyping environment, Proceedings of the International Symposium on Rapid System Prototyping, pp. 67-72, Karlsruhe, Germany, May
[11] Richard Vuduc, Aparna Chandramowlishwaran, Jee Choi, Murat Guney, Aashay Shringarpure, On the limits of GPU acceleration, Proceedings of the 2nd USENIX conference on Hot topics in parallelism, pp 13, Berkeley, CA, USA,
[12] N. Ventroux, R. David, SCMP Architecture: An Asymmetric Multiprocessor System-on-Chip for Dynamic Applications, Proceedings of the ACM International Forum on Next Generation Multicore/Manycore Technologies (IFMT), pp. 6, Saint-Malo, France,
[13] N. Ventroux, A. Guerre, T. Sassolas, L. Moutaoukil, G. Blanc, C. Bechara and R. David, SESAM: an MPSoC Simulation Environment for Dynamic Application Processing, Proceedings of the IEEE 10th International Conference on Embedded Software and Systems (ICESS), pp , Bradford, UK, 2010.

LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)

LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014) A practitioner s view of challenges faced with power and performance on mobile GPU Prashant Sharma Samsung R&D Institute UK LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014) SERI

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics

More information

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1
