A Low Cost Tile-based 3D Graphics Full Pipeline with Real-time Performance Monitoring Support for OpenGL ES in Consumer Electronics
Ruei-Ting Gu, Tse-Chen Yeh, Wei-Sheng Hunag, Ting-Yun Huang, Chung-Hua Tsai, Chung-Nan Lee, Ming-Chao Chiang, Shen-Fu Hsiao, Yun-Nan Chang, Ing-Jer Huang

Abstract

This paper presents a 3D graphics engine designed specifically to minimize hardware cost while providing sufficient computing capability for consumer electronics with small to medium screen sizes (up to 800x600), such as digital television. The engine consists of a fixed full 3D graphics pipeline for both geometry and rendering operations. It provides a standard AHB interface, which makes it easy to integrate into an AMBA-based SoC. The development of the 3D engine went through a rigorous design process: system modeling (using SystemC), RTL implementation, hardware/software co-simulation, FPGA verification, and test chip fabrication. The engine delivers a maximum of 8.34M vertices/s and 278M pixels/s at 139 MHz in 0.18 μm silicon technology with 987K gates, which is sufficient for most digital television applications. In addition, a complete OpenGL ES 1.1 API, windowing system, Linux operating system, device driver, and 3D performance monitoring tool have been developed for the engine. The performance monitoring tool provides run-time performance information, including frame rate, triangle rate, pixel rate, the list of OpenGL functions involved, function counts, and memory utilization. Moreover, a built-in real-time AHB bus tracer monitors the bus activities of the 3D engine and other components on the system bus. The bus tracer captures on-chip bus signals at either cycle-accurate or transaction level and applies real-time compression at both levels.
With the performance monitoring tool and the bus tracer, a 3D application developer can easily analyze the communication among components and fine-tune the 3D application to optimize overall SoC performance and satisfy the performance/cost constraints of consumer electronics. Both the hardware and the software have been carefully verified and demonstrated on an FPGA using the ARM Versatile SoC development board.

Index Terms: 3D graphics pipeline, OpenGL ES, geometry engine, rendering engine, 3D graphics performance monitoring.

I. INTRODUCTION

With improvements in silicon process technology, a single chip now has the capacity for computation-intensive jobs such as 3D graphics calculations. Hardware acceleration for 3D graphics is no longer confined to desktop PCs and workstations but extends to embedded systems. 3D graphics is a mature technology in the PC world, but for consumer electronics, building a 3D graphics chip with low cost, low power, and sufficient performance has become a new challenge.

The well-known graphics IP provider Imagination Technologies™ [1] develops the PowerVR graphics IP cores named MBX [2][3]. The integrated PowerVR solutions are used in a wide range of applications, from the most performance-hungry 3D-enabled DVB set-top boxes to the most power-sensitive portable media units. Bitboys™ [4] also provides 3D and vector graphics acceleration, called Acceleon™, for mobile phones and embedded devices. ATI's [5] Imageon™ products accelerate 2D and 3D graphics, gaming, and video applications for many mobile phones. Another IP provider, Falanx Microsystems™ [6], provides the Mali™ graphics solution, a family of 3D graphics accelerator IP cores that support OpenGL ES [7] v1.1 and H.264 functionality and target different performance, power, and die size levels.
The GSHARK-TAKUMI [8] product family offers a lineup of graphics engine IP cores for embedded systems and personal information devices, such as mobile phones, that perform real-time display of 3D graphics.

In the academic field, there are also several outstanding implementations of 3D graphics for consumer electronics [9-15]. The work in [9, 13, 14] presented very high performance 3D graphics SoCs with programmability that can fully support the OpenGL ES standard API. Another research team [12] implemented a low-power 3D graphics engine using a DRAM fabrication process that provides very low power consumption.

Although several 3D graphics hardware accelerators have been proposed for consumer electronics, a careful balance between the application features and the
hardware cost is still a challenging problem. This paper presents a 3D graphics engine designed specifically to minimize hardware cost while providing sufficient computing capability for consumer electronics with small to medium screen sizes (up to 800x600), such as digital television. It is also hard to measure 3D hardware performance at run time. An AMBA-based SoC platform was selected to integrate the presented 3D graphics engine because AMBA is the most popular system bus for SoC design. The engine provides a standard AHB interface, which makes it easy to integrate into an AMBA-based SoC. In addition, a built-in real-time AHB bus tracer monitors the bus activities of the 3D engine and other components on the system bus, so that the 3D application developer can fine-tune the application to optimize overall SoC performance. Furthermore, the development of the 3D engine went through a rigorous design process, from SystemC modeling to test chip fabrication. A complete OpenGL ES 1.1 library, windowing system, Linux operating system, device drivers, and a 3D performance monitoring tool have also been developed for the engine. The 3D engine delivers a maximum of 8.34M vertices/s and 278M pixels/s at 139 MHz, targeting digital television applications, and the area cost is about 987K logic gates, including the geometry engine (295K), rendering engine (395K), and bus tracer (71K).

II. SYSTEM MODELING AND PERFORMANCE ANALYSIS

Before hardware implementation, we built the hardware models in SystemC and integrated them into the CoWare™ Platform Architect [16], a SystemC-based simulation and analysis tool with a graphical user interface.

Figure 1. System model overview on Platform Architect.

A performance analysis is carried out after the system model is built.
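Before looking at the modeling results, it can help to turn the peak figures quoted above into per-cycle budgets. The following is a minimal sketch of that arithmetic; the function names and the 30 fps target are illustrative assumptions and do not come from the paper.

```python
# Peak figures quoted in the paper.
CLOCK_HZ = 139e6          # engine clock
VERTICES_PER_S = 8.34e6   # peak vertex rate
PIXELS_PER_S = 278e6      # peak pixel rate
MAX_WIDTH, MAX_HEIGHT = 800, 600  # largest targeted screen size


def cycles_per_vertex(clock_hz=CLOCK_HZ, vertices_per_s=VERTICES_PER_S):
    """Average clock cycles available per vertex at the peak vertex rate."""
    return clock_hz / vertices_per_s


def pixels_per_cycle(clock_hz=CLOCK_HZ, pixels_per_s=PIXELS_PER_S):
    """Average pixels produced per clock cycle at the peak pixel rate."""
    return pixels_per_s / clock_hz


def fill_budget_per_frame(fps, width=MAX_WIDTH, height=MAX_HEIGHT,
                          pixels_per_s=PIXELS_PER_S):
    """How many times the whole screen could be filled per frame at peak rate.

    fps is an assumed target frame rate, not a figure from the paper.
    """
    return pixels_per_s / (width * height * fps)


if __name__ == "__main__":
    print(f"cycles per vertex: {cycles_per_vertex():.2f}")
    print(f"pixels per cycle:  {pixels_per_cycle():.2f}")
    print(f"screen fills per frame at an assumed 30 fps: "
          f"{fill_budget_per_frame(30):.1f}")
```

These are averages implied by the quoted peak rates (about 16.7 cycles per vertex and 2 pixels per cycle), not measured values; real workloads will see lower rates because of bus contention and software overhead.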
The performance analysis shows where the performance bottleneck is and also verifies the hardware functions and timing behavior. Figure 2 shows an example result from system modeling. The benchmark is 100 cubes comprising 1200 vertices, rendered for 4 frames with hardware acceleration. The figure shows the total execution time of each component, from which the bottleneck can be identified. In this example, the Geometry Module (dark green) and Rendering Module (purple) are implemented in hardware, while the Tile Divider is implemented in software. The ratio of accelerator execution time to total execution time is very small. Green indicates CPU execution time, that is, software performance. The result shows that most of the execution time is spent in software, which performs memory initialization (clearing the frame buffer) and tile dividing. We therefore decided to add a hardware tile divider. By adding a hardware tile divider model and comparing it with the software implementation, we can determine the performance effect and cost of the hardware tile divider, and likewise for other hardware components.

Figure 2. Performance analysis on Platform Architect. Chart annotations: total time 110,508,000 cycles; platform: CPU + GM + RM in burst mode; benchmark: 12 triangles (cube) x 100, four frames; memory module: SRAM; priority: GM > RM > CPU; component labels: GM, RM, Initial Memory, Tile Divider; cycle counts appearing in the chart: 501,486; 68,491,400; 36,591,000; 1,430,370.

When building the system model and integrating it into Platform Architect, the interface of each component is made exactly the same as that of the real hardware. This matters because Platform Architect is used not only for simulation and system modeling but also for hardware/software co-simulation for functional verification. We can reuse the system platform and achieve very fast functional verification simply by replacing the models with the real RTL designs.

III. HARDWARE/SOFTWARE IMPLEMENTATION

Figure 3 shows the block diagram of the proposed 3D graphics engine. This engine integrates geometry and rendering
engines for 3D computation, a bus tracer that provides on-chip bus activity traces, and a standard AHB interface that can connect to AMBA. A distinctive feature of this design is that the chip provides a complete AHB bus, which makes it an SoC platform in its own right: users can add hardware to it, and it also provides an interface for connecting to another platform as a component.

Figure 3. 3D graphics engine architecture. Block labels: 3D Graphics Engine Test Chip, Geometry Engine, Rendering Engine, AMBA AHB, Bus Tracer, Arbiter, Decoder, Master Interface (External Master), Slave Interface (External Slave).

A. Hardware Development

Following OpenGL ES, we divide the 3D pipeline into two operations. The first is the geometry transform, which transforms 3D vertices, normals, and texture coordinates to produce primitives that are passed to the second operation, rendering. Rendering includes two sub-functions called rasterization and pre-fragment. Rasterization converts the primitives into a two-dimensional image and assigns each pixel a color and a depth. The pre-fragment stage takes the pixels produced by rasterization, with their window coordinates, applies a series of tests, and updates the framebuffer, which is then displayed on screen. The hardware therefore has two main modules, one for the geometry transform and one for rendering.

The Geometry Module (GM) includes culling, clipping, lighting, model view transformation, view transformation, projection transformation, and vertex normal transformation. Because the calculation process is so complex, we divide the geometry module into 3 pipeline stages and each stage runs 16. To reduce the area cost, we reuse hardware as much as possible.

The Rendering Module (RM) adopts a tile-based approach in order to reduce the memory requirement. The tile-based RM consists of two submodules.
The rasterizers perform scan conversion to fill the triangles with colors and then pass the image to the Pre-fragment Operation Units, which draw the image into the frame buffer for display on the monitor. There can be four rasterizers performing scan conversion in parallel to gain more performance. If the area cost is critical, the number of rasterizers can be reduced with only a small performance drop. In fact, our final chip uses only two rasterizers.

Figure 4. Rendering engine block diagram.

An AMBA Multi-Resolution Trace Analyzer is built into the chip; it provides different observation resolutions for tracing bus activities. As shown in Figure 5, there are two abstraction dimensions, timing and signal. For timing, bus activities can be sampled at cycle level or transaction level. Cycle level means that signals are traced cycle by cycle. In contrast, transaction level samples the signals only when the bus has a transaction. The bus signals are likewise abstracted into three levels: all-signals level, bus-state level, and master-operation level. At the all-signals level, the trace analyzer records all bus signals at each cycle or transaction. At the bus-state level, it records the bus behavior rather than every detailed signal. Finally, at the master-operation level, it records signals only while a master accesses data. Combining sample timing and signal abstraction, the trace analyzer supports five trace modes. Designers can change the trace mode dynamically at any time during program execution. For example, mode 1 records every bus signal cycle by cycle, whereas mode 4 samples only a subset of bus signals, such as address and data, at each transaction. The bus tracer greatly reduces the trace size, by 78% to 98% depending on the selected mode.

Figure 5. Abstraction definition of the bus tracer.
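The two abstraction dimensions of the bus tracer (sample timing and signal selection) can be illustrated with a small software model. This is a sketch under stated assumptions, not the chip's trace logic: the `BusSample` record, the `trace` function, and the field names are hypothetical, the AHB address and data phases are collapsed into one sample for simplicity, the paper's five mode numbers are not mapped, and real-time compression is not modeled. Only the HTRANS encodings (IDLE, BUSY, NONSEQ, SEQ) follow the AMBA AHB convention.

```python
from dataclasses import dataclass, asdict

# AMBA AHB HTRANS encodings.
IDLE, BUSY, NONSEQ, SEQ = 0, 1, 2, 3


@dataclass(frozen=True)
class BusSample:
    """One simplified bus snapshot per clock cycle (hypothetical record)."""
    cycle: int
    htrans: int
    hready: bool
    hmaster: int
    haddr: int
    hwrite: bool
    hdata: int


def is_transfer(sample):
    """A cycle carries a transfer when the bus is ready and HTRANS is NONSEQ or SEQ."""
    return sample.hready and sample.htrans in (NONSEQ, SEQ)


def trace(samples, timing="cycle", fields=None):
    """Record samples at the chosen timing level and signal subset.

    timing: "cycle" keeps every cycle, "transaction" keeps only transfer cycles.
    fields: names of signals to keep (the cycle number is always kept),
            or None to keep all signals.
    """
    if timing not in ("cycle", "transaction"):
        raise ValueError("timing must be 'cycle' or 'transaction'")
    records = []
    for s in samples:
        if timing == "transaction" and not is_transfer(s):
            continue
        rec = asdict(s)
        if fields is not None:
            rec = {name: rec[name] for name in ("cycle",) + tuple(fields)}
        records.append(rec)
    return records


def trace_size(records):
    """Crude size metric: total number of recorded signal values."""
    return sum(len(r) for r in records)


def demo_samples():
    """Ten cycles of made-up bus activity with three transfers."""
    kinds = {2: NONSEQ, 3: SEQ, 5: BUSY, 7: NONSEQ}
    return [BusSample(cycle=c, htrans=kinds.get(c, IDLE), hready=True,
                      hmaster=1, haddr=0x1000 + 4 * c,
                      hwrite=(c % 2 == 0), hdata=c)
            for c in range(10)]


if __name__ == "__main__":
    samples = demo_samples()
    full = trace(samples)  # every signal, every cycle
    reduced = trace(samples, "transaction", ("haddr", "hdata"))
    print(trace_size(full), trace_size(reduced))
```

In this toy example the reduced trace keeps only address and data on the three transfer cycles, which shows why transaction-level sampling of a signal subset shrinks the trace even before any compression is applied.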
B. Software Development

This project aims to provide a total 3D solution for consumer electronics, so the software cannot be neglected. We need to provide the hardware device driver, the OpenGL ES API, and a real-time performance monitoring tool. The first step in software implementation is porting the Linux kernel to the chosen platform, the ARM RealView Versatile [17] family of boards. The Linux kernel has been ported to the development board. Next, a pure-software implementation of the OpenGL ES API was developed as the baseline for performance analysis and hardware verification. There are 70 OpenGL ES functions to implement; once some of the main functions were finished, we began working with the hardware team to implement the device driver in parallel. Once the device driver could successfully drive the 3D hardware, we started to develop the GPTT (Graphics Performance Tuning Tool). This tool provides performance information to a remote PC over the network, including frame rate, triangles/vertices per second, pixels per second, memory usage, and CPU time for each function. This helps programmers find the performance bottlenecks of their 3D applications and adjust them to meet the required performance. The device driver also provides an interface through which the GPTT communicates with the hardware to obtain hardware performance information, making the GPTT a real-time hardware performance monitoring tool that helps find hardware bottlenecks and optimize for higher performance.

Figure 6 shows a demonstration of the GPTT. A 3D application runs on the target platform (ARM Versatile) using the provided OpenGL ES API, and the GPTT receives the performance data from the target and displays the results on the remote PC.

Figure 6. Screenshot of the GPTT demonstration. The target platform runs 3D applications. The GPTT can display FPS, memory usage, and other
performance information to help programmers refine their 3D software.

C. Hardware/Software Integration

Integrating such a complex hardware and software system is not easy. The first step is to define the hardware control flow and the hardware/software interface before implementation. We then build an SoC platform comprising a CPU, AMBA, and SRAM to integrate the 3D hardware components and make them work with the OpenGL ES API. The hardware control flow has to operate in coordination with the OpenGL ES API, so control over hardware operation is held by the API through the device driver. The 3D hardware is mapped into the memory space, and the API can freely set the hardware control registers to command hardware operations.

After defining the control flow and interface, we integrated the 3D chip into the Easy platform [18], which is provided by ARM and supports the standard AMBA AHB bus. Because the 3D chip already has its own on-chip bus, an AHB-to-AHB bridge is needed to connect it to the Easy platform. Figure 7 shows the 3D chip integration architecture. For chip verification and tape-out, the 3D chip includes both the core and the I/O pads, which brings the simulation close to the real case. The Easy platform provides a standard AHB interface and protocol to our chip. With this platform we can run a real 3D program and verify our hardware and software.

Figure 7. 3D chip integration platform. Block labels: ARM Easy Platform, Memory (3D Test Program), Arbiter, Decoder, MI Bridge, SI Bridge, Master, Slave, 3D Chip.

IV. VERIFICATION AND CHIP FABRICATION

Verifying this complex system is another difficult job. It is easy to verify small components whose functions are simple, but verification becomes very tough after integration because of the complex operations and functions. Even when the submodules have been verified thoroughly, no one can guarantee that they will work when put together.
A good test plan is very important. It not only provides good verification quality but also reduces verification time and hence time to market. We propose a program-level cross-verification method to increase verification quality. Three platforms were used during our design: the SystemC model, the Easy platform, and the Versatile development board. These platforms can be reused and combined to verify the chip. First, we keep the 3D chip model, RTL design, gate-level design, and FPGA design synchronized, using exactly the same interface as the real chip, so that designs can be swapped between platforms. Second, we synchronize the three platforms so that they can use the same test programs, which saves a great deal of time compared with writing different programs for different platforms. Another benefit of synchronized platforms is that they can generate test patterns for one another. For example, the SystemC model can easily dump signals as cycle-level test patterns for the chip. The Easy platform can dump the PAD signals as patterns for chip testing. The Versatile board with FPGA can quickly dump the frame buffer as the golden pattern for the SystemC model and the Easy platform. Cross verification thus both increases verification quality and reduces verification time.

Figure 8. 3D chip demonstration on the ARM Versatile FPGA platform. Labels: 3D application example (9 rotating objects based on the OpenGL ES API); 3D Graphics Engine; ARM926EJ-S running the Linux kernel, application, OpenGL ES API, and windowing system.

Figure 8 shows the demonstration result on the ARM Versatile. The demonstration draws 9 colored cubes on the screen and then applies rotation and lighting. All of these operations worked correctly.

The last verification step is a whole-system demonstration, which includes the OS kernel, real 3D applications, the OpenGL ES API, the device driver, and the 3D engine.
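Returning to the cross-verification scheme described above, the golden-pattern check can be sketched as a pixel-by-pixel comparison between a frame buffer dumped from one platform and the same frame produced on another. This is a minimal illustration only: the function name, the flat list-of-integers frame buffer layout, and the mismatch report format are assumptions, not the authors' tooling.

```python
def compare_framebuffers(golden, candidate, width, max_report=10):
    """Compare two frame buffer dumps and return up to max_report mismatches.

    golden, candidate: flat sequences of packed pixel values, row-major
                       (an assumed layout).
    width:             number of pixels per row.
    Returns a list of (x, y, golden_value, candidate_value) tuples.
    """
    if len(golden) != len(candidate):
        raise ValueError("frame buffer sizes differ")
    if width <= 0 or len(golden) % width != 0:
        raise ValueError("width does not match frame buffer size")
    mismatches = []
    for index, (g, c) in enumerate(zip(golden, candidate)):
        if g != c:
            mismatches.append((index % width, index // width, g, c))
            if len(mismatches) >= max_report:
                break
    return mismatches


if __name__ == "__main__":
    # A 4x4 golden frame and a candidate with one corrupted pixel at (1, 1).
    golden = [0x000000] * 16
    candidate = list(golden)
    candidate[5] = 0x0000FF
    for x, y, g, c in compare_framebuffers(golden, candidate, width=4):
        print(f"mismatch at ({x}, {y}): golden={g:#08x} candidate={c:#08x}")
```

Reporting pixel coordinates rather than a simple pass/fail makes it easier to relate a mismatch back to a specific primitive or tile when the same test program is rerun on another platform.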
As shown in Figure 8, the whole-system demonstration is a series of complex scenes with many complex objects. The screen resolution is , and over 330 thousand vertices in total are drawn. The drawing effects include fog, lighting, and texture.

Figure 8. Whole system demonstration.

The I/O ports have a particular design. The chip supports an AMBA AHB I/O port for connecting to the ARM Versatile, which lets us test the system by simply replacing the FPGA board. We can thus reuse the software and do not need to build another test board, which saves considerable testing effort. This 3D graphics SoC also supports a single-chip mode. The implemented 3D engine integrates about 3M transistors and occupies mm² die area using the TSMC 0.18 μm 1P6M CMOS process and consumes 400 mW. The core voltage is 1.8 V and the I/O cell voltage is 3.3 V. The 3D graphics IP runs at 139 MHz.

V. CONCLUSIONS

In this paper, we introduced a 3D graphics SoC for consumer electronics. We propose a total solution for 3D graphics, from OS porting, device driver, and OpenGL ES API to the 3D chip. A complete SoC design flow is presented, comprising specification definition, system modeling, performance analysis, hardware and software implementation and integration, chip verification, and tape-out. A cross-verification method using different platforms is also presented; it increases verification quality and reduces verification time by reusing test programs and generating test patterns across platforms. The 3D graphics engine provides three features that make it more convenient to use and integrate. First, a complete system bus, AMBA AHB, is included in the chip, together with an additional AHB interface for the best integration ability. Second, the 3D graphics chip is designed to fully support OpenGL ES, which makes it easy to develop advanced 3D graphics applications or to port games from other platforms. Third, a real-time bus
tracer and a performance monitor are also embedded in the chip. The performance monitor works with the GPTT to provide real-time 3D hardware performance information, and the bus tracer can record complete bus activity information for advanced system debugging and monitoring. The whole system has been verified on the FPGA development board. The maximum performance of the 3D chip is 8.34M vertices/s and 278M pixels/s at 139 MHz. The chip area is 987K gates. The chip is now in fabrication and will be tested in July when it returns.

REFERENCES

[1] Imagination Technologies, Ltd.
[2] ARM MBX HR-S 3D Graphics Core - Technical Overview, ARM Ltd. and Imagination Technologies Ltd.,
[3] Ashley Stevens, ARM 3D Graphics Solutions, ARM Ltd. and Imagination Technologies Ltd.
[4] Bitboys Ltd.,
[5] ATI Technologies Inc.,
[6] Falanx MALI series specification [Online]. Available:
[7] David Blythe and Aaftab Munshi, OpenGL ES Common/Common-Lite Profile Specification, Khronos Group, Inc., [Online]. Available:
[8] TAKUMI Corporation, [Online]. Available:
[9] Yong-Ha Park et al., A 7.1-GB/s low-power rendering engine in 2-D array-embedded memory logic CMOS for portable multimedia system, IEEE J. Solid-State Circuits, Vol. 36, pp , June
[10] R. Woo et al., A 210mW Graphics LSI Implementing Full 3D Pipeline with 264Mtexels/s Texturing for Mobile Multimedia Applications, ISSCC Dig. Tech. Papers, pp , Feb
[11] M. Imai et al., A 109.5mW 1.2V 600M texels/s 3D Graphics Engine, ISSCC Dig. Tech. Papers, pp , Feb
[12] R. Woo et al., A low-power 3D rendering engine with two texture units and 29-Mb embedded DRAM for 3G multimedia terminals, IEEE J. Solid-State Circuits, Vol. 39, Issue 7, pp , July 2004
[13] J. H. Sohn et al., A 50 Mvertices/s graphics processor with fixed-point programmable vertex shader for mobile applications, IEEE ISSCC Dig. Tech. Papers, pp , Feb
[14] D. Kim et al., An SoC with 1.3 Gtexels/sec 3-D graphics full pipeline for consumer applications, IEEE ISSCC Dig. Tech. Papers, pp , Feb
[15] Donghyun Kim et al., An SoC with 1.3 gtexels/s 3-D graphics full pipeline for consumer applications, IEEE J. Solid-State Circuits, Vol. 41, pp , Jan
[16] CoWare Inc.,
[17] Platform Baseboard for ARM926EJ-S User Guide, ARM Ltd.
[18] AMBA Design Kit,
SystemC-Based Design Space Exploration of a 3D Graphics Acceleration SoC for Consumer Electronics
SystemC-Based Design Space Exploration of a 3D Graphics Acceleration SoC for Consumer Electronics Tse-Chen Yeh, Tsung-Yu Ho, Hung-Yu Chen, and Ing-Jer Huang Department of Computer Science and Engineering,
More information2D/3D Graphics Accelerator for Mobile Multimedia Applications. Ramchan Woo, Sohn, Seong-Jun Song, Young-Don
RAMP-IV: A Low-Power and High-Performance 2D/3D Graphics Accelerator for Mobile Multimedia Applications Woo, Sungdae Choi, Ju-Ho Sohn, Seong-Jun Song, Young-Don Bae,, and Hoi-Jun Yoo oratory Dept. of EECS,
More information3-D Accelerator on Chip
3-D Accelerator on Chip Third Prize 3-D Accelerator on Chip Institution: Participants: Instructor: Donga & Pusan University Young-Hee Won, Jin-Sung Park, Woo-Sung Moon Sam-Hak Jin Design Introduction Recently,
More informationOptimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research
Optimizing Games for ATI s IMAGEON 2300 Aaftab Munshi 3D Architect ATI Research A A 3D hardware solution enables publishers to extend brands to mobile devices while remaining close to original vision of
More informationFalanx Microsystems. Company Overview
Image Quality no compromise Company Falanx Overview Microsystems Company Overview Design and license silicon graphics IP cores targeted at mobile phones and system-on-chip Core Competencies Computer Graphics
More informationArchitectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1
Architectures Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Overview of today s lecture The idea is to cover some of the existing graphics
More informationA 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications
A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System
More informationMultimedia in Mobile Phones. Architectures and Trends Lund
Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson
More informationMali-400 MP: A Scalable GPU for Mobile Devices Tom Olson
Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson Director, Graphics Research, ARM Outline ARM and Mobile Graphics Design Constraints for Mobile GPUs Mali Architecture Overview Multicore Scaling
More informationPowerVR Hardware. Architecture Overview for Developers
Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.
More informationPOWERVR MBX & SGX OpenVG Support and Resources
POWERVR MBX & SGX OpenVG Support and Resources Kristof Beets 3 rd Party Relations Manager - Imagination Technologies kristof.beets@imgtec.com Copyright Khronos Group, 2006 - Page 1 Copyright Khronos Group,
More informationCase 1:17-cv SLR Document 1-3 Filed 01/23/17 Page 1 of 33 PageID #: 60 EXHIBIT C
Case 1:17-cv-00064-SLR Document 1-3 Filed 01/23/17 Page 1 of 33 PageID #: 60 EXHIBIT C Case 1:17-cv-00064-SLR Document 1-3 Filed 01/23/17 Page 2 of 33 PageID #: 61 U.S. Patent No. 7,633,506 VIZIO / Sigma
More informationMulticore SoC is coming. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems. Source: 2007 ISSCC and IDF.
Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems Liang-Gee Chen Distinguished Professor General Director, SOC Center National Taiwan University DSP/IC Design Lab, GIEE, NTU 1
More informationScanline-based rendering of 2D vector graphics
Scanline-based rendering of 2D vector graphics Sang-Woo Seo 1, Yong-Luo Shen 1,2, Kwan-Young Kim 3, and Hyeong-Cheol Oh 4a) 1 Dept. of Elec. & Info. Eng., Graduate School, Korea Univ., Seoul 136 701, Korea
More informationGraphics Processing Unit Architecture (GPU Arch)
Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics
More informationModule Introduction. Content 15 pages 2 questions. Learning Time 25 minutes
Purpose The intent of this module is to introduce you to the multimedia features and functions of the i.mx31. You will learn about the Imagination PowerVR MBX- Lite hardware core, graphics rendering, video
More informationGraphics Hardware. Instructor Stephen J. Guy
Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!
More informationOverview. Technology Details. D/AVE NX Preliminary Product Brief
Overview D/AVE NX is the latest and most powerful addition to the D/AVE family of rendering cores. It is the first IP to bring full OpenGL ES 2.0/3.1 rendering to the FPGA and SoC world. Targeted for graphics
More informationCMP Conference 20 th January Director of Business Development EMEA
CMP Conference 20 th January 2011 eric.lalardie@arm.com Director of Business Development EMEA +33 6 07 83 09 60 1 1 Unparalleled Applicability ARM Cortex Advanced Processors Architectural innovation, compatibility
More informationVLSI Design of Multichannel AMBA AHB
RESEARCH ARTICLE OPEN ACCESS VLSI Design of Multichannel AMBA AHB Shraddha Divekar,Archana Tiwari M-Tech, Department Of Electronics, Assistant professor, Department Of Electronics RKNEC Nagpur,RKNEC Nagpur
More informationNext Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1
Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Ecosystem @neilt3d Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon
More informationARM Processors for Embedded Applications
ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or
More informationVertex Shader Design I
The following content is extracted from the paper shown in next page. If any wrong citation or reference missing, please contact ldvan@cs.nctu.edu.tw. I will correct the error asap. This course used only
More informationHardware-driven visibility culling
Hardware-driven visibility culling I. Introduction 20073114 김정현 The goal of the 3D graphics is to generate a realistic and accurate 3D image. To achieve this, it needs to process not only large amount
More informationDave Shreiner, ARM March 2009
4 th Annual Dave Shreiner, ARM March 2009 Copyright Khronos Group, 2009 - Page 1 Motivation - What s OpenGL ES, and what can it do for me? Overview - Lingo decoder - Overview of the OpenGL ES Pipeline
More informationThe Challenges of System Design. Raising Performance and Reducing Power Consumption
The Challenges of System Design Raising Performance and Reducing Power Consumption 1 Agenda The key challenges Visibility for software optimisation Efficiency for improved PPA 2 Product Challenge - Software
More informationEffective System Design with ARM System IP
Effective System Design with ARM System IP Mentor Technical Forum 2009 Serge Poublan Product Marketing Manager ARM 1 Higher level of integration WiFi Platform OS Graphic 13 days standby Bluetooth MP3 Camera
More informationA Low Power Multimedia SoC with Fully Programmable 3D Graphics and MPEG4/H.264/JPEG for Mobile Devices
A Low Power Multimedia SoC with Fully Programmable 3D Graphics and MPEG4/H.264/JPEG for Mobile Devices Jeong-Ho Woo, Ju-Ho Sohn, Hyejung Kim, Jongcheol Jeong 1, Euljoo Jeong 1, Suk Joong Lee 1 and Hoi-Jun
More informationBringing it all together: The challenge in delivering a complete graphics system architecture. Chris Porthouse
Bringing it all together: The challenge in delivering a complete graphics system architecture Chris Porthouse System Integration & the role of standards Content Ecosystem Java Execution Environment Native
More informationLPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)
A practitioner s view of challenges faced with power and performance on mobile GPU Prashant Sharma Samsung R&D Institute UK LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014) SERI
More informationGraphics, Mobile Computing, APIs and Life
Graphics, Mobile Computing, APIs and Life Dave Shreiner Director, Graphics and GPU Computing ARM, Inc. 12 November 2012 1 Agenda ARM and the IP Business That Computer in your Pocket Graphics: Techniques
More informationProfiling and Debugging Games on Mobile Platforms
Profiling and Debugging Games on Mobile Platforms Lorenzo Dal Col Senior Software Engineer, Graphics Tools Gamelab 2013, Barcelona 26 th June 2013 Agenda Introduction to Performance Analysis with ARM DS-5
More informationIMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits
NOC INTERCONNECT IMPROVES SOC ECONO CONOMICS Initial Investment is Low Compared to SoC Performance and Cost Benefits A s systems on chip (SoCs) have interconnect, along with its configuration, verification,
More informationGeForce4. John Montrym Henry Moreton
GeForce4 John Montrym Henry Moreton 1 Architectural Drivers Programmability Parallelism Memory bandwidth 2 Recent History: GeForce 1&2 First integrated geometry engine & 4 pixels/clk Fixed-function transform,
Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25th-27th June
Optimizing and Profiling Unity Games for Mobile Platforms Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25th-27th June 1 Agenda Introduction ARM and the presenter Preliminary knowledge
ISSCC 2001 / SESSION 9 / INTEGRATED MULTIMEDIA PROCESSORS / 9.2
ISSCC 2001 / SESSION 9 / INTEGRATED MULTIMEDIA PROCESSORS / 9.2 9.2 A 80/20MHz 160mW Multimedia Processor integrated with Embedded DRAM MPEG-4 Accelerator and 3D Rendering Engine for Mobile Applications
Rendering Objects. Need to transform all geometry then
Intro to OpenGL Rendering Objects Object has internal geometry (Model) Object relative to other objects (World) Object relative to camera (View) Object relative to screen (Projection) Need to transform
CS450/550. Pipeline Architecture. Adapted From: Angel and Shreiner: Interactive Computer Graphics 6E Addison-Wesley 2012
CS450/550 Pipeline Architecture Adapted From: Angel and Shreiner: Interactive Computer Graphics 6E Addison-Wesley 2012 0 Objectives Learn the basic components of a graphics system Introduce the OpenGL pipeline
PowerVR Series5. Architecture Guide for Developers
Public Imagination Technologies PowerVR Series5 Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.
Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university
Graphics Architectures and OpenCL Michael Doggett Department of Computer Science Lund university Overview Parallelism Radeon 5870 Tiled Graphics Architectures Important when Memory and Bandwidth limited
Mobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair
OpenGL ES in the Mobile Graphics Ecosystem Tom Olson OpenGL ES working group chair Director, Graphics Research, ARM Ltd 1 Outline Why Mobile Graphics? OpenGL ES Overview Getting Started with OpenGL ES
Spring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage can also be pipelined. The slowest of the pipeline stages determines the rendering speed. Frames per second (fps) Executes on
Lecture 2. Shaders, GLSL and GPGPU
Lecture 2 Shaders, GLSL and GPGPU Is it interesting to do GPU computing with graphics APIs today? Lecture overview Why care about shaders for computing? Shaders for graphics GLSL Computing with shaders
System Verification of Hardware Optimization Based on Edge Detection
Circuits and Systems, 2013, 4, 293-298 http://dx.doi.org/10.4236/cs.2013.43040 Published Online July 2013 (http://www.scirp.org/journal/cs) System Verification of Hardware Optimization Based on Edge Detection
ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1
ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 22.1 A 125µW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications Tsu-Ming Liu 1, Ting-An Lin 2, Sheng-Zen Wang 2, Wen-Ping Lee
SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS
SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous
From Concept to Silicon
From Concept to Silicon How an idea becomes a part of a new chip at ATI Richard Huddy ATI Research From Concept to Silicon Creating a new Visual Processing Unit (VPU) is a complex task involving many people
Simulation and development environment for mobile 3D graphics architectures
SPECIAL SECTION ON ADVANCES IN ELECTRONICS SYSTEMS SIMULATION Simulation and development environment for mobile 3D graphics architectures W.-J. Lee, W.-C. Park, V.P. Srini and T.-D. Han Abstract: This
ARM Multimedia IP: working together to drive down system power and bandwidth
ARM Multimedia IP: working together to drive down system power and bandwidth Speaker: Robert Kong ARM China FAE Author: Sean Ellis ARM Architect 1 Agenda System power overview Bandwidth, bandwidth, bandwidth!
Building scalable 3D applications. Ville Miettinen Hybrid Graphics
Building scalable 3D applications Ville Miettinen Hybrid Graphics What's going to happen... (1/2) Mass market: 3D apps will become a huge success on low-end and mid-tier cell phones Retro-gaming New game
CS451 Real-time Rendering Pipeline
CS451 Real-time Rendering Pipeline JYH-MING LIEN DEPARTMENT OF COMPUTER SCIENCE GEORGE MASON UNIVERSITY Based on Tomas Akenine-Möller's lecture note You say that you render a 3D scene, but what does
Mali Developer Resources. Kevin Ho ARM Taiwan FAE
Mali Developer Resources Kevin Ho ARM Taiwan FAE ARM Mali Developer Tools Software Development SDKs for OpenGL ES & OpenCL OpenGL ES Emulators Shader Development Studio Shader Library Asset Creation Texture
Spring 2009 Prof. Hyesoon Kim
Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage can also be pipelined. The slowest of the pipeline stages determines the rendering speed. Frames per second (fps) Executes on
AS THE MOBILE electronics market matures, third-generation
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 7, JULY 2004 1101 A Low-Power 3-D Rendering Engine With Two Texture Units and 29-Mb Embedded DRAM for 3G Multimedia Terminals Ramchan Woo, Student Member,
Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane
Rendering Pipeline Rendering Converting a 3D scene to a 2D image Rendering Light Camera 3D Model View Plane Rendering Converting a 3D scene to a 2D image Basic rendering tasks: Modeling: creating the world
Applications and Implementations
Copyright Khronos Group, 2010 - Page 1 Applications and Implementations Hwanyong LEE CTO and Technical Marketing Director HUONE System Integration Application Acceleration Authoring and accessibility Khronos
Windowing System on a 3D Pipeline. February 2005
Windowing System on a 3D Pipeline February 2005 Agenda 1. Overview of the 3D pipeline 2. NVIDIA software overview 3. Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April
Design & Implementation of OCP on a On-Chip Bus
Design & Implementation of OCP on a On-Chip Bus K.Mounika Student, Department of ECE, Vidya Bharathi Institute of Technology. B.Ajay Kumar Yadidya, M.E Assistant Professor & Internal Guide, Department
Hot Chips Bringing Workstation Graphics Performance to a Desktop Near You. S3 Incorporated August 18-20, 1996
Hot Chips 1996 Bringing Workstation Graphics Performance to a Desktop Near You S3 Incorporated August 18-20, 1996 Agenda ViRGE/VX Marketing Slide! Overview of ViRGE/VX accelerator features 3D rendering
EECS 487: Interactive Computer Graphics
EECS 487: Interactive Computer Graphics Lecture 21: Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan Console Games Why do games look and perform so much better on consoles than on PCs with
GRAPHIC RENDERING APPLICATION PROFILING ON A SHARED MEMORY MPSOC ARCHITECTURE. Matthieu Texier, Raphaël David, Karim Ben Chehida
GRAPHIC RENDERING APPLICATION PROFILING ON A SHARED MEMORY MPSOC ARCHITECTURE Matthieu Texier, Raphaël David, Karim Ben Chehida CEA, LIST, Embedded Computing Lab PC 94, F-91191 Gif-sur-Yvette Cedex Email:
A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision
A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision Seok-Hoon Kim KAIST, Daejeon, Republic of Korea I. INTRODUCTION Recently, there has been tremendous progress in 3D graphics
Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved
Avi Shapira Graphic Remedy Copyright Khronos Group, 2009 - Page 1 2004 2009 Graphic Remedy. All Rights Reserved Debugging and profiling 3D applications are both hard and time consuming tasks Companies
SIGGRAPH Briefing August 2014
Copyright Khronos Group 2014 - Page 1 SIGGRAPH Briefing August 2014 Neil Trevett VP Mobile Ecosystem, NVIDIA President, Khronos Copyright Khronos Group 2014 - Page 2 Significant Khronos API Ecosystem Advances
Optimizing ARM SoC's with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd
Optimizing ARM SoC's with Carbon Performance Analysis Kits ARM Technical Symposia, Fall 2014 Andy Ladd Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex-R7 Block
Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský
Real - Time Rendering Pipeline optimization Michal Červeňanský Juraj Starinský Motivation Resolution 1600x1200, at 60 fps Hw power not enough Acceleration is still necessary 3.3.2010 2 Overview Application
A System-Level Model of Design Space Exploration for a Tile-Based 3D Graphics SoC Refinement
IEICE TRANS. FUNDAMENTALS, VOL.E92-A, NO.12 DECEMBER 2009 3193 PAPER Special Section on VLSI Design and CAD Algorithms A System-Level Model of Design Space Exploration for a Tile-Based 3D Graphics SoC
Scanline Rendering 2 1/42
Scanline Rendering 2 1/42 Review 1. Set up a Camera the viewing frustum has near and far clipping planes 2. Create some Geometry made out of triangles 3. Place the geometry in the scene using Transforms
Development of a 3-D Graphics Rendering Engine with Lighting Acceleration for Handheld Multimedia Systems
1020 IEEE Transactions on Consumer Electronics, Vol. 51, No. 3, AUGUST 2005 Development of a 3-D Graphics Rendering Engine with Lighting Acceleration for Handheld Multimedia Systems Byeong-Gyu Nam, Min-wuk
The Graphics Pipeline
The Graphics Pipeline Ray Tracing: Why Slow? Basic ray tracing: 1 ray/pixel Ray Tracing: Why Slow? Basic ray tracing: 1 ray/pixel But you really want shadows, reflections, global illumination, antialiasing
HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.
HotChips 2007 An innovative HD video and digital image processor for low-cost digital entertainment products Deepu Talla Texas Instruments 1 Salient features of the SoC HD video encode and decode using
WebGL (Web Graphics Library) is the new standard for 3D graphics on the Web, designed for rendering 2D graphics and interactive 3D graphics.
About the Tutorial WebGL (Web Graphics Library) is the new standard for 3D graphics on the Web, designed for rendering 2D graphics and interactive 3D graphics. This tutorial starts with a basic introduction
Digital Blocks Semiconductor IP
Digital Blocks Semiconductor IP General Description The Digital Blocks LCD Controller IP Core interfaces a video image in frame buffer memory via the AMBA 3.0 / 4.0 AXI Protocol Interconnect to a 4K and
Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)
Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Analyzing a 3D Graphics Workload Where is most of the work done? Memory Vertex
Hands-On Workshop: 3D Automotive Graphics on Connected Radios Using Rayleigh and OpenGL ES 2.0
Hands-On Workshop: 3D Automotive Graphics on Connected Radios Using Rayleigh and OpenGL ES 2.0 FTF-AUT-F0348 Hugo Osornio Luis Olea APR. 2014 TM External Use Agenda Back to the Basics! What is a GPU?
Case 1:17-cv SLR Document 1-4 Filed 01/23/17 Page 1 of 30 PageID #: 75 EXHIBIT D
Case 1:17-cv-00065-SLR Document 1-4 Filed 01/23/17 Page 1 of 30 PageID #: 75 EXHIBIT D Case 1:17-cv-00065-SLR Document 1-4 Filed 01/23/17 Page 2 of 30 PageID #: 76 U.S. Patent No. 7,633,506 LG / MediaTek
Applications and Implementations
Copyright Khronos Group, 2010 - Page 1 Applications and Implementations Hwanyong LEE CTO and Technical Marketing Director HUONE OpenVG Royalty-free open standard API Low-level 2D vector graphics rendering
Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary
Cornell University CS 569: Interactive Computer Graphics Introduction Lecture 1 [John C. Stone, UIUC] 2008 Steve Marschner 1 2008 Steve Marschner 2 NASA University of Calgary 2008 Steve Marschner 3 2008
PowerVR: Getting Great Graphics Performance with the PowerVR Insider SDK. PowerVR Developer Technology
PowerVR: Getting Great Graphics Performance with the PowerVR Insider SDK PowerVR Developer Technology Company Overview Leading silicon, software & cloud IP supplier Graphics, video, comms, processor, cloud
Real-Time Rendering (Echtzeitgraphik) Michael Wimmer
Real-Time Rendering (Echtzeitgraphik) Michael Wimmer wimmer@cg.tuwien.ac.at Walking down the graphics pipeline Application Geometry Rasterizer What for? Understanding the rendering pipeline is the key
X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1
X. GPU Programming 320491: Advanced Graphics - Chapter X 1 X.1 GPU Architecture 320491: Advanced Graphics - Chapter X 2 GPU Graphics Processing Unit Parallelized SIMD Architecture 112 processing cores
Bifrost - The GPU architecture for next five billion
Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor
This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
Digital Blocks Semiconductor IP
Digital Blocks Semiconductor IP TFT Controller General Description The Digital Blocks TFT Controller IP Core interfaces a microprocessor and frame buffer memory via the AMBA 2.0 to a TFT panel. In an FPGA,
Grafica Computazionale: Lezione 30. Grafica Computazionale. Hiding complexity... ;) Introduction to OpenGL. lezione30 Introduction to OpenGL
Grafica Computazionale: Lezione 30 Grafica Computazionale lezione30 Introduction to OpenGL Informatica e Automazione, "Roma Tre" May 20, 2010 OpenGL Shading Language Introduction to OpenGL OpenGL (Open
Performance OpenGL Programming (for whatever reason)
Performance OpenGL Programming (for whatever reason) Mike Bailey Oregon State University Performance Bottlenecks In general there are four places a graphics system can become bottlenecked: 1. The computer
Developing the Bifrost GPU architecture for mainstream graphics
Developing the Bifrost GPU architecture for mainstream graphics Anand Patel Senior Product Manager, Media Processing Group ARM Tech Symposia India December 7th 2016 Graphics processing drivers Virtual
The Application Stage. The Game Loop, Resource Management and Renderer Design
1 The Application Stage The Game Loop, Resource Management and Renderer Design Application Stage Responsibilities 2 Set up the rendering pipeline Resource Management 3D meshes Textures etc. Prepare data
FABRICATION TECHNOLOGIES
FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general
OpenGL on Android. Lecture 7. Android and Low-level Optimizations Summer School. 27 July 2015
OpenGL on Android Lecture 7 Android and Low-level Optimizations Summer School 27 July 2015 This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this
CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
Structure. Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung,
A High Performance 3D Graphics Rasterizer with Effective Memory Structure Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, Il-San Kim,
The Rendering Pipeline (1)
The Rendering Pipeline (1) Alessandro Martinelli alessandro.martinelli@unipv.it 30 September 2014 The Rendering Pipeline (1) Rendering Architecture First Rendering Pipeline Second Pipeline: Illumination
An Efficient Multi Mode and Multi Resolution Based AHB Bus Tracer
An Efficient Multi Mode and Multi Resolution Based AHB Bus Tracer Abstract: Waheeda Begum M.Tech, VLSI Design & Embedded System, Department of E&CE, Lingaraj Appa Engineering College, Bidar. On-Chip program
ARM System-Level Modeling. Platform constructed from well-tested
ARM System-Level Modeling Jon Connell Version 1.0, June 25, 2003 Abstract Embedded hardware and software design tools often work under the assumption that designers will have full visibility into the implementation
Module 13C: Using The 3D Graphics APIs OpenGL ES
Module 13C: Using The 3D Graphics APIs OpenGL ES BREW TM Developer Training Module Objectives See the steps involved in 3D rendering View the 3D graphics capabilities 2 1 3D Overview The 3D graphics library
L10 Layered Depth Normal Images. Introduction Related Work Structured Point Representation Boolean Operations Conclusion
L10 Layered Depth Normal Images Introduction Related Work Structured Point Representation Boolean Operations Conclusion 1 Introduction Purpose: using the computational power on GPU to speed up solid modeling
Design and Implementation of High Performance Application Specific Memory
Design and Implementation of High Performance Application Specific Memory - 고성능 Application Specific Memory 의설계와구현 - M.S. Thesis Sungdae Choi Dec. 20th, 2002 Outline Introduction Memory for Mobile 3D Graphics
PowerVR Performance Recommendations. The Golden Rules
PowerVR Performance Recommendations Copyright Imagination Technologies Limited. All Rights Reserved. This publication contains proprietary information which is subject to change without notice and is supplied
E. Order of Operations
Appendix E E. Order of Operations This book describes all the operations performed between initial specification of vertices and final writing of fragments into the framebuffer. The chapters of this book are arranged