1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7.

Size: px
Start display at page:

Download "1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7."

Transcription

1 1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7. Optical Discs 1

2 Structure of a Graphics Adapter Video Memory Graphics Accelerators 3D Accelerators Graphics Processing Units Digital Interfaces for Monitors 2

3 3

4 Graphics Controller Implements the main functions of the graphics adapter System Bus Interface Transfers in burst mode Transfers with no wait states when reading the video memory FIFO memory for efficient write to the video memory 4

5 Video Memory Interface Allows to update the video images VGA Registers and Control Registers Enable programming of the video adapter for operation in VGA modes There are adapters that are no longer compatible with the VGA standard Cursor Generator Graphic Functions Implemented by graphics accelerators 5

6 Video BIOS Provides video functions for access to the graphics adapter The BIOS programs of different adapters are different difficult programming VESA (Video Electronics Standards Association) standard for high-resolution BIOS functions Video Memory Holds the video image frame buffer 6

7 RAMDAC Circuit (RAMDAC RAM Digital to Analog Converter) Reads the digital image and converts it into analog signals The RAMDAC functions may be integrated into the graphics controller Only required for displays with analog inputs Displays that operate in the digital domain reconvert the analog signals to digital form 7

8 Video Ports Enable to transfer the video images to a monitor There are several variants of video ports VGA (Video Graphics Array) Analog interface Designed for CRT displays, but also used by some liquid crystal displays Electrical noise may occur DB-15 connector 8

9 VIVO (Video In Video Out) Analog interface for connecting to TV sets, DVD players, game consoles (TV Out) Signals: S-Video (Y/C); composite video; component video (e.g., RGB) 9-pin mini-din connector DVI (Digital Visual Interface) Digital interface DVI-I (digital and analog signals) or DVI-D (digital signals only) connector 9

10 HDMI (High-Definition Multimedia Interface) Digital interface for uncompressed video data Allows to send digital audio data over the same cable 19-pin (single-link) or 29-pin (dual-link) connector DisplayPort Digital interface for video and audio data Targeted to replace the VGA and DVI interfaces 20-pin connectors for 1, 2, or 4 lanes 10

11 Structure of a Graphics Adapter Video Memory Graphics Accelerators 3D Accelerators Graphics Processing Units Digital Interfaces for Monitors 11

12 Can be single-ported or dual-ported Single-ported video memory The data port is used to refresh the screen and to write new data Dual-ported video memory One of the ports is used to update the images in memory The second port has serial access and is used to refresh the images on the screen 12

13 Video Memory Transfer Rate The maximum transfer rate bandwidth Affected by video memory technology and access time Bandwidth has to be shared by: screen refreshing circuits, CPU, graphics controller % of the bandwidth should be reserved for other functions, different than refreshing 13

14 DDR-400 (PC3200) memory Maximum transfer rate: 3,200 MB/s Average transfer rate: ~1,600 MB/s DDR2-667 (PC2-5300) memory Maximum transfer rate : GB/s DDR (PC ) memory Maximum transfer rate : 17 GB/s DDR (PC ) memory Maximum transfer rate : 25.6 GB/s 14

15 GDDR (Graphics Double Data Rate) Designed by ATI Technologies with the collaboration of the JEDEC committee Several versions: GDDR2.. GDDR6 GDDR2 and GDDR3: based on DDR2 technology GDDR4 and GDDR5: based on DDR3 technology Low voltage: 1.8 V V reduced power consumption and heat output Separate data strobe signals for read and write 15

16 GDDR6 Combines high performance with stable operation and low implementation costs Memory organization: x32 Differential command clock signal (CK, CK#) Two differential write clock signals (WCK, WCK#) Two data bytes are aligned to one WCK signal Example for a data rate of 6 Gbits/s: f CK = 1.5 GHz; f WCK = 3 GHz 16

17 Data bus inversion Reduces the number of zero bits transmitted Indicated with a DBI# signal for each byte Transmission lines have high level termination power dissipation is reduced Address and command bus inversion Signal training Phase adjustment of clock, data, and address signals 17

18 Address training: alignment of the address bus to the CK clock signal Alignment of WCK signal to the CK signal Data training: alignment of the data lines to the corresponding WCK signal A hidden data re-training is possible Calibration: improves the reliability Auto-calibration: drive strength, termination impedance Software-controlled adjustment 18

19 Burst read/write access to the internal memory 16n prefetch: the internal data bus is 16 times as wide as the I/O interface The 32-bit I/O interface is divided into two independent 16-bit channels, A and B At each read and write 2 x 256 = 512 bits are transferred (64 B in two CK cycles) Maximum transfer rate: 16 Gbits/s per pin (64 GB/s for 32 pins) 19

20 Error detection Dedicated EDC (Error Detection Code) pins for sending CRC codes to the memory controller For each channel: a 16-bit CRC code for the data signals and DBI# signals of a burst transfer block The concatenation of two 8-bit CRC codes, for the first half and the second half of the block Allows full detection of single-bit, double-bit, and triple-bit errors, and >99% detection of other random errors 20

21 Power management Features that allow to consume power only when it is needed Scalable clock frequency: from 50 MHz (400 Mbits/s) to the maximum frequency Low power mode for the DRAM core Increasing the termination impedance at slower data rates Self-refresh and hibernate self-refresh modes Low supply voltage: 1.35 V Data and address bus inversion 21

22 Structure of a Graphics Adapter Video Memory Graphics Accelerators 3D Accelerators Graphics Processing Units Digital Interfaces for Monitors 22

23 Contain specialized circuits to execute the mathematical operations required for graphics rendering Release the CPU from the task of executing these operations The first graphics accelerators: AVGA (Accelerated VGA) adapters Subsequent graphics accelerators: 2D accelerators The link between the accelerator circuitry and the OS is made via a driver 23

24 Common 2D graphics functions: BitBlt (Bit Block Transfer) Two bitmaps are combined with a raster operation Boolean operator The result is transferred to the destination area Blitter: dedicated circuit for the BitBlt operation Tracing lines, drawing rectangles, circles Filling surfaces or polygons Adding color 24

25 Multimedia accelerators: graphics accelerators extended with audio and video acceleration functions Functions: Decoding audio data streams Scaling video images in x, y directions Converting digital video signals into RGB signals Decompressing video images represented in various formats 25

26 Structure of a Graphics Adapter Video Memory Graphics Accelerators 3D Accelerators Graphics Processing Units Digital Interfaces for Monitors 26

27 3D Accelerators 3D Images 3D Operations 27

28 Are managed using abstract models An object is represented as a set of points defined by its x, y, and z coordinates position of vertices If the object vertices are connected with lines, surfaces are obtained can be filled with a certain color or texture Each 3D object is composed of a large number of triangles (or polygons) that describe its surface 28

29 Animated 3D graphics requires to perform a series of geometry computations that define the position of objects in 3D space The geometry computations that handle the vertices of triangles can be performed by the CPU or by the graphics processor The graphics processor must convert these triangles into solid surfaces intensive computations are needed 29

30 In the real world, objects interact with each other Complex mathematical equations are used to determine whether an object is visible in a scene from a given angle Besides the color components, for each pixel an alpha value must also be stored Indicates the degree of transparency of the pixel in the final image 30

31 Another information that must be stored: the depth in space or z coordinate The accelerator determines the z value of the objects pixels in a plane and displays those with a smaller z value The pixels depth information is stored in a separate buffer z-buffer Usually, 32 bits are allocated in the z-buffer for each pixel 31

32 Each time the image is updated, the color and depth of pixels must be recomputed Applying different 3D computations to the scene process of rendering Fills in all of the points on the surface of the object that previously was stored only as a set of vertices A solid object with 3D effects will be drawn on the monitor 32

33 3D Accelerators 3D Images 3D Operations 33

34 3D operations are performed in two stages: Geometry stage: clipping, transformation, lighting Rendering stage: shading, texture mapping with adding the perspective effect, texture filtering, alpha blending On current 3D accelerators, operations in both stages are performed by the graphics processor 34

35 35

36 Clipping Determines what part of an object is visible on the screen Eliminates the parts that are not visible Lighting Objects are modeled by light sources in the scene Lighting effects create color shading, light reflection, shadows 36

37 Transformation Translation: moving every point by a fixed distance in the same direction Reflection: transforming an object into its mirror image Glide reflection: combining a reflection with the translation along the reflection axis Scaling: linear transformation to change the size of objects 37

38 Tessellation Dividing polygons into smaller structures for rendering Dividing into triangles: triangulation 38

39 Shading Enables the realistic representation of 3D objects on 2D screens Algorithms: Gouraud, Phong Reading the color information of vertices Interpolating the intensities for the color components 39

40 Texture Mapping Adding surface details (textures) to polygons that represent objects Loading the texture elements (texels) from a bitmap Combining the texels Writing the resulting pixel to video memory Applying a single texture Multi-texturing: a combination of textures is applied to an object 40

41 Textures may require a large space in memory compression is used Textures must be corrected to create the perspective effect 41

42 Texture Filtering Reduces some unwanted effects that may occur with texture mapping The color of a new pixel is determined through interpolation between the colors of several texels in the original texture Bilinear filtering: uses the weighted average of the four texels nearest to a particular texel 42

43 Trilinear Filtering The texture resolution is reduced when the distance to the object increases 3D accelerators store in memory several variants of a texture MIP mapping Combining this feature with bilinear filtering 43

44 Fogging Gradually fading objects in the distance The scene will appear more realistic illusion of distant objects Allows to perform the 3D processing faster Alpha Blending Used to create the transparency effect for some objects (e.g., windows) 44

45 Anti-Aliasing Oblique lines: approximated by combining vertical segments with horizontal segments the aliasing effect occurs Removing this effect ( anti-aliasing ): Changing the color of pixels near the outlines The background color is gradually mixed with the object's color The clarity of the outlines is reduced 45

46 Structure of a Graphics Adapter Color Representation Video Memory Graphics Accelerators 3D Accelerators Graphics Processing Units Digital Interfaces for Monitors 46

47 Graphics Processing Units Overview GPGPU Computing The CUDA Architecture The NVIDIA TU102 GPU 47

48 GPU Graphics Processing Unit Dedicated graphics processors for PCs, workstations, and game consoles Initially used to accelerate the rendering stage for 3D graphics (e.g., texture mapping) Later also used to accelerate the geometric computations (rotation, translation) GPUs contain shader units, modules for texture mapping, anti-aliasing etc. 48

49 Vertex shader units Transform the 3D position of each vertex to the 2D coordinates on the screen and to the depth value for the z-buffer Modify the attributes of vertices: position, color, texture coordinates Geometry shader units Generate geometric figures or add volumetric details to objects 49

50 Pixel/fragment shader units Determine the color, z depth, and alpha value for each pixel or fragment Unified shader units Programmable units Able to perform various shading operations (vertex, geometry, pixel) GPUs contain an array of computing units and a unit that distributes the operations to be performed 50

51 The architecture with programmable units allows a more flexible use of the hardware resources The programmable units can also be used for other types of computations A flexible parallel architecture is obtained GPUs also include modules for 2D acceleration, MPEG compression, highdefinition video decoding 51

52 GPUs can be dedicated or integrated Dedicated GPUs Used in graphics cards interfaced with the motherboard via a PCI Express bus or AGP (Accelerated Graphics Port) interface Have a dedicated memory to the card use Examples AMD Radeon RX 500 (e.g., RX 590) NVIDIA GeForce RTX (e.g., RTX 2080) 52

53 Integrated GPUs Are integrated into a chipset or processor Use a portion of the system memory Have lower performance compared to dedicated GPUs Examples Intel HD Graphics (e.g., UHD Graphics 630) AMD Vega (e.g., Vega 10) in the Ryzen Mobile APU (Accelerated Processing Unit) series ARM Mali (e.g., Mali-G72 in the Exynos 9810 processors) 53

54 The design of GPUs was influenced by the 2D and 3D programming interfaces Implement API functions in hardware OpenGL (Open Graphics Library) For various platforms and languages Functions to draw 3D scenes from primitives Direct3D (component of DirectX) Only for the Microsoft operating systems Low-level interface to the 3D hardware functions 54

55 Technologies for connecting multiple GPUs on different graphics cards NVIDIA: SLI (Scalable Link Interface) and HB SLI (High Bandwidth SLI) identical graphics cards are connected via a motherboard or a bridge Interconnects: PCIe (SLI) or NVLink (HB SLI) AMD: CrossFireX Up to 4 graphics cards can be connected The graphics cards do not have to be identical 55

56 Graphics Processing Units Overview GPGPU Computing The CUDA Architecture The NVIDIA TU102 GPU 56

57 GPGPU (General Purpose computing on GPU) The GPU processing cores provide massive FP computational power Example: a single NVIDIA Turing TU102 GPU (4,608 cores) achieves 16.3 TFLOPS The graphics pipeline can also be used for general-purpose applications The performance can be orders of magnitude higher than that of conventional CPUs 57

58 GPUs can process independent vertices and pixels/fragments stream processors Stream: set of records that require similar computation Kernel function: applied to each element in the stream Shared memories cannot be used Ideal GPGPU applications: large data sets, high parallelism, reduced dependencies 58

59 Disadvantages of GPGPU computing: The programmer needs to be familiar with the graphics APIs and the GPU architecture Problems need to be expressed in terms of coordinates, textures, shader functions The need to use graphics programming languages: OpenGL, DirectX, Cg API extensions for running some program functions on GPU's processors: CUDA (NVIDIA), OpenCL (Khronos Group) 59

60 Graphics Processing Units Overview GPGPU Computing The CUDA Architecture The NVIDIA TU102 GPU 60

61 CUDA (Compute Unified Device Architecture) Software and hardware architecture Enables GPUs to execute programs written in C, C++, Fortran, OpenCL languages Allows to use Microsoft's DirectCompute API Allows to access directly the GPU resources for general-purpose computing Exploits the GPU's capability to operate on large matrices in parallel 61

62 A CUDA program calls kernel functions executed by threads Threads are organized into blocks and groups of blocks (grids) Thread block: Set of concurrent threads Communicate via a shared memory Each thread has an identifier, registers, private memory, inputs, outputs 62

63 Grid of blocks: Group (array) of thread blocks The blocks execute the same kernel function Ensure synchronization between dependent kernel functions Results are shared in a global memory allocated to an application global synchronization 63

64 64

65 The hierarchy of threads is executed on a hierarchy of processors on the GPU Threads: executed by CUDA cores and other execution units Thread blocks: executed by a streaming multiprocessor (SM) Group of 32 threads: warp Grids of blocks: executed by the GPU 65

66 Unified Virtual Addressing Provides a single virtual memory address space for CPU and GPU memory Unified Memory Part of the GPU physical memory is shared between the CPU and GPU CUDA software migrates data allocated in the unified memory between GPU and CPU The memory modified by the CPU should be synchronized with the GPU memory 66

67 Graphics Processing Units Overview GPGPU Computing The CUDA Architecture The NVIDIA TU102 GPU 67

68 Uses a new architecture, Turing Previous architectures: Fermi, Kepler, Maxwell, Pascal Main features: 18.6 billion transistors, 12 nm technology Uses the second generation of NVIDIA's NVLink interconnect up to 100 GB/s Contains GDDR6 memory interfaces New types of cores: tensor cores, ray tracing cores 68

69 A full implementation contains: Six Graphics Processing Cluster (GPC) units 36 Texture Processing Cluster (TPC) units 72 SM units with 64 CUDA cores and 4 texture units (4,608 cores; 288 texture units) 576 tensor cores; 72 ray-tracing (RT) cores Twelve 32-bit GDDR6 memory controllers (384-bit memory interface) Common L2 cache memory for the SM units Global scheduler GigaThread Engine 69

70 70

71 Each SM unit contains: 64 integer cores (INT32) 64 single-precision FP cores (FP32) Floating-point units IEEE Fused multiply-add instruction (FMA) Concurrent execution of INT32, FP32 operations Two double-precision FP units (FP64) 16 Load/Store units (LD/ST) 16 special-function units (SFU) transcendental functions 71

72 Four texture units Four register files of 64 KB each (4 x 16,384 registers of 32 bits) Eight tensor cores Accelerate matrix multiplications for neural network training and inferencing 64 FMA operations per clock cycle (FP16), or 256 integer operations per clock cycle (INT8), or 512 integer operations per clock cycle (INT4) One ray-tracing (RT) core Enables real-time ray tracing: shadows, reflections, refractions 72

73 73

74 Threads are scheduled in groups of 32 (warps) Each SM unit also contains: Four warp schedulers provide increased performance and reduced power consumption Four instruction dispatch units From each warp, two instructions can be dispatched in each clock cycle 74

75 Memory subsystem Each SM block contains an instruction cache (L0) memory New unified architecture for the L1 cache, texture cache, and shared memory Combined L1 data cache/shared memory (96 KB) 6144 KB of unified L2 cache memory: allows to share data between the SM units The register files and memories are protected by an ECC code 75

76 The NVIDIA GeForce RTX 2080 Ti Card Based on the Turing GPU architecture Contains one TU102 GPU Number of CUDA cores: 68 x 64 = 4,352 GDDR6 memory size: 11 GB Nominal clock frequency: 1350 MHz Peak FP32 performance: 13.4 TFLOPS Peak FP16 performance: 26.9 TFLOPS Peak FP16 tensor FMA performance with FP32 accumulate: 53.8 TFLOPS 76

77 77

78 The main component of a graphics adapter is the graphics controller It contains the system bus interface; the speed of this interface is an important performance factor Video ports enable to combine video images from other sources with graphics images The transfer rate of the video memory has a major impact on performance Dual-ported memories: updating the images and refreshing the screen can be performed in parallel 78

79 The GDDR6 memory has advanced features for high performance and stable operation Data and address bus inversion; signal training; calibration; error detection; power management 3D accelerators are required to convert 3D objects into 2D images in a realistic manner For each pixel of a 3D object, an alpha value and the z coordinate have to be stored 3D operations are performed in two stages: geometry stage and rendering stage 79

80 GPUs are used to accelerate the geometric stage and the rendering stage of 3D graphics Can be dedicated or integrated in a chipset GPUs contain a large number of processing cores, programmable for various shadings The processing power of GPUs can also be used for applications that require vector operations The CUDA architecture allows to directly access the GPU resources for general-purpose computing 80

81 Structure of a graphics adapter Components of the graphics controller Function of the RAMDAC circuit Features of the GDDR6 graphics memory Data and address bus inversion of the GDDR6 graphics memory Signal training of the GDDR6 graphics memory Representation of 3D objects 3D operations performed in the geometry stage 81

82 3D operations performed in the rendering stage Types of shader units contained by GPUs Dedicated and integrated GPUs Advantages and disadvantages of GPGPU computing The CUDA architecture Thread block in the CUDA architecture Grid of blocks in the CUDA architecture 82

83 1. What is the advantage of a dual-ported video memory? 2. What are the power management features of the GDDR6 video memory? 3. What information is required for representing 3D objects? 4. What operations are performed in the rendering stage for 3D images? 83

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 20

ECE 571 Advanced Microprocessor-Based Design Lecture 20 ECE 571 Advanced Microprocessor-Based Design Lecture 20 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 12 April 2016 Project/HW Reminder Homework #9 was posted 1 Raspberry Pi

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 18

ECE 571 Advanced Microprocessor-Based Design Lecture 18 ECE 571 Advanced Microprocessor-Based Design Lecture 18 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 11 November 2014 Homework #4 comments Project/HW Reminder 1 Stuff from Last

More information

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett Spring 2010 Prof. Hyesoon Kim AMD presentations from Richard Huddy and Michael Doggett Radeon 2900 2600 2400 Stream Processors 320 120 40 SIMDs 4 3 2 Pipelines 16 8 4 Texture Units 16 8 4 Render Backens

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter

More information

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms CS 590: High Performance Computing GPU Architectures and CUDA Concepts/Terms Fengguang Song Department of Computer & Information Science IUPUI What is GPU? Conventional GPUs are used to generate 2D, 3D

More information

Graphics Hardware. Instructor Stephen J. Guy

Graphics Hardware. Instructor Stephen J. Guy Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!

More information

Programming Graphics Hardware

Programming Graphics Hardware Tutorial 5 Programming Graphics Hardware Randy Fernando, Mark Harris, Matthias Wloka, Cyril Zeller Overview of the Tutorial: Morning 8:30 9:30 10:15 10:45 Introduction to the Hardware Graphics Pipeline

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

AMD HD5450 PCIe ADD-IN BOARD. Datasheet AEGX-A3T5-01FST1

AMD HD5450 PCIe ADD-IN BOARD. Datasheet AEGX-A3T5-01FST1 AMD HD5450 PCIe ADD-IN BOARD Datasheet AEGX-A3T5-01FST1 CONTENTS 1. Feature... 3 2. Functional Overview... 4 2.1. Memory Interface... 4 2.2. Acceleration Features... 4 2.3. Avivo Display System... 5 2.4.

More information

Antonio R. Miele Marco D. Santambrogio

Antonio R. Miele Marco D. Santambrogio Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First

More information

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes. HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation

More information

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics Why GPU? Chapter 1 Graphics Hardware Graphics Processing Unit (GPU) is a Subsidiary hardware With massively multi-threaded many-core Dedicated to 2D and 3D graphics Special purpose low functionality, high

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

Scanline Rendering 2 1/42

Scanline Rendering 2 1/42 Scanline Rendering 2 1/42 Review 1. Set up a Camera the viewing frustum has near and far clipping planes 2. Create some Geometry made out of triangles 3. Place the geometry in the scene using Transforms

More information

AMD Embedded PCIe ADD-IN BOARD E6760/E6460 Datasheet. (ER93FLA/ER91FLA-xx)

AMD Embedded PCIe ADD-IN BOARD E6760/E6460 Datasheet. (ER93FLA/ER91FLA-xx) AMD Embedded PCIe ADD-IN BOARD E6760/E6460 Datasheet (ER93FLA/ER91FLA-xx) CONTENTS 1. Feature... 3 2. Functional Overview... 4 2.1. Memory Interface... 4 2.2. Acceleration Features... 4 2.3. Avivo Display

More information

Graphics Processing Unit Architecture (GPU Arch)

Graphics Processing Unit Architecture (GPU Arch) Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics

More information

Build cost-effective, reliable signage solutions with the 8 display output, single slot form factor NVIDIA NVS 810

Build cost-effective, reliable signage solutions with the 8 display output, single slot form factor NVIDIA NVS 810 WEB COPY NVIDIA NVS 810 for Eight DP Displays Part No. VCNVS810DP-PB Overview Build cost-effective, reliable signage solutions with the 8 display output, single slot form factor NVIDIA NVS 810 The NVIDIA

More information

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com History of GPUs

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

User Guide. NVIDIA Quadro FX 4700 X2 BY PNY Technologies Part No. VCQFX4700X2-PCIE-PB

User Guide. NVIDIA Quadro FX 4700 X2 BY PNY Technologies Part No. VCQFX4700X2-PCIE-PB NVIDIA Quadro FX 4700 X2 BY PNY Technologies Part No. VCQFX4700X2-PCIE-PB User Guide PNY Technologies, Inc. 299 Webro Rd. Parsippany, NJ 07054-0218 Tel: 408.567.5500 Fax: 408.855.0680 Features and specifications

More information

0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV

0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV 0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV Rajeev Jayavant Cyrix Corporation A National Semiconductor Company 8/18/98 1 0;L$UFKLWHFWXUDO)HDWXUHV ¾ Next-generation Cayenne Core Dual-issue pipelined

More information

NVidia s GPU Microarchitectures. By Stephen Lucas and Gerald Kotas

NVidia s GPU Microarchitectures. By Stephen Lucas and Gerald Kotas NVidia s GPU Microarchitectures By Stephen Lucas and Gerald Kotas Intro Discussion Points - Difference between CPU and GPU - Use s of GPUS - Brie f History - Te sla Archite cture - Fermi Architecture -

More information

Mattan Erez. The University of Texas at Austin

Mattan Erez. The University of Texas at Austin EE382V: Principles in Computer Architecture Parallelism and Locality Fall 2008 Lecture 10 The Graphics Processing Unit Mattan Erez The University of Texas at Austin Outline What is a GPU? Why should we

More information

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host

More information

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský Real - Time Rendering Graphics pipeline Michal Červeňanský Juraj Starinský Overview History of Graphics HW Rendering pipeline Shaders Debugging 2 History of Graphics HW First generation Second generation

More information

AMD E M PCIEx16 GFX-AE6460F16-5C

AMD E M PCIEx16 GFX-AE6460F16-5C AMD E6460 512M PCIEx16 GFX-AE6460F16-5C MPN numbers: 1A1-E000141ADP Embedded PCIe Graphics 1 x DL DVI-D, 1 x HDMI, 1 x VGA CONTENTS 1. Feature... 3 2. Functional Overview... 4 2.1. Memory Interface...

More information

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane Rendering Pipeline Rendering Converting a 3D scene to a 2D image Rendering Light Camera 3D Model View Plane Rendering Converting a 3D scene to a 2D image Basic rendering tasks: Modeling: creating the world

More information

Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include

Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include 3.1 Overview Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include GPUs (Graphics Processing Units) AMD/ATI

More information

Technical Report on IEIIT-CNR

Technical Report on IEIIT-CNR Technical Report on Architectural Evolution of NVIDIA GPUs for High-Performance Computing (IEIIT-CNR-150212) Angelo Corana (Decision Support Methods and Models Group) IEIIT-CNR Istituto di Elettronica

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

2 x Maximum Display Monitor(s) support MHz Core Clock 28 nm Chip 384 x Stream Processors. 145(L)X95(W)X26(H) mm Size. 1.

2 x Maximum Display Monitor(s) support MHz Core Clock 28 nm Chip 384 x Stream Processors. 145(L)X95(W)X26(H) mm Size. 1. Model 11215-01-20G SAPPHIRE R7 250 2GB DDR3 WITH BOOST Specification Display Support Output GPU Video Memory Dimension Software 2 x Maximum Display Monitor(s) support 1 x D-Sub(VGA) 1 x HDMI (with 3D)

More information

GPU Architecture and Function. Michael Foster and Ian Frasch

GPU Architecture and Function. Michael Foster and Ian Frasch GPU Architecture and Function Michael Foster and Ian Frasch Overview What is a GPU? How is a GPU different from a CPU? The graphics pipeline History of the GPU GPU architecture Optimizations GPU performance

More information

AMD HD5450 PCI ADD-IN BOARD. Datasheet. Advantech model number:gfx-a3t5-61fst1

AMD HD5450 PCI ADD-IN BOARD. Datasheet. Advantech model number:gfx-a3t5-61fst1 AMD HD5450 PCI ADD-IN BOARD Datasheet Advantech model number:gfx-a3t5-61fst1 CONTENTS 1. Feature... 3 2. Functional Overview... 4 2.1. Memory Interface... 4 2.2. Acceleration Features... 4 2.3. Avivo Display

More information

GRAPHICS PROCESSING UNITS

GRAPHICS PROCESSING UNITS GRAPHICS PROCESSING UNITS Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 4, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011

More information

Windowing System on a 3D Pipeline. February 2005

Windowing System on a 3D Pipeline. February 2005 Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April

More information

The NVIDIA GeForce 8800 GPU

The NVIDIA GeForce 8800 GPU The NVIDIA GeForce 8800 GPU August 2007 Erik Lindholm / Stuart Oberman Outline GeForce 8800 Architecture Overview Streaming Processor Array Streaming Multiprocessor Texture ROP: Raster Operation Pipeline

More information

Overview. NVIDIA Quadro M GB Real Interactive Expression. NVIDIA Quadro M GB Part No. VCQM GB-PB.

Overview. NVIDIA Quadro M GB Real Interactive Expression. NVIDIA Quadro M GB Part No. VCQM GB-PB. WEB COPY NVIDIA Quadro M6000 24GB Part No. VCQM6000-24GB-PB Overview NVIDIA Quadro M6000 24GB Real Interactive Expression Get real interactive expression with NVIDIA Quadro the world s most powerful workstation

More information

SAPPHIRE R7 260X 2GB GDDR5 OC BATLELFIELD 4 EDITION

SAPPHIRE R7 260X 2GB GDDR5 OC BATLELFIELD 4 EDITION SAPPHIRE R7 260X 2GB GDDR5 OC BATLELFIELD 4 EDITION Specification Display Support Output GPU Video Memory Dimension Software Accessory 4 x Maximum Display Monitor(s) support 1 x HDMI (with 3D) 1 x DisplayPort

More information

AMD HD5450 PCIe X1 ADD-IN BOARD. Datasheet. Advantech model number: GFX-A3T5-71FST1

AMD HD5450 PCIe X1 ADD-IN BOARD. Datasheet. Advantech model number: GFX-A3T5-71FST1 AMD HD5450 PCIe X1 ADD-IN BOARD Datasheet Advantech model number: GFX-A3T5-71FST1 CONTENTS 1. Feature... 3 2. Functional Overview... 4 2.1. Memory Interface... 4 2.2. Acceleration Features... 4 2.3. Avivo

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 21

ECE 571 Advanced Microprocessor-Based Design Lecture 21 ECE 571 Advanced Microprocessor-Based Design Lecture 21 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 9 April 2013 Project/HW Reminder Homework #4 comments Good job finding references,

More information

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses 1 Most of the integrated I/O subsystems are connected to the

More information

AMD E8870 4GB PCIEX16 Mini DP X4 Low profile ER24FL-SK4 GFX-AE8870L16-5J

AMD E8870 4GB PCIEX16 Mini DP X4 Low profile ER24FL-SK4 GFX-AE8870L16-5J AMD E8870 4GB PCIEX16 Mini DP X4 Low profile ER24FL-SK4 GFX-AE8870L16-5J MPN : 1A1-E000236ADP Embedded PCIe Graphics 4 x Mini DP with cable locking REV 1.0 Page 2 of 15 2016 CONTENTS 1. Specification...

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

Nvidia Quadro K5200 8GB two DVI-I two DisplayPort Graphics Card by ThinkStation (4X60G69025)

Nvidia Quadro K5200 8GB two DVI-I two DisplayPort Graphics Card by ThinkStation (4X60G69025) OVERVIEW Nvidia Quadro K5200 8GB two DVI-I two DisplayPort Graphics Card by ThinkStation (4X60G69025) The Nvidia Quadro K5200 8GB DVI-I, two DisplayPort Graphics Card by ThinkStation is based on Nvidia

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

CS451Real-time Rendering Pipeline

CS451Real-time Rendering Pipeline 1 CS451Real-time Rendering Pipeline JYH-MING LIEN DEPARTMENT OF COMPUTER SCIENCE GEORGE MASON UNIVERSITY Based on Tomas Akenine-Möller s lecture note You say that you render a 3D 2 scene, but what does

More information

Beyond One Core. Multi-Cores. Evolution in Frequency Has Stalled. Source: ISCA 2009 Keynote of Kathy Yelick

Beyond One Core. Multi-Cores. Evolution in Frequency Has Stalled. Source: ISCA 2009 Keynote of Kathy Yelick Beyond One Multi-s Evolution in Frequency Has Stalled Source: ISCA 2009 Keynote of Kathy Yelick 1 Multi-s Network or Bus Memory Private and/or shared caches Bus Network (on chip = NoC) Example: ARM Cortex

More information

Computer Graphics. Chapter 1 (Related to Introduction to Computer Graphics Using Java 2D and 3D)

Computer Graphics. Chapter 1 (Related to Introduction to Computer Graphics Using Java 2D and 3D) Computer Graphics Chapter 1 (Related to Introduction to Computer Graphics Using Java 2D and 3D) Introduction Applications of Computer Graphics: 1) Display of Information 2) Design 3) Simulation 4) User

More information

GPU Architecture. Michael Doggett Department of Computer Science Lund university

GPU Architecture. Michael Doggett Department of Computer Science Lund university GPU Architecture Michael Doggett Department of Computer Science Lund university GPUs from my time at ATI R200 Xbox360 GPU R630 R610 R770 Let s start at the beginning... Graphics Hardware before GPUs 1970s

More information

Overview. Web Copy. NVIDIA Quadro M4000 Extreme Performance in a Single-Slot Form Factor

Overview. Web Copy. NVIDIA Quadro M4000 Extreme Performance in a Single-Slot Form Factor Web Copy NVIDIA Quadro M4000 Part No. VCQM4000-PB Overview NVIDIA Quadro M4000 Extreme Performance in a Single-Slot Form Factor Get real interactive expression with NVIDIA Quadro the world s most powerful

More information

Shaders. Slide credit to Prof. Zwicker

Shaders. Slide credit to Prof. Zwicker Shaders Slide credit to Prof. Zwicker 2 Today Shader programming 3 Complete model Blinn model with several light sources i diffuse specular ambient How is this implemented on the graphics processor (GPU)?

More information

SAPPHIRE TOXIC R9 280X 3GB GDDR5

SAPPHIRE TOXIC R9 280X 3GB GDDR5 SAPPHIRE TOXIC R9 280X 3GB GDDR5 Specification Display Support Output GPU Video Memory Dimension Software Accessory 5 x Maximum Display Monitor(s) support 1 x HDMI (with 3D) 2 x Mini-DisplayPort 1 x Single-Link

More information

Hardware Accelerated Volume Visualization. Leonid I. Dimitrov & Milos Sramek GMI Austrian Academy of Sciences

Hardware Accelerated Volume Visualization. Leonid I. Dimitrov & Milos Sramek GMI Austrian Academy of Sciences Hardware Accelerated Volume Visualization Leonid I. Dimitrov & Milos Sramek GMI Austrian Academy of Sciences A Real-Time VR System Real-Time: 25-30 frames per second 4D visualization: real time input of

More information

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin EE382 (20): Computer Architecture - ism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez The University of Texas at Austin 1 Recap 2 Streaming model 1. Use many slimmed down cores to run in parallel

More information

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer Real-Time Rendering (Echtzeitgraphik) Michael Wimmer wimmer@cg.tuwien.ac.at Walking down the graphics pipeline Application Geometry Rasterizer What for? Understanding the rendering pipeline is the key

More information

Evolution of GPUs Chris Seitz

Evolution of GPUs Chris Seitz Evolution of GPUs Chris Seitz Overview Concepts: Real-time rendering Hardware graphics pipeline Evolution of the PC hardware graphics pipeline: 1995-1998: Texture mapping and z-buffer 1998: Multitexturing

More information

AMD HD7750 2GB PCIEx16

AMD HD7750 2GB PCIEx16 AMD HD7750 2GB PCIEx16 ADVANTECH MODEL: GFX-AH7750L16-5J MPN number: 1A1-E000130ADP Performance PCIe Graphics 4 x Mini DP CONTENTS 1. Specification... 3 2. Functional Overview... 4 2.1. Memory Interface...

More information

Current Trends in Computer Graphics Hardware

Current Trends in Computer Graphics Hardware Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)

More information

high performance medical reconstruction using stream programming paradigms

high performance medical reconstruction using stream programming paradigms high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming

More information

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS Agenda Forming a GPGPU WG 1 st meeting Future meetings Activities Forming a GPGPU WG To raise needs and enhance information sharing A platform for knowledge

More information

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1 Markus Hadwiger, KAUST Reading Assignment #2 (until Feb. 17) Read (required): GLSL book, chapter 4 (The OpenGL Programmable

More information

The Need for Programmability

The Need for Programmability Visual Processing The next graphics revolution GPUs Graphics Processors have been engineered for extreme speed - Highly parallel pipelines exploits natural parallelism in pixel and vertex processing -

More information

SAPPHIRE DUAL-X R9 270X 2GB GDDR5 OC WITH BOOST

SAPPHIRE DUAL-X R9 270X 2GB GDDR5 OC WITH BOOST SAPPHIRE DUAL-X R9 270X 2GB GDDR5 OC WITH BOOST Specification Display Support Output GPU Video Memory Dimension Software Accessory 3 x Maximum Display Monitor(s) support 1 x HDMI (with 3D) 1 x DisplayPort

More information

CHAPTER 1 Graphics Systems and Models 3

CHAPTER 1 Graphics Systems and Models 3 ?????? 1 CHAPTER 1 Graphics Systems and Models 3 1.1 Applications of Computer Graphics 4 1.1.1 Display of Information............. 4 1.1.2 Design.................... 5 1.1.3 Simulation and Animation...........

More information

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside CS230 : Computer Graphics Lecture 4 Tamar Shinar Computer Science & Engineering UC Riverside Shadows Shadows for each pixel do compute viewing ray if ( ray hits an object with t in [0, inf] ) then compute

More information

AMD HD7750 PCIe ADD-IN BOARD. Datasheet (GFX-A3T2-01FST1)

AMD HD7750 PCIe ADD-IN BOARD. Datasheet (GFX-A3T2-01FST1) AMD HD7750 PCIe ADD-IN BOARD Datasheet (GFX-A3T2-01FST1) CONTENTS 1. Feature... 3 2. Functional Overview... 4 2.1. Memory Interface... 4 2.2. Memory Aperture Size... 4 2.3. Avivo Display System... 5 2.4.

More information

Introduction to Modern GPU Hardware

Introduction to Modern GPU Hardware The following content are extracted from the material in the references on last page. If any wrong citation or reference missing, please contact ldvan@cs.nctu.edu.tw. I will correct the error asap. This

More information

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1 Lecture 15: Introduction to GPU programming Lecture 15: Introduction to GPU programming p. 1 Overview Hardware features of GPGPU Principles of GPU programming A good reference: David B. Kirk and Wen-mei

More information

GPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU.

GPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU. Basics of s Basics Introduction to Why vs CPU S. Sundar and Computing architecture August 9, 2014 1 / 70 Outline Basics of s Why vs CPU Computing architecture 1 2 3 of s 4 5 Why 6 vs CPU 7 Computing 8

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Architectures Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Overview of today s lecture The idea is to cover some of the existing graphics

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

Anatomy of AMD s TeraScale Graphics Engine

Anatomy of AMD s TeraScale Graphics Engine Anatomy of AMD s TeraScale Graphics Engine Mike Houston Design Goals Focus on Efficiency f(perf/watt, Perf/$) Scale up processing power and AA performance Target >2x previous generation Enhance stream

More information

Overview. NVIDIA Quadro M6000 Real Interactive Expression. CUDA Cores Memory Bandwidth 317 GB/s. DisplayPort 1.2 WEB COPY

Overview. NVIDIA Quadro M6000 Real Interactive Expression. CUDA Cores Memory Bandwidth 317 GB/s. DisplayPort 1.2 WEB COPY WEB COPY NVIDIA Quadro M6000 Part No. VCQM6000-PB Overview NVIDIA Quadro M6000 Real Interactive Expression Get real interactive expression with NVIDIA Quadro the world s most powerful workstation graphics.

More information

GPU A rchitectures Architectures Patrick Neill May

GPU A rchitectures Architectures Patrick Neill May GPU Architectures Patrick Neill May 30, 2014 Outline CPU versus GPU CUDA GPU Why are they different? Terminology Kepler/Maxwell Graphics Tiled deferred rendering Opportunities What skills you should know

More information

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June Optimizing and Profiling Unity Games for Mobile Platforms Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June 1 Agenda Introduction ARM and the presenter Preliminary knowledge

More information

Fundamental CUDA Optimization. NVIDIA Corporation

Fundamental CUDA Optimization. NVIDIA Corporation Fundamental CUDA Optimization NVIDIA Corporation Outline! Fermi Architecture! Kernel optimizations! Launch configuration! Global memory throughput! Shared memory access! Instruction throughput / control

More information

Tesla Architecture, CUDA and Optimization Strategies

Tesla Architecture, CUDA and Optimization Strategies Tesla Architecture, CUDA and Optimization Strategies Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1 Outline Tesla Architecture & CUDA CUDA Programming Optimization

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture

More information

Mathematical computations with GPUs

Mathematical computations with GPUs Master Educational Program Information technology in applications Mathematical computations with GPUs GPU architecture Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University GPU Graphical Processing

More information

NVIDIA Quadro M5000 Designed for Extreme Performance and Power Efficiency

NVIDIA Quadro M5000 Designed for Extreme Performance and Power Efficiency WEB COPY NVIDIA Quadro M5000 Part No. VCQM5000-PB Overview NVIDIA Quadro M5000 Designed for Extreme Performance and Power Efficiency Get real interactive expression with NVIDIA Quadro the world s most

More information

GPGPU. Peter Laurens 1st-year PhD Student, NSC

GPGPU. Peter Laurens 1st-year PhD Student, NSC GPGPU Peter Laurens 1st-year PhD Student, NSC Presentation Overview 1. What is it? 2. What can it do for me? 3. How can I get it to do that? 4. What s the catch? 5. What s the future? What is it? Introducing

More information

Mattan Erez. The University of Texas at Austin

Mattan Erez. The University of Texas at Austin EE382V (17325): Principles in Computer Architecture Parallelism and Locality Fall 2007 Lecture 11 The Graphics Processing Unit Mattan Erez The University of Texas at Austin Outline What is a GPU? Why should

More information

CENG 477 Introduction to Computer Graphics. Graphics Hardware and OpenGL

CENG 477 Introduction to Computer Graphics. Graphics Hardware and OpenGL CENG 477 Introduction to Computer Graphics Graphics Hardware and OpenGL Introduction Until now, we focused on graphic algorithms rather than hardware and implementation details But graphics, without using

More information

Textures. Texture coordinates. Introduce one more component to geometry

Textures. Texture coordinates. Introduce one more component to geometry Texturing & Blending Prof. Aaron Lanterman (Based on slides by Prof. Hsien-Hsin Sean Lee) School of Electrical and Computer Engineering Georgia Institute of Technology Textures Rendering tiny triangles

More information

Comparison of High-Speed Ray Casting on GPU

Comparison of High-Speed Ray Casting on GPU Comparison of High-Speed Ray Casting on GPU using CUDA and OpenGL November 8, 2008 NVIDIA 1,2, Andreas Weinlich 1, Holger Scherl 2, Markus Kowarschik 2 and Joachim Hornegger 1 1 Chair of Pattern Recognition

More information

General Purpose GPU Computing in Partial Wave Analysis

General Purpose GPU Computing in Partial Wave Analysis JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data

More information

NVIDIA Quadro K5200 Sync PNY Part Number: VCQK5200SYNC-PB. User Guide

NVIDIA Quadro K5200 Sync PNY Part Number: VCQK5200SYNC-PB. User Guide NVIDIA Quadro K5200 Sync PNY Part Number: VCQK5200SYNC-PB User Guide PNY 100 Jefferson Road Parsippany NJ 07054-0218 973-515-9700 www.pny.com/quadro Features and specifications are subject to change without

More information

AMD E8860 2GB PCIEx16 Mini DP X6 GFX-AE8860F16-5J

AMD E8860 2GB PCIEx16 Mini DP X6 GFX-AE8860F16-5J AMD E8860 2GB PCIEx16 Mini DP X6 GFX-AE8860F16-5J MPN NUMBERS: 1A1-E000188ADP Embedded PCIe Graphics 6 x Mini DP with cable locking CONTENTS 1. Spe c ification... 2. Fun ctional 3 Overview... 4 2.1. Memory

More information

Volume Graphics Introduction

Volume Graphics Introduction High-Quality Volume Graphics on Consumer PC Hardware Volume Graphics Introduction Joe Kniss Gordon Kindlmann Markus Hadwiger Christof Rezk-Salama Rüdiger Westermann Motivation (1) Motivation (2) Scientific

More information

Real-time Graphics 9. GPGPU

Real-time Graphics 9. GPGPU 9. GPGPU GPGPU GPU (Graphics Processing Unit) Flexible and powerful processor Programmability, precision, power Parallel processing CPU Increasing number of cores Parallel processing GPGPU general-purpose

More information

CS 130 Final. Fall 2015

CS 130 Final. Fall 2015 CS 130 Final Fall 2015 Name Student ID Signature You may not ask any questions during the test. If you believe that there is something wrong with a question, write down what you think the question is trying

More information

Chapter 1 Introduction

Chapter 1 Introduction Graphics & Visualization Chapter 1 Introduction Graphics & Visualization: Principles & Algorithms Brief History Milestones in the history of computer graphics: 2 Brief History (2) CPU Vs GPU 3 Applications

More information

Lets assume each object has a defined colour. Hence our illumination model is looks unrealistic.

Lets assume each object has a defined colour. Hence our illumination model is looks unrealistic. Shading Models There are two main types of rendering that we cover, polygon rendering ray tracing Polygon rendering is used to apply illumination models to polygons, whereas ray tracing applies to arbitrary

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information