Postgraduate course on Electronics and Informatics Engineering (M.Sc.)
Training Course on Circuits Theory (prof. G. Capizzi)

Postgraduate course on Computer Sciences (M.Sc.)
Training Course on Distributed Systems (prof. G. Pappalardo)

Workshop on High performance computing and GPGPU computing
PART III: GPU Cards and Architectures

Dr. Christian Napoli, M.Sc.
Dpt. Mathematics and Informatics, University of Catania
www.dmi.unict.it/~napoli/
GPU Computing: Graphics Processing Units & cards
Moore's Law and High Performance Computing: cloud (???), distributed computing, BOINC, grid, GPGPU
GPU Computing

The ABC of an Algorithm
- D as Definite
- E as Executable
- F as Finite
- G as Generic

An algorithm is a specific set of instructions for carrying out a procedure or solving a problem, with the requirement that the procedure terminate at some point. The word "algorithm" is a distortion of the name al-Khwārizmī, a Persian mathematician who wrote an influential treatise about algebraic methods. The process of applying an algorithm to an input to obtain an output is called a computation.

The Great Challenge (large scale): DGPCGPU
- Distributed
- General
- Purpose
- Computing on
- Graphics
- Processing
- Units

Computational Dimension
- Size of the dataset
- Number of operations
- Time of execution
- Operation density and topology

The Great Challenge (local scale): DGPHPCGPU
- Distributed
- General
- Purpose
- High Performance Computing on
- Graphics
- Processing
- Units

Computational Performance
- Memory access counters
- Operations per second
- Time of execution
- Scalability
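As a rough illustration of how the quantities listed under Computational Dimension and Computational Performance relate to each other, the following minimal sketch uses a dense matrix product as the workload. The workload, the helper names (`matmul`, `computational_dimension`, `measure`) and the approximate count of 2·n³ operations are assumptions chosen for illustration; they do not come from the slides.

```python
import time


def matmul(a, b):
    """Textbook dense matrix product on lists of lists (illustrative only)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def computational_dimension(n, bytes_per_value=8):
    """Estimate dataset size, operation count and operation density.

    Assumes an n x n product: three matrices are touched (A, B, C) and
    roughly n**3 multiplications plus n**3 additions are performed.
    """
    dataset_bytes = 3 * n * n * bytes_per_value
    operations = 2 * n ** 3
    density = operations / dataset_bytes  # operations per byte of data
    return dataset_bytes, operations, density


def measure(n):
    """Time one product and derive operations per second from it."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    c = matmul(a, b)
    elapsed = time.perf_counter() - start
    _, operations, _ = computational_dimension(n)
    rate = operations / elapsed if elapsed > 0 else float("inf")
    return c, elapsed, rate


if __name__ == "__main__":
    size, ops, density = computational_dimension(64)
    _, seconds, ops_per_second = measure(64)
    print(f"dataset: {size} bytes, operations: {ops}, density: {density:.2f} ops/byte")
    print(f"time: {seconds:.4f} s, throughput: {ops_per_second:.0f} ops/s")
```

The operation density grows with n here (about n/12 operations per byte), which is one way to see why large, dense problems are attractive candidates for many-core hardware.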
Once upon a time. Fiat lux and devices!

In 1983, Intel made the iSBX 275 Video Graphics Controller Multimodule Board, for industrial systems based on the Multibus standard. The card was based on the 82720 Graphics Display Controller, and accelerated the drawing of lines, arcs, rectangles, and character bitmaps. The framebuffer was also accelerated through loading via DMA. The board was intended for use with Intel's line of Multibus industrial single-board computer plugin cards.

The High-Performance Graphics Display Controller 7220 (commonly µPD7220 or NEC 7220) is a video interface controller capable of drawing lines, circles, arcs, and character graphics to a bit-mapped display. It was developed by NEC and used in NEC's APC III computers, the optional graphics module for the DEC Rainbow, the Tulip System-1, and the Epson QX-10.

The µPD7220 was one of the first implementations of a graphics display controller as a single Large Scale Integration (LSI) integrated circuit chip, enabling the design of low-cost, high-performance video graphics cards such as those from Number Nine Visual Technology. It became one of the best known of the chips that, in the 1980s, came to be called graphics processing units.

(CC) Wikimedia Commons / CC-SA-3.0
Graphic rendering: The rendering pipeline

The rendering pipeline is a sequence of steps used to create a 2D raster representation of a 3D scene. In the early history of 3D computer graphics, fixed-purpose hardware sped up the steps of the pipeline through a fixed-function pipeline. The hardware then evolved and became more general purpose, which allowed greater flexibility in graphics rendering: unlike fixed-purpose hardware, the same generalized hardware can perform different steps of the pipeline and even limited forms of general-purpose computing.
The rendering pipeline is mapped onto current graphics acceleration hardware such that the input to the GPU is in the form of vertices. These vertices then undergo transformation and per-vertex lighting. At this point in modern GPU pipelines, a custom vertex shader program can be used to manipulate the 3D vertices prior to rasterization. Once transformed and lit, the vertices undergo clipping and rasterization, resulting in fragments. A second custom shader program can then be run on each fragment before the final pixel values are output to the frame buffer for display.
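The stages described above (vertices, vertex shader, clipping, rasterization into fragments, fragment shader, frame buffer) can be sketched in a few lines of plain Python. This is a deliberately simplified, hypothetical software model: it works in 2D, omits lighting, reduces clipping to clamping a bounding box to the screen, and all names (`vertex_shader`, `rasterize`, `fragment_shader`, `render`) are illustrative rather than part of any graphics API.

```python
import math

WIDTH, HEIGHT = 8, 8  # tiny frame buffer, assumed for illustration


def vertex_shader(vertex, scale=1.0, offset=(0.0, 0.0)):
    """Custom per-vertex stage: here a plain scale and translation."""
    x, y = vertex
    return (x * scale + offset[0], y * scale + offset[1])


def to_screen(vertex):
    """Map normalized coordinates in [-1, 1] to pixel coordinates (y down)."""
    x, y = vertex
    return ((x + 1.0) * 0.5 * WIDTH, (1.0 - y) * 0.5 * HEIGHT)


def edge(a, b, p):
    """Signed test telling on which side of the edge a->b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(triangle):
    """Yield one fragment (pixel coordinate) per covered pixel centre."""
    a, b, c = triangle
    # Clipping, reduced here to clamping the bounding box to the screen.
    x_min = max(math.floor(min(a[0], b[0], c[0])), 0)
    x_max = min(math.floor(max(a[0], b[0], c[0])), WIDTH - 1)
    y_min = max(math.floor(min(a[1], b[1], c[1])), 0)
    y_max = min(math.floor(max(a[1], b[1], c[1])), HEIGHT - 1)
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
            # Accept either winding order; points on an edge count as inside.
            inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or (
                w0 <= 0 and w1 <= 0 and w2 <= 0
            )
            if inside:
                yield (x, y)


def fragment_shader(fragment):
    """Custom per-fragment stage: here a constant 'colour'."""
    return "#"


def render(vertices):
    """Run the whole pipeline for one triangle and return the frame buffer."""
    framebuffer = [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]
    transformed = [vertex_shader(v) for v in vertices]
    triangle = [to_screen(v) for v in transformed]
    for x, y in rasterize(triangle):
        framebuffer[y][x] = fragment_shader((x, y))
    return framebuffer


if __name__ == "__main__":
    for row in render([(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]):
        print("".join(row))
```

On real hardware the per-vertex and per-fragment functions run in parallel over many vertices and fragments; the sequential loops here only show the order of the stages.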
GPU Cards: Inside a GPGPU device (FERMI GeForce)
GPU Cards: GPGPU cores and multicore parallelism
GPU Cards: GPGPU cores and streams
GPU Cards: Nvidia GeForce Architecture
GPU Cards: Stream, threads and parallels
GPU Cards: Simple multi-threading
GPU Cards: A little less-simple multi-threading
GPU Cards: GPU cores matrix and multi-threading
GPU Cards: GPU cores: threads and blocks
GPU Cards: Threads and blocks: execution flux
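The slides on threads and blocks are figure-only, so the following is a hedged sketch of the usual CUDA-style model rather than a reconstruction of the figures: a grid is split into blocks, each block into threads, and each thread derives a global index from its block index, the block size and its thread index. The helper `launch` and the kernel `vector_add` are hypothetical names, and the emulation runs sequentially, whereas a GPU would execute the threads in parallel.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1D kernel launch: call `kernel` once per (block, thread)."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(block_idx, block_dim, thread_idx, a, b, out):
    """Each emulated thread handles exactly one element of the vectors."""
    # Global index, mirroring blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    i = block_idx * block_dim + thread_idx
    # Guard: the last block may contain threads with no element to process.
    if i < len(out):
        out[i] = a[i] + b[i]


n = 10
a = list(range(n))
b = [10 * x for x in a]
out = [0] * n

block_dim = 4                                   # threads per block (assumed)
grid_dim = (n + block_dim - 1) // block_dim     # enough blocks to cover n

launch(vector_add, grid_dim, block_dim, a, b, out)
print(grid_dim, out)
```

With 10 elements and 4 threads per block, three blocks are launched and the last two threads of the final block do nothing, which is why the bounds check inside the kernel is needed.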
GPU Cards: A bit-old-fashioned gate logic for GPU cards
GPU Cards: CUDA vs STREAM
GPU Cards: de gustibus non disputandum est! (sed mores filia tempora est!)
GPU Cards: A promising future
GPU Moore Law: A youth elixir for the poor old Moore's Law (Hot Zone!)
GPU Moore Law: Moore's Law - GPU REFIT
GPU Moore Law: Moore's Law - HPC & GPU REFIT (DGPHPC, HPCGPU)
GPU: TNG - Moore's Law - The next generation
GPU: TNG
«So, five-card stud, nothing wild... and the sky's the limit.»
[Star Trek: The Next Generation, last words]
QUESTION TIME
Thank You

You will find the PDF edition in the didactic section of the author's website, visit http://www.dmi.unict.it/~napoli/
To contact the author send an email to
If you want to share this presentation be sure to read and follow the CC-BY-NC-ND-4.0 license. Visit http://creativecommons.org/licenses/by-nc-nd/4.0/deed.en_us