Stevina Dias* Sherrin Benjamin* Mitchell D silva* Lynette Lopes* *Assistant Professor Dwarkadas J Sanghavi College of Engineering, Vile Parle
GPU Programming Models

Abstract

The CPU, the brain of the computer, is being enhanced by another part of the computer, the Graphics Processing Unit (GPU), which is its soul. GPUs are optimized for taking huge batches of data and performing the same operation repeatedly and quickly, which is not the case with PC microprocessors. A CPU combined with a GPU can deliver better system performance, price, and power. Version 2 of Microsoft's Accelerator offers an efficient way for applications to implement array-processing operations. These operations use the parallel processing capabilities of multi-processor computers. CUDA is a parallel computing platform and programming model created by NVIDIA. CUDA provides both lower-level and higher-level APIs.

Keywords: GPU, GPU Programming, CUDA, CUDA API, CGMA ratio, CUDA Memory Model, Accelerator, Data Parallel Arrays

I. INTRODUCTION

Computers have chips that facilitate the display of images to monitors. Each chip has different functionality. Intel's integrated graphics controller provides basic graphics that can display applications like Microsoft PowerPoint, low-resolution video and basic games. The GPU is a programmable and powerful computational device that goes far beyond basic graphics controller functions [1]. GPUs are special-purpose processors used for the display of three-dimensional scenes. The capabilities of the GPU are being harnessed to accelerate computational workloads in areas such as financial modelling, cutting-edge scientific research and oil and gas exploration. The GPU can now take on many multimedia tasks, such as speeding up Adobe Flash video, translating video to different formats, image recognition, virus pattern matching and many more. But the challenge is to solve those problems that have an inherently parallel nature: video processing, image analysis, signal processing. The CPU is composed of a few cores and a huge cache that can handle a few software threads at a time.
Conversely, a GPU is composed of hundreds of cores that can handle thousands of threads simultaneously [2]. This ability of a GPU can accelerate some software by 100x over a CPU alone. The GPU achieves this acceleration while being more powerful and cost-efficient than a CPU. This paper describes two different ways of programming a GPU, namely Accelerator and CUDA.

II. ACCELERATOR

Accelerator is a high-level data-parallel library which uses parallel processors such as the GPU or multi-core CPU to speed up execution. The Accelerator application programming interface (API) helps in implementing a wide variety of array-processing operations by supporting a functional programming model. The details of parallelizing and running the computation on the selected target processor, including GPUs and multi-core CPUs, are handled by Accelerator. The Accelerator API is almost processor independent, so the same array-processing code runs on any supported processor with only minor changes [3]. The basic steps in Accelerator programming are:

i. Create data arrays.
ii. Load the data arrays into Accelerator data-parallel array objects.
iii. Process the data-parallel array objects using a series of Accelerator operations.
iv. Create a result object.
v. Evaluate the result object on a target.

A. Data Parallel Array Objects

Applications do not have direct access to the array data and cannot manipulate the array by index [3]. Applications implement array processing schemes by applying Accelerator operations to data-parallel array objects, which represent an entire array as a unit rather than working with individual array elements. The five data-parallel array objects supported by Accelerator v2 are:

i. BoolParallelArray
ii. DoubleParallelArray
iii. Float4ParallelArray
iv. FloatParallelArray
v. IntParallelArray

An example for Float4ParallelArray is as follows:

    typedef struct {
        float a;
        float b;
        float c;
        float d;
    } float4;
Float4 is an Accelerator structure that contains a quadruplet of float values. It is used primarily in graphics programming. For example:

    float4 xyz(0.7f, 5.0f, 4.2f, 9.0f);

The API includes a large collection of operations that applications can use to manipulate the contents of arrays and to combine the arrays in various ways. Most operations take one or more data-parallel array objects as input, and return the processed data as a new data-parallel object. Accelerator operations work with copies of the input objects; they do not modify the original objects. Operation inputs are typically data-parallel objects, but some operations can also take constant values.

Table 1: Types of Accelerator Operations

| Operation | Explanation |
| Creation and conversion | Create new data-parallel array objects and convert them from one type to another. |
| Element-wise | Operate on each element of one or more data-parallel array objects and return a new object with the same dimensions as the originals. For example: Add, Abs, Subtract, Divide, etc. |
| Reduction | Reduce the rank of a data-parallel array object by applying a function across one or more dimensions. |
| Transform | Transform the organization of the elements in a data-parallel array object. These operations reorganize the data in the object, but do not require computation. For example: Transpose, Pad. |
| Linear algebra | Perform standard matrix operations on data-parallel array objects, including matrix multiplication, scalar product, and outer product. |

B. Sample Code

    using System;
    using Microsoft.ParallelArrays;
    using FloatArray = Microsoft.ParallelArrays.FloatParallelArray;
    using PArray = Microsoft.ParallelArrays.ParallelArrays;

    namespace SubArrays
    {
        class Calculation
        {
            static void Main(string[] args)
            {
                int arraysize = 100;
                Random ranf = new Random();
                float[] Array1 = new float[arraysize];
                float[] Array2 = new float[arraysize];
                float[] Array3 = new float[arraysize];
                DX9Target result = new DX9Target();                          // (1)
                for (int i = 0; i < arraysize; i++)                          // (2)
                {
                    Array1[i] = (float)(Math.Sin((double)i / 10.0) + ranf.NextDouble() / 5.0);
                    Array2[i] = (float)(Math.Sin((double)i / 10.0) + ranf.NextDouble() / 5.0);
                }
                FloatArray fpInput1 = new FloatArray(Array1);                // (3)
                FloatArray fpInput2 = new FloatArray(Array2);
                FloatArray fpStacked = PArray.Subtract(fpInput1, fpInput2);  // (4)
                FloatArray fpOutput = PArray.Divide(fpStacked, 2);
                Array3 = result.ToArray1D(fpOutput);                         // (5)
                for (int i = 0; i < arraysize; i++)                          // (6)
                {
                    Console.WriteLine(Array3[i].ToString());
                }
            }
        }
    }

The above code uses two commonly used types: ParallelArrays and FloatParallelArray. ParallelArrays, represented by PArray, contains the Accelerator operation methods [3]. FloatParallelArray, represented by FloatArray, contains floating-point arrays in the Accelerator environment.

1. Create a target object: Each processor that supports Accelerator has one or more target objects. Target objects convert Accelerator operations to processor-specific code and run them on the processor. The sample uses DX9Target, which runs Accelerator operations on any DirectX 9-compatible GPU, using the DirectX 9 API.
2. Create input data arrays: The input arrays for the sample are generated by the application, but they can also be created from stored data.
3. Load each input array: Each input array is loaded into an Accelerator data-parallel array object.
4. Array processing: Use one or more Accelerator operations to process the arrays.
5. Evaluation: Evaluate the results of the series of operations by passing the result object to the target's ToArray1D method. ToArray1D evaluates 1-D arrays and ToArray2D evaluates 2-D arrays.
6. Process the result: The sample displays the elements of the processed array in the console window.
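To make the semantics of the array-processing step concrete, the following plain C sketch computes the same element-wise result, (a - b) / 2, sequentially on the CPU. It does not use the Accelerator API: the names `subtract_arrays` and `divide_by_scalar` are hypothetical stand-ins for the element-wise operations, and like those operations they write to a new output array instead of modifying their inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an element-wise Subtract: out[i] = a[i] - b[i].
   The inputs are const, mirroring the fact that the operations do not
   modify their input objects. */
void subtract_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] - b[i];
}

/* Hypothetical stand-in for Divide with a constant operand:
   out[i] = a[i] / d. */
void divide_by_scalar(const float *a, float d, float *out, size_t n)
{
    assert(a != NULL && out != NULL && d != 0.0f);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] / d;
}
```

Calling `subtract_arrays` and then `divide_by_scalar` with d = 2 on two input arrays gives the values that an element-wise subtract followed by a divide would produce. The difference is that a data-parallel library is free to evaluate the elements in parallel on the chosen target, whereas this sketch walks them in a loop.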
C. Accelerator Architecture

[Figure 1: Accelerator Architecture. Labels: Applications (Unmanaged Applications, Managed Applications); Accelerator Library (Native API, Managed API); Target Runtimes (DirectX, Multi-Core CPU, GPU-Specific, Other Processors); Accelerator Resource Manager, Other Resource Managers; Processors (GPUs, Multi-core CPU)]

1) The Accelerator Library and APIs

The Accelerator library is used by applications to implement the processor-independent aspects of Accelerator programming. The library exposes two APIs, namely Managed and Native. C++ applications use the native API, which is implemented as unmanaged C++ classes. Managed applications use the managed API.

2) Targets

Accelerator needs a set of target objects. Each target translates Accelerator operations and data objects into a suitable form for its associated processor and runs the computation on the processor. A processor can have multiple targets, each of which accesses the processor in a different way. For example, most GPUs will probably have DirectX 9 and DirectX 11 targets, and possibly a vendor-implemented target to support processor-specific technologies. Each target consists of a small API, whose methods are called by applications to interact with the target. The most commonly used methods are ToArray1D and ToArray2D.

3) Processor Hardware

Version 2 of Accelerator can run operations on different targets. Currently it includes targets for:

i. Multicore x64 CPUs
ii. GPUs, by using DirectX 9

III. CUDA

CUDA stands for Compute Unified Device Architecture. It is a parallel computing platform and programming model created by NVIDIA. The CUDA platform is accessible to software developers through CUDA-accelerated libraries, compiler directives and extensions to industry-standard programming languages, including C, C++ and Fortran. Programmers use C/C++ with CUDA extensions to express parallelism, data locality, and thread cooperation; the code is compiled with nvcc, NVIDIA's LLVM-based C/C++ compiler, for execution on the GPU. CUDA enables heterogeneous systems (i.e., CPU + GPU). The CPU and GPU are separate devices with separate DRAMs. CUDA scales to hundreds of cores and thousands of parallel threads. CUDA is an extension to the C programming language [4].

[Figure 2: Compilation Procedure of CUDA. A C/C++ CUDA application is compiled by NVCC into CPU code and PTX code (generic); a PTX-to-target compiler then produces code for a specific GPU such as G80 (specialized).]

CUDA programming basics consist of the CUDA API basics, the programming model and the memory model.

A. CUDA API Basics

There are three basic APIs in CUDA, namely function type qualifiers, variable type qualifiers and built-in variables. Function type qualifiers are used to specify execution on host or device. Variable type qualifiers are used to specify the memory location on the device. Four built-in variables are used to specify the grid and block dimensions and the block and thread indices.

1) Function Type Qualifiers

Table 2: Function Type Qualifiers

| Function type qualifier | Executed on | Callable from |
| __device__ | Device | Device |
| __global__ | Device | Host |
| __host__ | Host | Host |

2) Variable Type Qualifiers

Table 3: Variable Type Qualifiers

| Variable type qualifier | Memory location | Lifetime | Accessible from |
| __device__ | Global memory space | Lifetime of an application | All the threads within the grid, and from the host through the runtime library |
| __constant__ (optionally used together with __device__) | Constant memory space | Lifetime of an application | All the threads within the grid, and from the host through the runtime library |
| __shared__ (optionally used together with __device__) | Shared memory space | Lifetime of a block | All the threads within the block |

3) Built-in Variables

Table 4: Built-in Variables

| Built-in variable | Type | Explanation |
| gridDim | dim3 | Dimensions of the grid |
| blockIdx | uint3 | Block index within the grid |
| blockDim | dim3 | Dimensions of the block |
| threadIdx | uint3 | Thread index within the block |

4) Execution Configuration (EC)

An EC must be specified for any call to a __global__ function. It defines the dimensions of the grid and blocks, and is specified by inserting an expression between the function name and the argument list.

Function declaration:

    __global__ void function_name(float* parameter)

Function call:

    function_name<<<Dg, Db, Ns>>>(parameter);

where Dg, Db and Ns are:

i. Dg is of type dim3 and gives the dimension and size of the grid; Dg.x * Dg.y = number of blocks being launched.
ii. Db is of type dim3 and gives the dimension and size of each block; Db.x * Db.y * Db.z = number of threads per block.
iii. Ns is of type size_t and gives the number of bytes of shared memory dynamically allocated per block for this call; it is optional and defaults to 0.

B. Programming Model

The GPU is seen as a computing device that executes a portion of an application that has to be executed several times, can be isolated as a function and works independently on different data. Such a function can be compiled to run on the device. The resulting program is called a kernel. The batch of threads that executes a kernel is organized as a grid of thread blocks.
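The arithmetic behind an execution configuration can be checked on the host before launching anything. The sketch below is plain C, not device code: `dim3_sketch` is a hypothetical stand-in for CUDA's dim3, and the helper names are assumptions introduced for illustration. It computes the block and thread counts from Dg and Db, the number of blocks needed to cover n elements, and the global index that the usual one-dimensional expression blockIdx.x * blockDim.x + threadIdx.x yields.

```c
#include <assert.h>

/* Hypothetical host-side stand-in for CUDA's dim3 (not the real type). */
typedef struct { unsigned int x, y, z; } dim3_sketch;

/* Number of blocks launched by a grid Dg: Dg.x * Dg.y. */
unsigned int blocks_in_grid(dim3_sketch dg)
{
    return dg.x * dg.y;
}

/* Number of threads in each block Db: Db.x * Db.y * Db.z. */
unsigned int threads_per_block(dim3_sketch db)
{
    return db.x * db.y * db.z;
}

/* Smallest number of blocks of block_size threads that covers n elements
   (ceiling division). */
unsigned int blocks_needed(unsigned int n, unsigned int block_size)
{
    assert(block_size > 0);
    return n / block_size + (n % block_size != 0);
}

/* Global index of a thread in a one-dimensional launch, i.e. the value of
   blockIdx.x * blockDim.x + threadIdx.x. */
unsigned int global_index(unsigned int block_idx, unsigned int block_dim,
                          unsigned int thread_idx)
{
    assert(thread_idx < block_dim);
    return block_idx * block_dim + thread_idx;
}
```

For 100 elements and 4 threads per block, `blocks_needed` gives 25 blocks. For 10 elements it gives 3 blocks, so 12 threads are launched and the last two have a global index of 10 or 11; a kernel has to guard against these with a bounds check such as `if (idx < N)`.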
The following are the steps in CUDA code:

Step 1: Initialize the device (GPU)
Step 2: Allocate memory on the device
Step 3: Copy the data from the host array to the device array
Step 4: Execute the kernel on the device
Step 5: Copy data from the device (GPU) to the host
Step 6: Free the allocated memories on the device and host

Kernel code (xx_kernel.cu): A kernel is a function callable from the host and executed on the CUDA device, simultaneously by many threads in parallel. Calling a kernel involves specifying the name of the kernel plus an execution configuration.

C. The Memory Model

Each CUDA device has several memories, as shown in Figure 3. These memories can be used by programmers to achieve a high CGMA (Compute to Global Memory Access) ratio and thus high execution speed in their kernels. The CGMA ratio is the number of floating-point calculations performed for each access to the global memory within a region of a CUDA program. Variables that reside in shared memories and registers can be accessed at very high speed in a parallel manner.

Each thread can only access the registers that are allocated to it. A kernel function uses registers to store frequently accessed variables. These variables are private to each thread.
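As a worked illustration of the CGMA ratio, consider a loop body that performs one multiply and one add on two values read from global memory: 2 floating-point operations for 2 accesses, a ratio of 1.0. The plain C sketch below computes the ratio and the throughput ceiling it implies when global memory bandwidth is the bottleneck. The function names are hypothetical, and the 80 GB/s bandwidth used in the note that follows is a made-up figure, not a measurement of any device.

```c
#include <assert.h>

/* CGMA ratio: floating-point operations per global memory access. */
double cgma_ratio(double flops, double global_accesses)
{
    assert(global_accesses > 0.0);
    return flops / global_accesses;
}

/* Upper bound on throughput in GFLOPS for a kernel limited by global
   memory bandwidth. bandwidth_gb_s and bytes_per_access are assumed
   inputs supplied by the caller. */
double bandwidth_bound_gflops(double bandwidth_gb_s, double bytes_per_access,
                              double cgma)
{
    assert(bytes_per_access > 0.0);
    return bandwidth_gb_s / bytes_per_access * cgma;
}
```

With an assumed 80 GB/s of bandwidth and 4-byte floats, at most 20 billion values can be fetched per second, so a kernel with a CGMA ratio of 1.0 is capped at 20 GFLOPS. Raising the ratio to 4.0, for instance by reusing values held in registers or shared memory, raises the cap to 80 GFLOPS.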
Shared memories are allocated to thread blocks. Threads in a block can access variables in the shared memory locations of the block. Threads can share their results via shared memories. The global memory can be accessed by all the threads at any time during program execution. The constant memory allows faster and more parallel data access paths for CUDA kernel execution than the global memory. Texture memory is read from kernels using device functions called texture fetches. The first parameter of a texture fetch specifies an object called a texture reference. A texture reference defines which part of texture memory is fetched.

[Figure 3: CUDA Memory Model]

CUDA defines registers, shared memory, and constant memory that can be accessed at higher speed and in a more parallel manner than the global memory. Using these memories effectively will likely require re-design of the algorithm. It is important for CUDA programmers to be aware of the limited sizes of these special memories. Their capacities are implementation dependent. Once their capacities are exceeded, they become limiting factors for the number of threads that can be assigned to each SM [5].

Table 5: Different Types of CUDA Memory

| Memory | Location | Cached | Access | Who |
| Local | Off-chip | No | Read/write | One thread |
| Shared | On-chip | N/A | Read/write | All threads in a block |
| Global | Off-chip | No | Read/write | All threads + CPU |
| Constant | Off-chip | Yes | Read | All threads + CPU |
| Texture | Off-chip | Yes | Read | All threads + CPU |

D. An example of a CUDA program

The following program calculates and prints the square roots of the integers 0 to 99 (C = 100 values).

    #include <stdio.h>                                    // (1)
    #include <stdlib.h>
    #include <cuda.h>
    #include <conio.h>

    __global__ void sq_rt(float *a, int N)                // (2)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < N)
            a[idx] = sqrt(a[idx]);
    }

    int main(void)                                        // (3)
    {
        float *arr_host, *arr_dev;                        // (4)
        const int C = 100;                                // (5)
        size_t length = C * sizeof(float);
        arr_host = (float *)malloc(length);               // (6)
        cudaMalloc((void **)&arr_dev, length);            // (7)
        for (int i = 0; i < C; i++)
            arr_host[i] = (float)i;
        cudaMemcpy(arr_dev, arr_host, length, cudaMemcpyHostToDevice);   // (8)
        int blk_size = 4;                                 // (9)
        // Round up so that a partial last block is still launched.
        int n_blk = C / blk_size + (C % blk_size == 0 ? 0 : 1);
        sq_rt<<<n_blk, blk_size>>>(arr_dev, C);
        cudaMemcpy(arr_host, arr_dev, length, cudaMemcpyDeviceToHost);   // (10)
        for (int i = 0; i < C; i++)                       // (11)
            printf("%d\t%f\n", i, arr_host[i]);
        free(arr_host);                                   // (12)
        cudaFree(arr_dev);
        getch();
        return 0;
    }

1. Include header files.
2. Kernel function that executes on the CUDA device.
3. main() routine, which the CPU must find.
4. Defines pointers to the host and device arrays.
5. Defines other variables used in the program; size_t is an unsigned integer type of at least 16 bits.
6. Allocate the array on the host.
7. Allocate the array on the device (DRAM of the GPU).
8. Copy the data from the host array to the device array.
9. Kernel call with its execution configuration.
10. Retrieve the result from the device into host memory.
11. Print the result.
12. Free the allocated memories on the device and host.

[Figure 4: Illustration of code]

CONCLUSION

The stagnation of CPU clock speeds has brought parallel computing, multi-core architectures, and hardware-acceleration techniques back to the forefront of computational science and engineering. A GPU allows single-processor applications to run in a parallel processing environment. In this paper, we introduced GPU computing and the CUDA and Accelerator programming environments by explaining, step by step, how to implement an efficient GPU-enabled application. The above-mentioned programming models help to achieve the goal of parallel processing. Due to the presence of these programming models and many more, gaming applications have become more lively and creative.

REFERENCES

[1]
[2] files/cruz_gpucomputing09.pdf
[3] research.microsoft.com/e.../accelerator/accelerator_intro.docx, 11/12/2012 4:06 PM
[4] 11/12/2012 3:52 PM
[5] uda_docs/nvidia_cuda_programmingguide_2.3.pdf
[6] De Donno, D.; Esposito, A.; Tarricone, L.; Catarinucci, L., "Introduction to GPU Computing and CUDA Programming: A Case Study on FDTD [EM Programmer's Notebook]," IEEE Antennas and Propagation Magazine, Volume 52, Issue 3.
More informationMOTIF XF Extension Owner s Manual
MOTIF XF Extesio Ower s Maual Table of Cotets About MOTIF XF Extesio...2 What Extesio ca do...2 Auto settig of Audio Driver... 2 Auto settigs of Remote Device... 2 Project templates with Iput/ Output Bus
More informationCOP4020 Programming Languages. Subroutines and Parameter Passing Prof. Robert van Engelen
COP4020 Programmig Laguages Subrouties ad Parameter Passig Prof. Robert va Egele Overview Parameter passig modes Subroutie closures as parameters Special-purpose parameters Fuctio returs COP4020 Fall 2016
More informationUNIVERSITY OF MORATUWA
UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016
More informationOutline. CSCI 4730 Operating Systems. Questions. What is an Operating System? Computer System Layers. Computer System Layers
Outlie CSCI 4730 s! What is a s?!! System Compoet Architecture s Overview Questios What is a?! What are the major operatig system compoets?! What are basic computer system orgaizatios?! How do you commuicate
More informationStructuring Redundancy for Fault Tolerance. CSE 598D: Fault Tolerant Software
Structurig Redudacy for Fault Tolerace CSE 598D: Fault Tolerat Software What do we wat to achieve? Versios Damage Assessmet Versio 1 Error Detectio Iputs Versio 2 Voter Outputs State Restoratio Cotiued
More informationAnalysis of Algorithms
Aalysis of Algorithms Ruig Time of a algorithm Ruig Time Upper Bouds Lower Bouds Examples Mathematical facts Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite
More informationOutline n Introduction n Background o Distributed DBMS Architecture
Outlie Itroductio Backgroud o Distributed DBMS Architecture Datalogical Architecture Implemetatio Alteratives Compoet Architecture o Distributed DBMS Architecture o Distributed Desig o Sematic Data Cotrol
More information1 Enterprise Modeler
1 Eterprise Modeler Itroductio I BaaERP, a Busiess Cotrol Model ad a Eterprise Structure Model for multi-site cofiguratios are itroduced. Eterprise Structure Model Busiess Cotrol Models Busiess Fuctio
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:
More informationA New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method
A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro
More informationFrom last week. Lecture 5. Outline. Principles of programming languages
Priciples of programmig laguages From last week Lecture 5 http://few.vu.l/~silvis/ppl/2007 Natalia Silvis-Cividjia e-mail: silvis@few.vu.l ML has o assigmet. Explai how to access a old bidig? Is & for
More informationComputer Systems - HS
What have we leared so far? Computer Systems High Level ENGG1203 2d Semester, 2017-18 Applicatios Sigals Systems & Cotrol Systems Computer & Embedded Systems Digital Logic Combiatioal Logic Sequetial Logic
More informationMath 10C Long Range Plans
Math 10C Log Rage Plas Uits: Evaluatio: Homework, projects ad assigmets 10% Uit Tests. 70% Fial Examiatio.. 20% Ay Uit Test may be rewritte for a higher mark. If the retest mark is higher, that mark will
More informationGE FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III
GE2112 - FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III PROBLEM SOLVING AND OFFICE APPLICATION SOFTWARE Plaig the Computer Program Purpose Algorithm Flow Charts Pseudocode -Applicatio Software Packages-
More informationn Explore virtualization concepts n Become familiar with cloud concepts
Chapter Objectives Explore virtualizatio cocepts Become familiar with cloud cocepts Chapter #15: Architecture ad Desig 2 Hypervisor Virtualizatio ad cloud services are becomig commo eterprise tools to
More informationChapter 3 Classification of FFT Processor Algorithms
Chapter Classificatio of FFT Processor Algorithms The computatioal complexity of the Discrete Fourier trasform (DFT) is very high. It requires () 2 complex multiplicatios ad () complex additios [5]. As
More informationWhat are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs
What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure
More informationSouth Slave Divisional Education Council. Math 10C
South Slave Divisioal Educatio Coucil Math 10C Curriculum Package February 2012 12 Strad: Measuremet Geeral Outcome: Develop spatial sese ad proportioal reasoig It is expected that studets will: 1. Solve
More informationHow do we evaluate algorithms?
F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:
More informationGoals of the Lecture UML Implementation Diagrams
Goals of the Lecture UML Implemetatio Diagrams Object-Orieted Aalysis ad Desig - Fall 1998 Preset UML Diagrams useful for implemetatio Provide examples Next Lecture Ð A variety of topics o mappig from
More informationThe Implementation of Data Structures in Version 5 of Icon* Ralph E. Gr is wo Id TR 85-8
The Implemetatio of Data Structures i Versio 5 of Ico* Ralph E. Gr is wo Id TR 85-8 April 1, 1985 Departmet of Computer Sciece The Uiversity of Arizoa Tucso. Arizoa 85721 This work was supported by the
More informationCOMP Parallel Computing. PRAM (1): The PRAM model and complexity measures
COMP 633 - Parallel Computig Lecture 2 August 24, 2017 : The PRAM model ad complexity measures 1 First class summary This course is about parallel computig to achieve high-er performace o idividual problems
More informationSolution printed. Do not start the test until instructed to do so! CS 2604 Data Structures Midterm Spring, Instructions:
CS 604 Data Structures Midterm Sprig, 00 VIRG INIA POLYTECHNIC INSTITUTE AND STATE U T PROSI M UNI VERSI TY Istructios: Prit your ame i the space provided below. This examiatio is closed book ad closed
More informationAvid Interplay Bundle
Avid Iterplay Budle Versio 2.5 Cofigurator ReadMe Overview This documet provides a overview of Iterplay Budle v2.5 ad describes how to ru the Iterplay Budle cofiguratio tool. Iterplay Budle v2.5 refers
More informationWeston Anniversary Fund
Westo Olie Applicatio Guide 2018 1 This guide is desiged to help charities applyig to the Westo to use our olie applicatio form. The Westo is ope to applicatios from 5th Jauary 2018 ad closes o 30th Jue
More informationInstruction and Data Streams
Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Data Parallelism 1 (vector & SIMD extesios) (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Istructio ad
More informationImplementing Dynamic Programming Recurrences in Constraint Handling Rules with Rule Priorities
Implemetig Dyamic Programmig Recurreces i Costrait Hadlig Rules with Rule Priorities Ahmed Magdy 1, Frak Raiser, ad Thom Frühwirth 1 Computer Sciece ad Egieerig, Germa Uiversity i Cairo ahmed.mabrouk@studet.guc.edu.eg
More informationMultiprocessors. HPC Prof. Robert van Engelen
Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies
More informationA Very Simple Approach for 3-D to 2-D Mapping
A Very Simple Approach for -D to -D appig Sadipa Dey (1 Ajith Abraham ( Sugata Sayal ( Sadipa Dey (1 Ashi Software Private Limited INFINITY Tower II 10 th Floor Plot No. - 4. Block GP Salt Lake Electroics
More informationCMSC Computer Architecture Lecture 3: ISA and Introduction to Microarchitecture. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 3: ISA ad Itroductio to Microarchitecture Prof. Yajig Li Uiversity of Chicago Lecture Outlie ISA uarch (hardware implemetatio of a ISA) Logic desig basics Sigle-cycle
More informationAn Efficient Algorithm for Graph Bisection of Triangularizations
A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad
More informationSCI Reflective Memory
Embedded SCI Solutios SCI Reflective Memory (Experimetal) Atle Vesterkjær Dolphi Itercoect Solutios AS Olaf Helsets vei 6, N-0621 Oslo, Norway Phoe: (47) 23 16 71 42 Fax: (47) 23 16 71 80 Mail: atleve@dolphiics.o
More informationChapter 2. C++ Basics. Copyright 2015 Pearson Education, Ltd.. All rights reserved.
Chapter 2 C++ Basics Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 2.1 Variables ad Assigmets 2.2 Iput ad Output 2.3 Data Types ad Expressios 2.4 Simple Flow of Cotrol 2.5 Program
More information游戏设计与开发. Outline. Game Programming Topics. Building A Game
1896 1935 1987 2006 Outlie 游戏设计与开发 Real Time Requiremet A Coceptual Rederig Pipelie The Graphics Processig Uit (GPU) Example 技术篇 : 实时图形硬件 Game Programmig Topics Focus: Buildig game ad virtual world High-level
More informationHash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.
Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Hash Tables xkcd. http://xkcd.com/221/. Radom Number. Used with permissio uder Creative
More information3.1 Overview of MySQL Programs. These programs are discussed further in Chapter 4, Database Administration. Client programs that access the server:
3 Usig MySQL Programs This chapter provides a brief overview of the programs provided by MySQL AB ad discusses how to specify optios whe you ru these programs. Most programs have optios that are specific
More informationProgramming with Shared Memory PART II. HPC Spring 2017 Prof. Robert van Engelen
Programmig with Shared Memory PART II HPC Sprig 2017 Prof. Robert va Egele Overview Sequetial cosistecy Parallel programmig costructs Depedece aalysis OpeMP Autoparallelizatio Further readig HPC Sprig
More informationBEA WebLogic XML/Non-XML Translator
BEA WebLogic XML/No-XML Traslator A Compoet of BEA WebLogic Itegratio Plug-I Guide BEA WebLogic XML/No-XML Traslator Release 2.0 Documet Editio 2.0 July 2001 Copyright Copyright 2001 BEA Systems, Ic. All
More informationn Some thoughts on software development n The idea of a calculator n Using a grammar n Expression evaluation n Program organization n Analysis
Overview Chapter 6 Writig a Program Bjare Stroustrup Some thoughts o software developmet The idea of a calculator Usig a grammar Expressio evaluatio Program orgaizatio www.stroustrup.com/programmig 3 Buildig
More informationUNIVERSITY OF MORATUWA
UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2010 Itake Semester 7 Examiatio CS4532 CONCURRENT PROGRAMMING Time allowed: 2 Hours September 2014
More information9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence
_9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to
More informationThe Magma Database file formats
The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,
More information