GPUMP: a Multiple-Precision Integer Library for GPUs

Kaiyong Zhao and Xiaowen Chu
Department of Computer Science, Hong Kong Baptist University
Hong Kong, P. R. China
{kyzhao, chxw}@comp.hkbu.edu.hk

Abstract

Multiple-precision integer operations are key components of many security applications, but they are computationally expensive on contemporary CPUs. In this paper, we present the design and implementation of a multiple-precision integer library for GPUs, implemented in CUDA. Our experimental results show that GPUs can achieve a significant speedup compared with the GNU MP library on CPUs.

Keywords: Multiple-precision algorithm, GPU, CUDA

I. INTRODUCTION

Public-key encryption plays a critical role in our daily life. The core component of a public-key system is a set of multiple-precision integer operations. A server that relies on public-key encryption (such as an SSL server) needs to process a large number of multiple-precision integer operations, which require huge computing power.

Recent advances in Graphics Processing Units (GPUs) open a new era of GPU computing. For example, commodity GPUs like NVIDIA's GTX has processing cores and can achieve 9 GFLOPS of computational horsepower. More importantly, the NVIDIA CUDA programming model makes it easier for developers to build non-graphics applications on GPUs. In CUDA, the GPU becomes a dedicated coprocessor to the host CPU and works on the principle of Single-Program Multiple-Data (SPMD), where multiple threads based on the same code can run simultaneously.

We are motivated by the fact that GPUs could be used to speed up multiple-precision integer operations. This is of practical importance to end users as well as application servers. However, it is not easy to achieve high performance on GPUs because of the complicated memory architecture and the relatively slow integer operations. In this paper, we present our design, implementation, and experimental results on a highly optimized multiple-precision integer library.
Our library achieved a significant speedup for a number of multiple-precision integer operations.

The rest of the paper is organized as follows. Section II provides background information on the GPU architecture and the CUDA programming model. Section III presents our design and implementation of multiple-precision integer arithmetic on GPUs. Experimental results are presented in Section IV, and we conclude the paper in Section V. The latest Fermi architecture has better support for integer operations, but it is outside the scope of this paper.

II. BACKGROUND AND RELATED WORK

GPUs are dedicated hardware for manipulating computer graphics. Driven by the huge computing demand of real-time, high-definition 3D graphics, the GPU has evolved into a highly parallel, multithreaded, many-core processor. The advances in GPU computing power have driven the development of general-purpose computing on GPUs (GPGPU). The first generation of GPGPU required every non-graphics application to be mapped through graphics application programming interfaces (APIs). NVIDIA then provided a general-purpose parallel programming model, namely Compute Unified Device Architecture (CUDA) [] [], which extends the C programming language for general-purpose application development. Meanwhile, another GPU vendor, AMD, introduced the Close To Metal (CTM) programming model, which provides an assembly language for application development []. Intel also unveiled Larrabee, a new many-core GPU architecture specifically designed for the GPU computing market, this year [].

Since the release of CUDA, it has been used to speed up a large number of applications [-]. Given its popularity, we chose CUDA to implement our multiple-precision integer library.

III. MULTIPLE-PRECISION MODULAR ARITHMETIC

In this section, we present a set of library functions for multiple-precision modular arithmetic implemented on GPUs. In modular arithmetic, all operations are performed in a group Z_m, i.e., the set of integers {0, 1, 2, ..., m-1}. In the following, the modulus m is represented in radix b as (m_n m_{n-1} ... m_1 m_0)_b, where m_n ≠ 0.
Each symbol m_i is referred to as a radix b digit. Non-negative integers x and y, x < m, y < m, are represented in radix b as (x_n x_{n-1} ... x_1 x_0)_b and (y_n y_{n-1} ... y_1 y_0)_b respectively.

We have implemented the following multiple-precision library functions for CUDA:
- Multiple-precision comparison
- Multiple-precision addition and subtraction
- Multiple-precision modular addition and subtraction
- Multiple-precision multiplication and division
- Multiple-precision Montgomery reduction
- Multiple-precision Montgomery multiplication
- Multiple-precision exponentiation
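The radix b representation above can be illustrated with a small Python sketch. It is not part of the library: the function names to_digits and from_digits are hypothetical, and the digits are stored least significant first, which is an assumption made here so that index i corresponds to x_i.

```python
def to_digits(value, base, count):
    """Split a non-negative integer into `count` radix-`base` digits.

    Digits are returned least significant first, so result[i] is x_i.
    """
    if value < 0 or value >= base ** count:
        raise ValueError("value does not fit in the requested number of digits")
    digits = []
    for _ in range(count):
        digits.append(value % base)  # current lowest digit
        value //= base               # shift the remaining digits down
    return digits


def from_digits(digits, base):
    """Rebuild the integer from its radix-`base` digits (Horner's rule)."""
    value = 0
    for digit in reversed(digits):
        value = value * base + digit
    return value
```

For example, with b = 2^32 the integer 2^32 + 2 has the two digits x_0 = 2 and x_1 = 1.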

Our library implements each operation as a single thread. To make full use of a GPU, hundreds to thousands of threads must execute simultaneously. It is also possible to implement a complicated operation with multiple threads, e.g., a block of threads could be used to perform a single operation such as exponentiation. We leave this as future work.

A. Comparison, Addition and Subtraction

The pseudocode of the multiple-precision comparison, addition, and subtraction operations is shown in Algorithms 1, 2, and 3, respectively.

Algorithm 1 Multiple-precision Comparison
INPUT: non-negative integers x and y, each with n + 1 radix b digits.
OUTPUT: 1, if x > y; 0, if x = y; -1, if x < y.
    i ← n
    while (x_i == y_i and i > 0)
        i ← i - 1
    end while
    if (x_i > y_i) then return 1
    else if (x_i == y_i) then return 0
    else return -1

Algorithm 2 Multiple-precision Addition
INPUT: non-negative integers x and y, each with n + 1 radix b digits.
OUTPUT: x + y = (z_{n+1} z_n ... z_1 z_0)_b.
    c ← 0 /* carry digit */
    for (i from 0 to n) do
        z_i ← (x_i + y_i + c) mod b
        c ← floor((x_i + y_i + c) / b)
    end for
    z_{n+1} ← c
    return (z_{n+1} z_n ... z_1 z_0)_b

Algorithm 3 Multiple-precision Subtraction
INPUT: non-negative integers x and y, each with n + 1 radix b digits, x >= y.
OUTPUT: x - y = (z_n z_{n-1} ... z_1 z_0)_b.
    c ← 0 /* carry digit */
    for (i from 0 to n) do
        z_i ← (x_i - y_i + c) mod b
        if (x_i - y_i + c >= 0) then c ← 0
        else c ← -1
    end for
    return (z_n z_{n-1} ... z_1 z_0)_b

B. Modular Addition and Subtraction

The pseudocode of the multiple-precision modular addition and subtraction operations is shown in Algorithms 4 and 5, respectively.

Algorithm 4 Multiple-precision Modular Addition
INPUT: non-negative integers x, y, and m, each with n + 1 radix b digits, x < m, y < m.
OUTPUT: (x + y) mod m = (z_n z_{n-1} ... z_1 z_0)_b.
    c ← 0 /* carry digit */
    for (i from 0 to n) do
        z_i ← (x_i + y_i + c) mod b
        if (x_i + y_i + c < b) then c ← 0
        else c ← 1
    end for
    z_{n+1} ← c, m_{n+1} ← 0
    if ((z_{n+1} z_n ... z_1 z_0)_b >= (m_{n+1} m_n ... m_1 m_0)_b) then
        (t_{n+1} t_n ... t_1 t_0)_b ← (z_{n+1} z_n ... z_1 z_0)_b - (m_{n+1} m_n ... m_1 m_0)_b
        return (t_n t_{n-1} ... t_1 t_0)_b
    else return (z_n z_{n-1} ... z_1 z_0)_b

Algorithm 5 Multiple-precision Modular Subtraction
INPUT: non-negative integers x, y, and m, each with n + 1 radix b digits, x < m, y < m.
OUTPUT: (x - y) mod m = (z_n z_{n-1} ... z_1 z_0)_b.
    if (x >= y) then return x - y
    else
        t ← (m - y)
        return (x + t) mod m
    end else

C. Multiplication, Division, and Modular Multiplication

One straightforward way to compute the modular product x · y mod m is to calculate x · y first and then take the remainder of x · y divided by m. Hence modular multiplication can be implemented with the multiplication and division operations. Next, we give the pseudocode for multiple-precision multiplication and division in Algorithms 6 and 7, respectively.

Algorithm 6 Multiple-precision Multiplication
INPUT: non-negative integers x and y having n + 1 radix b digits and s + 1 radix b digits respectively.
OUTPUT: x · y = (z_{n+s+1} z_{n+s} ... z_1 z_0)_b.
    for (i from 0 to n + s + 1) do
        z_i ← 0
    end for
    for (i from 0 to s) do
        c ← 0 /* carry digit */
        for (j from 0 to n) do
            (uv)_b ← z_{i+j} + x_j · y_i + c
            z_{i+j} ← v, c ← u
        end for
        z_{n+i+1} ← u
    end for
    return (z_{n+s+1} z_{n+s} ... z_1 z_0)_b
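As an illustration of Algorithms 1 to 4 and 6, the following Python sketch works on lists of digits stored least significant first. It is a minimal sketch rather than the library's CUDA code: the function names, the default radix of 2^32, and the requirement that the operands of the comparison, addition, and subtraction have equal length are assumptions made for this example.

```python
BASE = 2 ** 32  # assumed radix; any b >= 2 works


def mp_cmp(x, y):
    """Algorithm 1: return 1, 0 or -1 for x > y, x == y, x < y."""
    i = len(x) - 1
    while i > 0 and x[i] == y[i]:
        i -= 1                      # skip equal high-order digits
    if x[i] > y[i]:
        return 1
    return 0 if x[i] == y[i] else -1


def mp_add(x, y, base=BASE):
    """Algorithm 2: digit-wise addition; the result has one extra digit."""
    z, c = [], 0
    for xi, yi in zip(x, y):
        s = xi + yi + c
        z.append(s % base)
        c = s // base               # carry is 0 or 1
    z.append(c)
    return z


def mp_sub(x, y, base=BASE):
    """Algorithm 3: digit-wise subtraction, assuming x >= y."""
    z, c = [], 0
    for xi, yi in zip(x, y):
        d = xi - yi + c
        z.append(d % base)          # Python's % is non-negative here
        c = 0 if d >= 0 else -1     # borrow is 0 or -1
    return z


def mp_mod_add(x, y, m, base=BASE):
    """Algorithm 4: (x + y) mod m for x < m and y < m."""
    z = mp_add(x, y, base)
    m_ext = m + [0]                 # extend m with m_{n+1} = 0
    if mp_cmp(z, m_ext) >= 0:
        z = mp_sub(z, m_ext, base)
    return z[:-1]                   # the top digit is zero after reduction


def mp_mul(x, y, base=BASE):
    """Algorithm 6: schoolbook multiplication, len(x) + len(y) digits out."""
    z = [0] * (len(x) + len(y))
    for i, yi in enumerate(y):
        c = 0
        for j, xj in enumerate(x):
            uv = z[i + j] + xj * yi + c
            z[i + j] = uv % base    # low digit v
            c = uv // base          # high digit u becomes the carry
        z[i + len(x)] = c
    return z
```

With radix 10, for instance, mp_mul([9, 9], [9, 9], 10) yields the digits of 9801, and mp_mod_add([7, 0], [8, 0], [1, 1], 10) yields the digits of 15 mod 11 = 4.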

Algorithm 7 Multiple-precision Division
INPUT: non-negative integers x and y having n + 1 radix b digits and s + 1 radix b digits respectively, n >= s >= 1, y_s ≠ 0.
OUTPUT: the quotient q = (q_{n-s} ... q_1 q_0)_b and remainder r = (r_s ... r_1 r_0)_b such that x = q · y + r, 0 <= r < y.
    for (i from 0 to n - s) do
        q_i ← 0
    end for
    while (x >= y · b^{n-s}) do
        q_{n-s} ← q_{n-s} + 1
        x ← x - y · b^{n-s}
    end while
    for (i from n down to s + 1) do
        if (x_i == y_s) then q_{i-s-1} ← b - 1
        else q_{i-s-1} ← floor((x_i · b + x_{i-1}) / y_s)
        while (q_{i-s-1} · (y_s · b + y_{s-1}) > x_i · b^2 + x_{i-1} · b + x_{i-2}) do
            q_{i-s-1} ← q_{i-s-1} - 1
        end while
        x ← x - q_{i-s-1} · y · b^{i-s-1}
        if (x < 0) then
            x ← x + y · b^{i-s-1}
            q_{i-s-1} ← q_{i-s-1} - 1
        end if
    end for
    r ← x
    return (q, r)

The classical modular multiplication is suitable for ordinary operations. However, when performing modular exponentiation, Montgomery multiplication performs much better []. Montgomery multiplication makes use of Montgomery reduction. Hence the following gives the pseudocode of Montgomery reduction and Montgomery multiplication in Algorithms 8 and 9 respectively.

Let m be a positive integer, and let R and A be integers such that R > m, gcd(m, R) = 1, and 0 <= A < m · R. The Montgomery reduction of A modulo m with respect to R is defined as A · R^{-1} mod m. In our library, R is chosen as b^n to simplify the calculation.

Algorithm 8 Multiple-precision Montgomery Reduction
INPUT: integer m with n radix b digits and gcd(m, b) = 1, R = b^n, m' = -m^{-1} mod b, and integer A with 2n radix b digits and A < m · R.
OUTPUT: T = A · R^{-1} mod m.
    T ← A
    for (i from 0 to n - 1)
        u_i ← T_i · m' mod b
        T ← T + u_i · m · b^i
    end for
    T ← T / b^n
    if (T >= m) then T ← T - m
    return T

Algorithm 9 Multiple-precision Montgomery Multiplication
INPUT: non-negative integers m, x, y with n radix b digits, x < m, y < m, and gcd(m, b) = 1, R = b^n, m' = -m^{-1} mod b.
OUTPUT: T = x · y · R^{-1} mod m.
    T ← 0
    for (i from 0 to n - 1)
        u_i ← (T_0 + x_i · y_0) · m' mod b
        T ← (T + x_i · y + u_i · m) / b
    end for
    if (T >= m) then T ← T - m
    return T

D. Modular Exponentiation

Modular exponentiation has many applications [7], and there are different ways to implement it. We chose to implement Montgomery exponentiation because it avoids division operations, which are very inefficient on GPUs.
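To make the division-free structure concrete, here is a minimal Python sketch of Montgomery multiplication (Algorithm 9) and of a left-to-right square-and-multiply exponentiation built on it. It illustrates the technique and is not the library's CUDA code: Python's built-in integers stand in for the digit arrays, the names mont_mul and mont_exp are hypothetical, and computing m' with pow (Python 3.8 or later) and R mod m, R^2 mod m with the % operator are shortcuts assumed for this example.

```python
def mont_mul(x, y, m, base, n, m_prime):
    """Return x * y * R^{-1} mod m with R = base**n, for 0 <= x, y < m.

    m_prime must equal -m^{-1} mod base, which requires gcd(m, base) == 1.
    """
    t = 0
    y0 = y % base
    for i in range(n):
        xi = (x // base ** i) % base                 # digit x_i
        u = ((t % base + xi * y0) * m_prime) % base  # makes the sum divisible by base
        t = (t + xi * y + u * m) // base             # exact division by the radix
    if t >= m:
        t -= m
    return t


def mont_exp(x, e, m, base, n):
    """Return x**e mod m using only Montgomery products inside the loop."""
    r = base ** n
    m_prime = (-pow(m, -1, base)) % base             # m' = -m^{-1} mod base
    x_tilde = mont_mul(x, (r * r) % m, m, base, n, m_prime)  # x * R mod m
    a = r % m                                        # Montgomery form of 1
    for bit in bin(e)[2:]:                           # exponent bits, high to low
        a = mont_mul(a, a, m, base, n, m_prime)
        if bit == "1":
            a = mont_mul(a, x_tilde, m, base, n, m_prime)
    return mont_mul(a, 1, m, base, n, m_prime)       # convert back from Montgomery form
```

The only division inside the loops is by the radix, which reduces to dropping the lowest digit when the digits are machine words. For m = 72639, b = 10, and n = 5, m' evaluates to 1.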
The pseudocode of Montgomery exponentiation is shown in Algorithm 10.

Algorithm 10 Multiple-precision Montgomery Exponentiation
INPUT: integer m with n radix b digits and gcd(m, b) = 1, R = b^n, positive integer x with n radix b digits and x < m, and positive integer e = (e_t ... e_0)_2.
OUTPUT: x^e mod m.
    x~ ← Mont(x, R^2 mod m)
    A ← R mod m
    for (i from t down to 0)
        A ← Mont(A, A)
        if e_i == 1 then A ← Mont(A, x~)
    end for
    A ← Mont(A, 1)
    return A

IV. IMPLEMENTATION AND EXPERIMENTAL RESULT

In this section, we first briefly discuss the data structure of the multiple-precision (MP) integer and the optimization techniques used by our library, and then report our experimental results. More details can be found in [].

A. Data Structure of Multiple-precision Integer

We represent an MP integer as a sequence of -bit integers, since most GPUs support -bit integer operations. There are two ways to arrange this sequence of -bit integers in memory. One is to put the data of an MP integer in an array; a group of MP integers is then stored as a two-dimensional array with one integer per row. The second way is to transpose this two-dimensional array, so that each MP integer is stored in a column instead of a row. The transposed arrangement achieves coalesced memory access on GPUs.
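The difference between the two arrangements can be sketched with flat-array index arithmetic. The Python below is only an illustration under simplifying assumptions: the function names are hypothetical, all integers have the same number of words, and thread k is assumed to process integer k.

```python
def row_major_index(k, j, length):
    """Position of word j of integer k when each integer occupies one row."""
    return k * length + j


def column_major_index(k, j, count):
    """Position of word j of integer k when each integer occupies one column."""
    return j * count + k


def to_column_major(words, count, length):
    """Transpose `count` integers of `length` words each into the column layout."""
    out = [0] * (count * length)
    for k in range(count):
        for j in range(length):
            out[column_major_index(k, j, count)] = words[row_major_index(k, j, length)]
    return out
```

In the column layout, threads k and k + 1 reading word j touch adjacent addresses, whereas in the row layout their addresses are `length` words apart. Adjacent addresses are what allow the hardware to combine the accesses of neighbouring threads.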

In our implementation, a group of MP integers is organized in two parts. The first is an array that keeps the length of each MP integer. The second is a matrix. Suppose the number of MP integers is n and the maximum length of an MP integer is l; the set of MP integers can then be regarded as matrix[(n/w) · l][w], in which w is the number of columns.

B. Optimization Techniques

Using Constant Value with Cache Memory

Most algorithms use the same data multiple times during a calculation. In these cases, serving the data through a cache can increase efficiency. On GPUs, texture and constant memory are cached. Thus, frequently accessed data can be kept in texture or constant memory to achieve high read efficiency.

Using Shared Memory for Temp Value

From the algorithms listed in Section III, we notice that some algorithms (Algorithm to ) need temporary variables. Storing these variables in local or global memory causes long read latency, whereas shared memory has much shorter read latency. Hence, we store the temporary variables in shared memory as much as possible.

Balancing the Computing Resource

In the CUDA programming model, the registers and shared memory of a single SM (Streaming Multiprocessor) are limited, which restricts how many blocks can be active simultaneously. Consequently, in order to maximize the number of threads running on a single SM, we need to manage the number of registers and the amount of shared memory used by each block.

C. Experimental Results

We tested our library on an XFX GTX graphics card. It contains an NVIDIA GT GPU which has processing cores working at . GHz. We also give the results of the GNU MP library running on an i7 CPU (.GHz) for comparison. In the following figures (Figures 1 to 9), the x-axis denotes the number of multiple-precision integers, and the y-axis denotes the achieved number of operations per second.
Figures 1 to 6 and 9 show the operations-per-second results for addition, subtraction, multiplication, division, modular addition, modular subtraction, and Montgomery exponentiation, for our GPU MP library running on the GPU and for GNU MP running on the CPU. In order to guarantee that the GPU runs at full load, we select five groups of data, and each group contains 9,, 7,, and 7 multiple-precision integers, respectively. In each group, we select multiple-precision integers with three different lengths, including -bit, -bit and -bit. Figures 7 and 8 show the results for the Montgomery reduction and Montgomery multiplication algorithms. Since the GNU MP library has no individual routines for Montgomery reduction and Montgomery multiplication, we only present our GPU results for them. All results show that the GPU MP library achieves a significant speedup on the GPU, far better than the GNU MP library running on the CPU.

Figure 1. Multiple-precision Addition running on CPU & GPU. [Plot: operations per second; series CPU Add and GPU Add for the three integer lengths.]

Figure 2. Multiple-precision Subtraction running on CPU & GPU. [Plot: operations per second; series CPU Sub and GPU Sub for the three integer lengths.]

Figure 3. Multiple-precision Multiplication running on CPU & GPU. [Plot: operations per second; series CPU Mul and GPU Mul for the three integer lengths.]

Figure 4. Multiple-precision Division running on CPU & GPU. [Plot: operations per second; series CPU Div and GPU Div for the three integer lengths.]

Figure 5. Multiple-precision Modular Addition running on CPU & GPU. [Plot: operations per second; series CPU Mod Add and GPU Mod Add for the three integer lengths.]

Figure 6. Multiple-precision Modular Subtraction running on CPU & GPU. [Plot: operations per second; series CPU Mod Sub and GPU Mod Sub for the three integer lengths.]

Figure 7. Multiple-precision Montgomery Reduction running on GPU. [Plot: operations per second; series GPU Mont Reduction for the three integer lengths.]

Figure 8. Multiple-precision Montgomery Multiplication running on GPU. [Plot: operations per second; series GPU Mont Mul for the three integer lengths.]

Figure 9. Multiple-precision Montgomery Exponentiation running on CPU & GPU. [Plot: operations per second; series CPU Exp and GPU Exp.]

V. CONCLUSIONS

Multiple-precision integer operations are an important component of public-key cryptography for encrypting and signing digital data. In this paper, we described the design, implementation, and optimization of a multiple-precision integer library for GPUs using CUDA. In the future, we will explore how to use the new Fermi architecture to further optimize the performance of our library. We will also port our library to OpenCL.

ACKNOWLEDGMENT

This work is supported by FRG Grant frg9: FRG/-9/9 from Hong Kong Baptist University.

REFERENCES

[1] NVIDIA CUDA.
[2] NVIDIA CUDA Compute Unified Device Architecture: Programming Guide, Version .beta, Jun. .
[3] AMD CTM Guide: Technical Reference Manual. . uide.pdf
[4] Seiler, L., et al. Larrabee: a many-core x architecture for visual computing. ACM Transactions on Graphics, 7(), Aug. .
[5] GNU MP Arithmetic Library.
[6] Montgomery, P., 9. Multiplication without trial division, Math. Computation, vol. , 9, 9-.
[7] Menezes, A., van Oorschot, P., and Vanstone, S., 99. Handbook of applied cryptography. CRC Press, 99.
[8] Ryoo, S., Rodrigues, C. I., Baghsorkhi, S. S., Stone, S. S., Kirk, D. B., and Hwu, W. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of ACM PPoPP, Feb. .
[9] Falcao, G., Sousa, L., and Silva, V. Massive parallel LDPC decoding in GPU. In Proceedings of ACM PPoPP, Feb. .
[10] Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., and Phillips, J. C. GPU computing. IEEE Proceedings, May.
[11] X.-W. Chu, K. Zhao, and M. Wang. Massively Parallel Network Coding on GPUs. In Proceedings of IEEE IPCCC, Austin, Texas, USA, Dec.
[12] X.-W. Chu, K. Zhao, and M. Wang. Practical Random Linear Network Coding on GPUs. In Proceedings of IFIP Networking 9, Aachen, Germany, May 9.
[13] K. Zhao and X.-W. Chu. GPUMP: a Multiple-Precision Integer Library for GPUs. Technical Report, Department of Computer Science, Hong Kong Baptist University.


More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

Major CSL Write your name and entry no on every sheet of the answer script. Time 2 Hrs Max Marks 70

Major CSL Write your name and entry no on every sheet of the answer script. Time 2 Hrs Max Marks 70 NOTE:. Attempt all seve questios. Major CSL 02 2. Write your ame ad etry o o every sheet of the aswer script. Time 2 Hrs Max Marks 70 Q No Q Q 2 Q 3 Q 4 Q 5 Q 6 Q 7 Total MM 6 2 4 0 8 4 6 70 Q. Write a

More information

Chapter 3. Floating Point Arithmetic

Chapter 3. Floating Point Arithmetic COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 3 Floatig Poit Arithmetic Review - Multiplicatio 0 1 1 0 = 6 multiplicad 32-bit ALU shift product right multiplier add

More information

SPIRAL DSP Transform Compiler:

SPIRAL DSP Transform Compiler: SPIRAL DSP Trasform Compiler: Applicatio Specific Hardware Sythesis Peter A. Milder (peter.milder@stoybroo.edu) Fraz Frachetti, James C. Hoe, ad Marus Pueschel Departmet of ECE Caregie Mello Uiversity

More information

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access

More information

Reversible Realization of Quaternary Decoder, Multiplexer, and Demultiplexer Circuits

Reversible Realization of Quaternary Decoder, Multiplexer, and Demultiplexer Circuits Egieerig Letters, :, EL Reversible Realizatio of Quaterary Decoder, Multiplexer, ad Demultiplexer Circuits Mozammel H.. Kha, Member, ENG bstract quaterary reversible circuit is more compact tha the correspodig

More information

. Written in factored form it is easy to see that the roots are 2, 2, i,

. Written in factored form it is easy to see that the roots are 2, 2, i, CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations Applied Mathematical Scieces, Vol. 1, 2007, o. 25, 1203-1215 A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045, Oe

More information

Chapter 5. Functions for All Subtasks. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 5. Functions for All Subtasks. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 5 Fuctios for All Subtasks Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 5.1 void Fuctios 5.2 Call-By-Referece Parameters 5.3 Usig Procedural Abstractio 5.4 Testig ad Debuggig

More information

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1 CS200: Hash Tables Prichard Ch. 13.2 CS200 - Hash Tables 1 Table Implemetatios: average cases Search Add Remove Sorted array-based Usorted array-based Balaced Search Trees O(log ) O() O() O() O(1) O()

More information

Examples and Applications of Binary Search

Examples and Applications of Binary Search Toy Gog ITEE Uiersity of Queeslad I the secod lecture last week we studied the biary search algorithm that soles the problem of determiig if a particular alue appears i a sorted list of iteger or ot. We

More information

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems

More information

FPGA IMPLEMENTATION OF BASE-N LOGARITHM. Salvador E. Tropea

FPGA IMPLEMENTATION OF BASE-N LOGARITHM. Salvador E. Tropea FPGA IMPLEMENTATION OF BASE-N LOGARITHM Salvador E. Tropea Electróica e Iformática Istituto Nacioal de Tecología Idustrial Bueos Aires, Argetia email: salvador@iti.gov.ar ABSTRACT I this work, we preset

More information

n Explore virtualization concepts n Become familiar with cloud concepts

n Explore virtualization concepts n Become familiar with cloud concepts Chapter Objectives Explore virtualizatio cocepts Become familiar with cloud cocepts Chapter #15: Architecture ad Desig 2 Hypervisor Virtualizatio ad cloud services are becomig commo eterprise tools to

More information

Fast Fourier Transform (FFT) Algorithms

Fast Fourier Transform (FFT) Algorithms Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

Recursion. Recursion. Mathematical induction: example. Recursion. The sum of the first n odd numbers is n 2 : Informal proof: Principle:

Recursion. Recursion. Mathematical induction: example. Recursion. The sum of the first n odd numbers is n 2 : Informal proof: Principle: Recursio Recursio Jordi Cortadella Departmet of Computer Sciece Priciple: Reduce a complex problem ito a simpler istace of the same problem Recursio Itroductio to Programmig Dept. CS, UPC 2 Mathematical

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to

More information

G2 T. Specification Sheet G2T-001 G2T Touchscreen Mainframes Accepts G2 Plug-in Modules Four Sizes: 2RU, 3RU, 6RU and 8RU

G2 T. Specification Sheet G2T-001 G2T Touchscreen Mainframes Accepts G2 Plug-in Modules Four Sizes: 2RU, 3RU, 6RU and 8RU G2 T Geeral The G2T Maiframes are part of our field-prove G2 family of products ad replaces the G2S maiframes. The mai differece is the all ew frot pael touchscree desig which replaces the older VF display

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

New HSL Distance Based Colour Clustering Algorithm

New HSL Distance Based Colour Clustering Algorithm The 4th Midwest Artificial Itelligece ad Cogitive Scieces Coferece (MAICS 03 pp 85-9 New Albay Idiaa USA April 3-4 03 New HSL Distace Based Colour Clusterig Algorithm Vasile Patrascu Departemet of Iformatics

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

Avid Interplay Bundle

Avid Interplay Bundle Avid Iterplay Budle Versio 2.5 Cofigurator ReadMe Overview This documet provides a overview of Iterplay Budle v2.5 ad describes how to ru the Iterplay Budle cofiguratio tool. Iterplay Budle v2.5 refers

More information

LU Decomposition Method

LU Decomposition Method SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS LU Decompositio Method Jamie Traha, Autar Kaw, Kevi Marti Uiversity of South Florida Uited States of America kaw@eg.usf.edu http://umericalmethods.eg.usf.edu Itroductio

More information

condition w i B i S maximum u i

condition w i B i S maximum u i ecture 10 Dyamic Programmig 10.1 Kapsack Problem November 1, 2004 ecturer: Kamal Jai Notes: Tobias Holgers We are give a set of items U = {a 1, a 2,..., a }. Each item has a weight w i Z + ad a utility

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

EFFICIENT MULTIPLE SEARCH TREE STRUCTURE

EFFICIENT MULTIPLE SEARCH TREE STRUCTURE EFFICIENT MULTIPLE SEARCH TREE STRUCTURE Mohammad Reza Ghaeii 1 ad Mohammad Reza Mirzababaei 1 Departmet of Computer Egieerig ad Iformatio Techology, Amirkabir Uiversity of Techology, Tehra, Ira mr.ghaeii@aut.ac.ir

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists

More information

Novel Encryption Schemes Based on Catalan Numbers

Novel Encryption Schemes Based on Catalan Numbers D. Sravaa Kumar, H. Sueetha, A. hadrasekhar / Iteratioal Joural of Egieerig Research ad Applicatios (IJERA) ISSN: 48-96 www.iera.com Novel Ecryptio Schemes Based o atala Numbers 1 D. Sravaa Kumar H. Sueetha

More information

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1 Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

Σ P(i) ( depth T (K i ) + 1),

Σ P(i) ( depth T (K i ) + 1), EECS 3101 York Uiversity Istructor: Ady Mirzaia DYNAMIC PROGRAMMING: OPIMAL SAIC BINARY SEARCH REES his lecture ote describes a applicatio of the dyamic programmig paradigm o computig the optimal static

More information

Stone Images Retrieval Based on Color Histogram

Stone Images Retrieval Based on Color Histogram Stoe Images Retrieval Based o Color Histogram Qiag Zhao, Jie Yag, Jigyi Yag, Hogxig Liu School of Iformatio Egieerig, Wuha Uiversity of Techology Wuha, Chia Abstract Stoe images color features are chose

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13 CIS Data Structures ad Algorithms with Java Sprig 08 Stacks ad Queues Moday, February / Tuesday, February Learig Goals Durig this lab, you will: Review stacks ad queues. Lear amortized ruig time aalysis

More information

Performance Plus Software Parameter Definitions

Performance Plus Software Parameter Definitions Performace Plus+ Software Parameter Defiitios/ Performace Plus Software Parameter Defiitios Chapma Techical Note-TG-5 paramete.doc ev-0-03 Performace Plus+ Software Parameter Defiitios/2 Backgroud ad Defiitios

More information

Cubic Polynomial Curves with a Shape Parameter

Cubic Polynomial Curves with a Shape Parameter roceedigs of the th WSEAS Iteratioal Coferece o Robotics Cotrol ad Maufacturig Techology Hagzhou Chia April -8 00 (pp5-70) Cubic olyomial Curves with a Shape arameter MO GUOLIANG ZHAO YANAN Iformatio ad

More information

Chapter 24. Sorting. Objectives. 1. To study and analyze time efficiency of various sorting algorithms

Chapter 24. Sorting. Objectives. 1. To study and analyze time efficiency of various sorting algorithms Chapter 4 Sortig 1 Objectives 1. o study ad aalyze time efficiecy of various sortig algorithms 4. 4.7.. o desig, implemet, ad aalyze bubble sort 4.. 3. o desig, implemet, ad aalyze merge sort 4.3. 4. o

More information

DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO

DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO Sagwo Seo, Trevor Mudge Advaced Computer Architecture Laboratory Uiversity of Michiga at A Arbor {swseo, tm}@umich.edu Yumig Zhu, Chaitali

More information

IMP: Superposer Integrated Morphometrics Package Superposition Tool

IMP: Superposer Integrated Morphometrics Package Superposition Tool IMP: Superposer Itegrated Morphometrics Package Superpositio Tool Programmig by: David Lieber ( 03) Caisius College 200 Mai St. Buffalo, NY 4208 Cocept by: H. David Sheets, Dept. of Physics, Caisius College

More information

Software development of components for complex signal analysis on the example of adaptive recursive estimation methods.

Software development of components for complex signal analysis on the example of adaptive recursive estimation methods. Software developmet of compoets for complex sigal aalysis o the example of adaptive recursive estimatio methods. SIMON BOYMANN, RALPH MASCHOTTA, SILKE LEHMANN, DUNJA STEUER Istitute of Biomedical Egieerig

More information

Evaluation scheme for Tracking in AMI

Evaluation scheme for Tracking in AMI A M I C o m m u i c a t i o A U G M E N T E D M U L T I - P A R T Y I N T E R A C T I O N http://www.amiproject.org/ Evaluatio scheme for Trackig i AMI S. Schreiber a D. Gatica-Perez b AMI WP4 Trackig:

More information

Enhancing Efficiency of Software Fault Tolerance Techniques in Satellite Motion System

Enhancing Efficiency of Software Fault Tolerance Techniques in Satellite Motion System Joural of Iformatio Systems ad Telecommuicatio, Vol. 2, No. 3, July-September 2014 173 Ehacig Efficiecy of Software Fault Tolerace Techiques i Satellite Motio System Hoda Baki Departmet of Electrical ad

More information

SCI Reflective Memory

SCI Reflective Memory Embedded SCI Solutios SCI Reflective Memory (Experimetal) Atle Vesterkjær Dolphi Itercoect Solutios AS Olaf Helsets vei 6, N-0621 Oslo, Norway Phoe: (47) 23 16 71 42 Fax: (47) 23 16 71 80 Mail: atleve@dolphiics.o

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:

More information

Counting the Number of Minimum Roman Dominating Functions of a Graph

Counting the Number of Minimum Roman Dominating Functions of a Graph Coutig the Number of Miimum Roma Domiatig Fuctios of a Graph SHI ZHENG ad KOH KHEE MENG, Natioal Uiversity of Sigapore We provide two algorithms coutig the umber of miimum Roma domiatig fuctios of a graph

More information

Algorithm. Counting Sort Analysis of Algorithms

Algorithm. Counting Sort Analysis of Algorithms Algorithm Coutig Sort Aalysis of Algorithms Assumptios: records Coutig sort Each record cotais keys ad data All keys are i the rage of 1 to k Space The usorted list is stored i A, the sorted list will

More information

CS2410 Computer Architecture. Flynn s Taxonomy

CS2410 Computer Architecture. Flynn s Taxonomy CS2410 Computer Architecture Dept. of Computer Sciece Uiversity of Pittsburgh http://www.cs.pitt.edu/~melhem/courses/2410p/idex.html 1 Fly s Taxoomy SISD Sigle istructio stream Sigle data stream (SIMD)

More information

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure

More information

Adaptive Resource Allocation for Electric Environmental Pollution through the Control Network

Adaptive Resource Allocation for Electric Environmental Pollution through the Control Network Available olie at www.sciecedirect.com Eergy Procedia 6 (202) 60 64 202 Iteratioal Coferece o Future Eergy, Eviromet, ad Materials Adaptive Resource Allocatio for Electric Evirometal Pollutio through the

More information