Bank-interleaved cache or memory indexing does not require euclidean division
André Seznec

To cite this version: André Seznec. Bank-interleaved cache or memory indexing does not require euclidean division. 11th Annual Workshop on Duplicating, Deconstructing and Debunking, Jun 2015, Portland, United States. Proceedings of the 11th Annual Workshop on Duplicating, Deconstructing and Debunking, 2015.

HAL Id: hal
Submitted on 2 Oct 2015

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
André Seznec, IRISA/INRIA, Campus de Beaulieu, Rennes Cedex, FRANCE, andre.seznec@inria.fr

Abstract

Concurrent access to bank-interleaved memory structures has been studied for decades, particularly in the context of vector supercomputer systems. It is still a common belief that using a number of banks different from $2^n$ requires inserting complex hardware, including a non-trivial divider, on the access path to the memory. In 1993, two independent studies [1], [2] showed that, by leveraging a very simple arithmetic result, the Chinese Remainder Theorem, this euclidean division is not needed when the number of banks is prime or simply odd. In the mid 90s, interest in vector supercomputers faded and the research topic disappeared. Interest in bank-interleaved caches has recently reappeared [3] in the GPU context. In this short paper, we extend the result from [1] and we show that, regardless of the number of banks: Bank-interleaved cache or memory indexing does not require euclidean division.

1 INTRODUCTION

The need for concurrent access to data in memory structures has led to the design and use of bank-interleaved structures, first for main memory, e.g. in vector supercomputers, and later in caches. Optimizing parallel access to these bank-interleaved memory structures was studied for decades, particularly in the 80s and the early 90s, in the context of strided vector accesses on vector processors [4], [5], [6], [7], [8], [9]. More recently, similar studies have been published in the context of bank-interleaved caches for vector processors [10], [11] or GPU caches [3].

It was common belief that, for simple indexing of a bank-interleaved cache or memory, the number of banks should be a power of 2, $2^n$, since otherwise complex arithmetic including a euclidean division and a euclidean modulo would be required [12], [6].
Two independent studies published in 1993 at the ISCA conference [1], [2] pointed out that, for prime or odd numbers of banks, such a division is unnecessary thanks to a very simple arithmetic theorem, the Chinese Remainder Theorem.

In a recent study, Diamond et al. [3] point out that, in the context of GPUs, the best number of cache banks might not be a power of two but may be any other number. They propose an optimized hardware mechanism to compute both modulo and division at the same time. Their hardware proposal would optimize the implementation of bank-interleaved cache or memory indexing if the euclidean division were required, but such a euclidean division is unnecessary.

In this short paper, we extend the result from [1], [2], showing that the euclidean division is unnecessary for indexing a bank-interleaved memory or cache regardless of the number of banks.

Notation and definition

For convenience, in the remainder of the paper, we will refer only to a bank-interleaved memory. The results in the paper also apply to bank-interleaved caches. We will refer to a bank-interleaved memory with an odd number of banks as an odd memory system.

Bank interleaving in a memory or cache can be implemented at different granularities; for instance 8-byte words on Cray vector supercomputers, cache blocks on vector microprocessors [11], 8-byte words for recent GPUs. For convenience, we will refer to this granularity as a word in the remainder of the paper. All the addresses considered in this paper are word addresses, therefore ignoring the offset within the word.

2 ODD MEMORY SYSTEMS DO NOT REQUIRE EUCLIDEAN DIVISION

This section summarizes the results published in [1].

2.1 Usual data mapping in a bank-interleaved memory

The physical mapping of the word at address $A$, $0 \le A < 2^c \cdot N$, onto memory is defined by its bank number $m(A)$, $0 \le m(A) < N$, and its local address $l(A)$, $0 \le l(A) < 2^c$, in the bank. The most important property that most manufacturers want to guarantee when using an N-bank interleaved memory is parallel access to consecutive words in memory, i.e. any N consecutive words are stored in distinct banks.
This leads to the conventional mapping of data defined by:

$m(A) = A \bmod N$
$l(A) = \lfloor A / N \rfloor$

We will refer to this address mapping as the mod-div mapping. For $N = 2^n$, with the mod-div mapping, the bank number consists of the $n$ least significant bits while the high-order bits constitute the local address. Figure 1 illustrates this mapping for N=5 and a memory bank of 4 words.

Fig. 1. mod-div mapping on a 5-bank interleaved memory

The old case for prime memory systems

Using a prime (or odd) memory system has been known to provide interesting properties for vector supercomputers since Budnik [4] established the following property on the distribution of the elements among the memory banks.

Theorem 2.1 (Distribution Theorem). When the bank distribution function is defined by $m(A) = A \bmod N$, then for any vector V stored with a stride R, V(i) and V(j) are stored in the same memory bank iff $i \equiv j \pmod{N / \mathrm{GCD}(N, R)}$.

Then, for any vector V stored with a stride R, N/GCD(N, R) consecutive elements of the vector are stored in distinct memory banks. Using a prime number of memory banks ensures a conflict-free distribution of any slice of N consecutive elements for all the vectors stored with a stride R that is not a multiple of N. Moreover, using a prime number induces simple control for memory accesses; only two distributions of elements of a vector slice are possible: either conflict-free access is possible or all the elements lie in the same memory bank. A last argument in favor of using a prime memory system is the demand for memory throughput on vector accesses with power-of-two strides in some specific applications.

What was (believed to be) wrong with prime memory systems

Unfortunately, when using the usual data mapping, address computation for a prime memory system requires arithmetic modulo a fixed prime number:

1) Computing the memory bank number for the word at address A requires the computation of A mod N. For specific values of N, very fast hardware evaluations of such a modulo may be implemented.
2) The computation of the local displacement in the memory bank requires a Euclidean division by N, and this division is quite complex when N is an odd number.
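The mod-div mapping and the Distribution Theorem above can be checked numerically. The following is a minimal Python sketch, not taken from the paper; the function names (`mod_div`, `banks_touched`, `conflict_free_length`) are illustrative assumptions. Note that the local address here still relies on an integer division by N, which is exactly the cost discussed in this section.

```python
from math import gcd


def mod_div(address, num_banks):
    """Conventional mod-div mapping: returns (bank number, local address)."""
    return address % num_banks, address // num_banks


def banks_touched(start, stride, count, num_banks):
    """Bank numbers hit by `count` consecutive elements of a strided vector."""
    return [mod_div(start + i * stride, num_banks)[0] for i in range(count)]


def conflict_free_length(stride, num_banks):
    """N / GCD(N, R): how many consecutive elements land in distinct banks."""
    return num_banks // gcd(num_banks, stride)


if __name__ == "__main__":
    # Compare a prime bank count (5) with a power of two (8) on a few strides.
    for num_banks in (5, 8):
        for stride in (1, 2, 4, 5):
            k = conflict_free_length(stride, num_banks)
            print(num_banks, stride, k, banks_touched(0, stride, k, num_banks))
```

With 5 banks, every stride that is not a multiple of 5 spreads 5 consecutive elements over all 5 banks, whereas with 8 banks a stride of 4 only reaches 2 distinct banks.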
The Euclidean division by N may significantly lengthen the total memory indexing time. Therefore, when the usual low-order mapping on memory was used in a vector machine, the number of memory banks was a power of two. In fact, by changing the choice of the local address function, this Euclidean division can be avoided on memory systems with an odd number of banks, but the result [1], [2] was probably published too late (1993) to be used in vector supercomputers.

2.2 Simple is better

A very old arithmetic result known as the Chinese Remainder Theorem¹ induces a very elegant way to map elements onto a parallel memory consisting of an odd number N of banks of $2^c$ elements, for which no hardware is needed to compute the local address².

1. It seems that this result was known more than 2000 years ago by the ancient Chinese.
2. On the Burroughs Scientific Processor [12], the Euclidean division was also avoided, but 1/17th of the memory was wasted.

Theorem 2.2 (Chinese Remainder Theorem). Let P and Q be two relatively prime integers, i.e. GCD(P, Q) = 1; then for each pair (X, Y) such that $0 \le X < P$ and $0 \le Y < Q$ there exists one and only one Z, $0 \le Z < P \cdot Q$, such that $Z \equiv X \pmod{P}$ and $Z \equiv Y \pmod{Q}$.

The Chinese Remainder Theorem guarantees that, $2^c$ being the number of memory words per bank, the functions defined by

$m(A) = A \bmod N$
$l(A) = A \bmod 2^c$

define a mapping of the address space onto the physical memory, since N is odd and therefore relatively prime with $2^c$. This mod-mod mapping is illustrated in Figure 2.

Fig. 2. mod-mod mapping on a 5-bank interleaved memory

The bank number function is exactly the same as for the mod-div mapping. Therefore the Distribution Theorem still holds for this mapping: conflict-free access is possible to any slice of N consecutive elements of a vector stored with a stride R that is not a multiple of N. The main benefit of this mod-mod mapping is that the local address $l(A)$ is the c least significant bits of the address: no hardware is required for deriving it from the address. Then we can state: Odd Memory Systems Do Not Require Euclidean Division.

3 NO BANK-INTERLEAVED MEMORY SYSTEM REQUIRES EUCLIDEAN DIVISION

In their experiments, Diamond et al. [3] point out that powers of two are not the best number of banks for a GPU cache. In their particular experiments, they argue for using 62 and 48 cache banks. These numbers are neither odd nor powers of two. However, in this section we extend the result from the previous section to every number N of banks.

We consider the general form of an integer as $N = 2^n R$ with R odd. The particular cases of N being a power of two (R = 1) and N being odd (n = 0) have been treated in the previous section. Therefore, we assume $n > 0$ and R odd, but greater than 1. We consider the two functions m and l defined by:

$m(A) = A \bmod N$
$l(A) = \lfloor A / 2^n \rfloor \bmod 2^c$

These functions define a mapping from the address space to the physical memory, as shown below. Since $A = (A \bmod 2^n) + 2^n \lfloor A / 2^n \rfloor$, we have $m(A) = (A \bmod 2^n) + 2^n (\lfloor A / 2^n \rfloor \bmod R)$. The application of the Chinese Remainder Theorem ensures that, viewed as functions of $\lfloor A / 2^n \rfloor$, the functions $l$ and $\lfloor m / 2^n \rfloor$ define a one-to-one mapping from $\{0, \ldots, R \cdot 2^c - 1\}$ onto $\{0, \ldots, 2^c - 1\} \times \{0, \ldots, R - 1\}$. As a consequence, the functions l and m define a one-to-one mapping from the address space $\{0, \ldots, N \cdot 2^c - 1\}$ onto the set of memory words of the memory system. This memory mapping is illustrated for a 6-bank interleaved memory in Figure 3.
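The mapping of this section can be checked exhaustively on small configurations. The sketch below is an illustration under stated assumptions, not the author's implementation: the names `bank_and_local` and `is_one_to_one` are hypothetical, and the parameters `n`, `r`, `c` stand for the exponent of two, the odd factor R, and the log2 of the bank size. It computes the local address with a shift and a mask only, and verifies that every address gets a distinct (bank, local address) pair.

```python
def bank_and_local(address, n, r, c):
    """Mapping for N = 2**n * r banks (r odd), each holding 2**c words."""
    num_banks = r << n
    bank = address % num_banks                # m(A) = A mod N
    local = (address >> n) & ((1 << c) - 1)   # l(A) = floor(A / 2^n) mod 2^c
    return bank, local


def is_one_to_one(n, r, c):
    """True if all N * 2**c addresses map to distinct (bank, local) pairs."""
    total = (r << n) << c
    seen = {bank_and_local(a, n, r, c) for a in range(total)}
    return len(seen) == total


if __name__ == "__main__":
    print(is_one_to_one(0, 5, 2))   # 5 banks (mod-mod case, n = 0)
    print(is_one_to_one(1, 3, 2))   # 6 banks, 4 words per bank
    print(is_one_to_one(1, 31, 3))  # 62 banks
    print(is_one_to_one(4, 3, 3))   # 48 banks
    print(is_one_to_one(0, 2, 1))   # even "r": the CRT argument fails
```

The last call uses an even value for `r` on purpose: the Chinese Remainder Theorem no longer applies and the mapping collapses, which is why the odd factor R has to be separated from the power of two.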
Therefore we can state: no euclidean division is needed to index a bank-interleaved memory system.

4 BANK NUMBER COMPUTATION

The computation of modulo P is simple for $P = 2^p - 1$ or $P = 2^p + 1$, as well as for $N = 2^c (2^p - 1)$ or $N = 2^c (2^p + 1)$. This can be implemented very simply through cascaded carry-save adders followed by a final p-bit adder and very limited logic.

For example, a GPU with 32 warps would feature 32 to 64 cache banks. In that range, many numbers are of the form $N = 2^c (2^p - 1)$ or $N = 2^c (2^p + 1)$, including 33, 34, 36, 40, 48, 56, 60 and 62.

5 CONCLUSION

Despite publications in 1993 [1], [2], it is still a common belief that indexing a bank-interleaved memory or cache requires a euclidean division when the number of banks is not a power of two. In [1], [2], it was shown that euclidean division is not required for prime or odd numbers of banks. In this short paper, we have trivially extended this result to any number of banks. Therefore, regardless of the number of banks: Bank-interleaved cache or memory indexing does not require euclidean division.
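As a software illustration of why the bank number computation of Section 4 is cheap, the sketch below reduces an address modulo $2^p - 1$ by repeatedly summing its p-bit chunks, which works because $2^p \equiv 1 \pmod{2^p - 1}$. This is only an arithmetic analogue of the carry-save adder tree mentioned in the text, not a hardware description; the function names and the parameter `k` (the exponent of the power-of-two factor of the bank count) are assumptions made for the example.

```python
def mod_mersenne(value, p):
    """Compute value mod (2**p - 1) with shifts, masks and additions only."""
    modulus = (1 << p) - 1
    # Since 2**p is congruent to 1, the high part can be added to the low part.
    while value > modulus:
        value = (value & modulus) + (value >> p)
    # The folded result lies in [0, modulus]; modulus itself represents 0.
    return 0 if value == modulus else value


def bank_number(address, k, p):
    """Bank number A mod N for N = 2**k * (2**p - 1), without any division."""
    low = address & ((1 << k) - 1)                 # A mod 2^k
    return low + (mod_mersenne(address >> k, p) << k)


if __name__ == "__main__":
    # 62 = 2 * (2**5 - 1) and 48 = 16 * (2**2 - 1)
    print(bank_number(1000, 1, 5), 1000 % 62)
    print(bank_number(1000, 4, 2), 1000 % 48)
```

The decomposition in `bank_number` is the identity used in Section 3: the low bits pass through unchanged and only the remaining high bits need the modulo by the odd factor.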
Fig. 3. Euclidean division free mapping on a 6-bank interleaved memory

ACKNOWLEDGEMENT

This work was partially supported by the European Research Council Advanced Grant DAL.

REFERENCES

[1] A. Seznec and J. Lenfant, "Odd memory systems may be quite interesting," in Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, CA, May 1993.
[2] Q. S. Gao, "The Chinese remainder theorem and the prime memory system," in Proceedings of the 20th Annual International Symposium on Computer Architecture, ser. ISCA '93. New York, NY, USA: ACM, 1993.
[3] J. Diamond, D. Fussell, and S. W. Keckler, "Arbitrary modulus indexing," in Proceedings of the 47th ACM/IEEE Symposium on Microarchitecture, Dec.
[4] P. Budnik and D. J. Kuck, "The organization and use of parallel memories," IEEE Trans. Comput., vol. 20, no. 12, Dec.
[5] D. T. Harper, III and J. R. Jump, "Vector access performance in parallel memories using skewed storage scheme," IEEE Trans. Comput., vol. 36, no. 12, Dec.
[6] B. R. Rau, "Pseudo-randomly interleaved memory," in Proceedings of the 18th Annual International Symposium on Computer Architecture, ser. ISCA '91. New York, NY, USA: ACM, 1991.
[7] A. Seznec and J. Lenfant, "Interleaved parallel schemes: Improving memory throughput on supercomputers," in Proceedings of the 19th Annual International Symposium on Computer Architecture, ser. ISCA '92. New York, NY, USA: ACM, 1992.
[8] M. Valero, T. Lang, and E. Ayguadé, "Conflict-free access of vectors with power-of-two strides," in ICS, 1992.
[9] B. D. de Dinechin, "An ultra fast euclidean division algorithm for prime memory systems," in Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, ser. Supercomputing '91. New York, NY, USA: ACM, 1991.
[10] A. Seznec and R. Espasa, "Conflict-free accesses to strided vectors on a banked cache," IEEE Trans. Computers, vol. 54, no. 7.
[11] R. Espasa, F. Ardanaz, J. Gago, R. Gramunt, I. Hernandez, T. Juan, J. S. Emer, S. Felix, P. G. Lowney, M. Mattina, and A. Seznec, "Tarantula: A vector extension to the Alpha architecture," in 29th International Symposium on Computer Architecture (ISCA 2002), May 2002, Anchorage, AK, USA, 2002.
[12] D. H. Lawrie and C. R. Vora, "The prime memory system for array access," IEEE Trans. Comput., vol. 31, no. 5, May.
Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5
Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:
More informationHash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.
Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Hash Tables xkcd. http://xkcd.com/221/. Radom Number. Used with permissio uder Creative
More informationCHAPTER IV: GRAPH THEORY. Section 1: Introduction to Graphs
CHAPTER IV: GRAPH THEORY Sectio : Itroductio to Graphs Sice this class is called Number-Theoretic ad Discrete Structures, it would be a crime to oly focus o umber theory regardless how woderful those topics
More informationCMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems
More informationCache-Optimal Methods for Bit-Reversals
Proceedigs of the ACM/IEEE Supercomputig Coferece, November 1999, Portlad, Orego, U.S.A. Cache-Optimal Methods for Bit-Reversals Zhao Zhag ad Xiaodog Zhag Departmet of Computer Sciece College of William
More informationarxiv: v2 [cs.ds] 24 Mar 2018
Similar Elemets ad Metric Labelig o Complete Graphs arxiv:1803.08037v [cs.ds] 4 Mar 018 Pedro F. Felzeszwalb Brow Uiversity Providece, RI, USA pff@brow.edu March 8, 018 We cosider a problem that ivolves
More informationChapter 3 Classification of FFT Processor Algorithms
Chapter Classificatio of FFT Processor Algorithms The computatioal complexity of the Discrete Fourier trasform (DFT) is very high. It requires () 2 complex multiplicatios ad () complex additios [5]. As
More informationECE4050 Data Structures and Algorithms. Lecture 6: Searching
ECE4050 Data Structures ad Algorithms Lecture 6: Searchig 1 Search Give: Distict keys k 1, k 2,, k ad collectio L of records of the form (k 1, I 1 ), (k 2, I 2 ),, (k, I ) where I j is the iformatio associated
More informationAppendix D. Controller Implementation
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);
More informationAdministrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today
Admiistrative Fial project No office hours today UNSUPERVISED LEARNING David Kauchak CS 451 Fall 2013 Supervised learig Usupervised learig label label 1 label 3 model/ predictor label 4 label 5 Supervised
More informationLecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming
Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis
More informationGPUMP: a Multiple-Precision Integer Library for GPUs
GPUMP: a Multiple-Precisio Iteger Library for GPUs Kaiyog Zhao ad Xiaowe Chu Departmet of Computer Sciece, Hog Kog Baptist Uiversity Hog Kog, P. R. Chia Email: {kyzhao, chxw}@comp.hkbu.edu.hk Abstract
More informationPseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance
Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured
More informationElementary Educational Computer
Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified
More informationBig-O Analysis. Asymptotics
Big-O Aalysis 1 Defiitio: Suppose that f() ad g() are oegative fuctios of. The we say that f() is O(g()) provided that there are costats C > 0 ad N > 0 such that for all > N, f() Cg(). Big-O expresses
More informationComputing the minimum distance between two Bézier curves
Computig the miimum distace betwee two Bézier curves Xiao-Diao Che, Liqiag Che, Yigag Wag, Gag Xu, Ju-Hai Yog, Jea-Claude Paul To cite this versio: Xiao-Diao Che, Liqiag Che, Yigag Wag, Gag Xu, Ju-Hai
More informationData Structures and Algorithms. Analysis of Algorithms
Data Structures ad Algorithms Aalysis of Algorithms Outlie Ruig time Pseudo-code Big-oh otatio Big-theta otatio Big-omega otatio Asymptotic algorithm aalysis Aalysis of Algorithms Iput Algorithm Output
More informationThe isoperimetric problem on the hypercube
The isoperimetric problem o the hypercube Prepared by: Steve Butler November 2, 2005 1 The isoperimetric problem We will cosider the -dimesioal hypercube Q Recall that the hypercube Q is a graph whose
More informationBig-O Analysis. Asymptotics
Big-O Aalysis 1 Defiitio: Suppose that f() ad g() are oegative fuctios of. The we say that f() is O(g()) provided that there are costats C > 0 ad N > 0 such that for all > N, f() Cg(). Big-O expresses
More informationOnes Assignment Method for Solving Traveling Salesman Problem
Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:
More informationCS 111: Program Design I Lecture 15: Objects, Pandas, Modules. Robert H. Sloan & Richard Warner University of Illinois at Chicago October 13, 2016
CS 111: Program Desig I Lecture 15: Objects, Padas, Modules Robert H. Sloa & Richard Warer Uiversity of Illiois at Chicago October 13, 2016 OBJECTS AND DOT NOTATION Objects (Implicit i Chapter 2, Variables,
More information9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence
_9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to
More informationRunning Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments
Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The
More informationA SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON
A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work
More informationExtending The Sleuth Kit and its Underlying Model for Pooled Storage File System Forensic Analysis
Extedig The Sleuth Kit ad its Uderlyig Model for Pooled File System Foresic Aalysis Frauhofer Istitute for Commuicatio, Iformatio Processig ad Ergoomics Ja-Niclas Hilgert* Marti Lambertz Daiel Plohma ja-iclas.hilgert@fkie.frauhofer.de
More information1 Enterprise Modeler
1 Eterprise Modeler Itroductio I BaaERP, a Busiess Cotrol Model ad a Eterprise Structure Model for multi-site cofiguratios are itroduced. Eterprise Structure Model Busiess Cotrol Models Busiess Fuctio
More informationRunning Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments
Ruig Time ( 3.1) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step- by- step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.
More informationAnalysis of Algorithms
Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Ruig Time Most algorithms trasform iput objects ito output objects. The
More informationCombination Labelings Of Graphs
Applied Mathematics E-Notes, (0), - c ISSN 0-0 Available free at mirror sites of http://wwwmaththuedutw/ame/ Combiatio Labeligs Of Graphs Pak Chig Li y Received February 0 Abstract Suppose G = (V; E) is
More informationMaster Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1
Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts
More informationOn Infinite Groups that are Isomorphic to its Proper Infinite Subgroup. Jaymar Talledo Balihon. Abstract
O Ifiite Groups that are Isomorphic to its Proper Ifiite Subgroup Jaymar Talledo Baliho Abstract Two groups are isomorphic if there exists a isomorphism betwee them Lagrage Theorem states that the order
More informationThe Magma Database file formats
The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,
More informationUNIVERSITY OF MORATUWA
UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016
More informationProtected points in ordered trees
Applied Mathematics Letters 008 56 50 www.elsevier.com/locate/aml Protected poits i ordered trees Gi-Sag Cheo a, Louis W. Shapiro b, a Departmet of Mathematics, Sugkyukwa Uiversity, Suwo 440-746, Republic
More informationLoad balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers *
Load balaced Parallel Prime umber Geerator with Sieve of Eratosthees o luster omputers * Soowook Hwag*, Kyusik hug**, ad Dogseug Kim* *Departmet of Electrical Egieerig Korea Uiversity Seoul, -, Rep. of
More informationPerformance Plus Software Parameter Definitions
Performace Plus+ Software Parameter Defiitios/ Performace Plus Software Parameter Defiitios Chapma Techical Note-TG-5 paramete.doc ev-0-03 Performace Plus+ Software Parameter Defiitios/2 Backgroud ad Defiitios
More informationLecture 2: Spectra of Graphs
Spectral Graph Theory ad Applicatios WS 20/202 Lecture 2: Spectra of Graphs Lecturer: Thomas Sauerwald & He Su Our goal is to use the properties of the adjacecy/laplacia matrix of graphs to first uderstad
More informationA Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions
Proceedigs of the 10th WSEAS Iteratioal Coferece o APPLIED MATHEMATICS, Dallas, Texas, USA, November 1-3, 2006 316 A Geeralized Set Theoretic Approach for Time ad Space Complexity Aalysis of Algorithms
More informationEnd Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization
Ed Semester Examiatio 2013-14 CSE, III Yr. (I Sem), 30002: Computer Orgaizatio Istructios: GROUP -A 1. Write the questio paper group (A, B, C, D), o frot page top of aswer book, as per what is metioed
More informationFAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS
SIAM J. SCI. COMPUT. Vol. 22, No. 6, pp. 2113 2134 c 21 Society for Idustrial ad Applied Mathematics FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS ZHAO ZHANG AND XIAODONG ZHANG
More informationCS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1
CS200: Hash Tables Prichard Ch. 13.2 CS200 - Hash Tables 1 Table Implemetatios: average cases Search Add Remove Sorted array-based Usorted array-based Balaced Search Trees O(log ) O() O() O() O(1) O()
More informationOutline and Reading. Analysis of Algorithms. Running Time. Experimental Studies. Limitations of Experiments. Theoretical Analysis
Outlie ad Readig Aalysis of Algorithms Iput Algorithm Output Ruig time ( 3.) Pseudo-code ( 3.2) Coutig primitive operatios ( 3.3-3.) Asymptotic otatio ( 3.6) Asymptotic aalysis ( 3.7) Case study Aalysis
More informationFPGA IMPLEMENTATION OF BASE-N LOGARITHM. Salvador E. Tropea
FPGA IMPLEMENTATION OF BASE-N LOGARITHM Salvador E. Tropea Electróica e Iformática Istituto Nacioal de Tecología Idustrial Bueos Aires, Argetia email: salvador@iti.gov.ar ABSTRACT I this work, we preset
More informationCSE 417: Algorithms and Computational Complexity
Time CSE 47: Algorithms ad Computatioal Readig assigmet Read Chapter of The ALGORITHM Desig Maual Aalysis & Sortig Autum 00 Paul Beame aalysis Problem size Worst-case complexity: max # steps algorithm
More informationFREQUENCY ESTIMATION OF INTERNET PACKET STREAMS WITH LIMITED SPACE: UPPER AND LOWER BOUNDS
FREQUENCY ESTIMATION OF INTERNET PACKET STREAMS WITH LIMITED SPACE: UPPER AND LOWER BOUNDS Prosejit Bose Evagelos Kraakis Pat Mori Yihui Tag School of Computer Sciece, Carleto Uiversity {jit,kraakis,mori,y
More information1.2 Binomial Coefficients and Subsets
1.2. BINOMIAL COEFFICIENTS AND SUBSETS 13 1.2 Biomial Coefficiets ad Subsets 1.2-1 The loop below is part of a program to determie the umber of triagles formed by poits i the plae. for i =1 to for j =
More informationGC05: Algorithmic Complexity & Computability
GC05: Algorithmic Complexity & Computability This part of the course deals with assessig the time-demad of algorithmic procedures with the aim, where possible, of fidig efficiet solutios to problems. We
More informationCMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access
More informationPython Programming: An Introduction to Computer Science
Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to
More informationcondition w i B i S maximum u i
ecture 10 Dyamic Programmig 10.1 Kapsack Problem November 1, 2004 ecturer: Kamal Jai Notes: Tobias Holgers We are give a set of items U = {a 1, a 2,..., a }. Each item has a weight w i Z + ad a utility
More informationCIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19
CIS Data Structures ad Algorithms with Java Sprig 09 Stacks, Queues, ad Heaps Moday, February 8 / Tuesday, February 9 Stacks ad Queues Recall the stack ad queue ADTs (abstract data types from lecture.
More information3D Model Retrieval Method Based on Sample Prediction
20 Iteratioal Coferece o Computer Commuicatio ad Maagemet Proc.of CSIT vol.5 (20) (20) IACSIT Press, Sigapore 3D Model Retrieval Method Based o Sample Predictio Qigche Zhag, Ya Tag* School of Computer
More informationA Comparative Study of Positive and Negative Factorials
A Comparative Study of Positive ad Negative Factorials A. M. Ibrahim, A. E. Ezugwu, M. Isa Departmet of Mathematics, Ahmadu Bello Uiversity, Zaria Abstract. This paper preset a comparative study of the
More informationProject 2.5 Improved Euler Implementation
Project 2.5 Improved Euler Implemetatio Figure 2.5.10 i the text lists TI-85 ad BASIC programs implemetig the improved Euler method to approximate the solutio of the iitial value problem dy dx = x+ y,
More informationCIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)
CIS 11 Data Structures ad Algorithms with Java Fall 017 Big-Oh Notatio Tuesday, September 5 (Make-up Friday, September 8) Learig Goals Review Big-Oh ad lear big/small omega/theta otatios Practice solvig
More informationHomework 1 Solutions MA 522 Fall 2017
Homework 1 Solutios MA 5 Fall 017 1. Cosider the searchig problem: Iput A sequece of umbers A = [a 1,..., a ] ad a value v. Output A idex i such that v = A[i] or the special value NIL if v does ot appear
More informationAnalysis of Algorithms
Presetatio for use with the textbook, Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Aalysis of Algorithms Iput 2015 Goodrich ad Tamassia Algorithm Aalysis of Algorithms
More informationWhat are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs