Cache and I/O Efficient Functional Algorithms


Guy E. Blelloch, Robert Harper
Carnegie Mellon University

Abstract

The widely studied I/O and ideal-cache models were developed to account for the large difference in costs to access memory at different levels of the memory hierarchy. Both models are based on a two-level memory hierarchy with a fixed-size primary memory (cache) of size $M$ and an unbounded secondary memory organized in blocks of size $B$. The cost measure is based purely on the number of block transfers between the primary and secondary memory. All other operations are free. Many algorithms have been analyzed in these models, and indeed these models predict the relative performance of algorithms much more accurately than the standard RAM model. The models, however, require specifying algorithms at a very low level, requiring the user to carefully lay out their data in arrays in memory and manage their own memory allocation.

In this paper we present a cost model for analyzing the memory efficiency of algorithms expressed in a simple functional language. We show how some algorithms written in standard forms using just lists and trees (no arrays) and requiring no explicit memory layout or memory management are efficient in the model. We then describe an implementation of the language and show provable bounds for mapping the cost in our model to the cost in the ideal-cache model. These bounds imply that purely functional programs based on lists and trees, with no special attention to any details of memory layout, can be asymptotically as efficient as the carefully designed imperative I/O efficient algorithms. For example, we describe an $O\left(\frac{n}{B} \log_{M/B} \frac{n}{B}\right)$ cost sorting algorithm, which is optimal in the ideal-cache and I/O models.

Categories and Subject Descriptors: D.3.1 [Programming Languages]: Formal Definitions and Theory; F.2.2 [Analysis of Algorithms and Problem Complexity]: Tradeoffs and Complexity Measures; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages

General Terms: Algorithms, Design, Languages, Performance, Theory

Keywords: cost semantics, I/O algorithms

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. POPL '13, January 23-25, 2013, Rome, Italy. Copyright © 2013 ACM /13/01... $10.00

1. Introduction

On today's computers there is a vast difference in cost for accessing different levels of the memory hierarchy, whether it be registers, one of many levels of cache, the main memory, or a disk. On current processors, for example, there is over a factor of a hundred between the time to access a register and main memory, and another factor of a hundred or so between main memory and disk, even a solid state disk (SSD). This variance in costs is contrary to the standard Random Access Machine (RAM) model, which assumes that the cost of accessing memory is uniform. To account for non-uniformity, several cost models have been developed that assign different costs to different levels of the memory hierarchy.

The widely used I/O [2] and ideal-cache [9] models both assume a two-level memory hierarchy with a fixed-size primary memory (cache) of size $M$ and an unbounded secondary memory partitioned into blocks of size $B$. Cost is measured in terms of the number of block transfers between primary and secondary memory; all other operations are considered free. The parameters $M$ and $B$ are considered variables for the sake of analysis and therefore show up in asymptotic bounds. Algorithms that do well in these models are often referred to as I/O efficient or cache efficient; in this paper we will generically use the term cache efficient. The theory of cache efficient algorithms is now well developed (see, e.g.,
the surveys [4, 6, 10, 15, 17, 22]) and the models indeed capture the relative cost of algorithms on real machines much more accurately than the RAM model does. This is true both in the context of algorithms that must run off disk when there is not enough main memory, and also in the context of algorithms that can fit in main memory, but not in various levels of the cache. For example, the models properly indicate that a blocked or hierarchical matrix-matrix multiply is much more efficient than the naïve triply nested loop ($\Theta\left(\frac{n^3}{B\sqrt{M}}\right)$ vs. $\Theta(n^3)$). In the RAM model they have equal costs. The models also indicate that properly implemented versions of mergesort and quicksort are reasonably cache efficient but that samplesort and multiway mergesort are more efficient, and in fact optimal. Correspondingly, all the fastest disk sorts indeed use some variant of samplesort or multiway mergesort, as the theory predicts [19].

Although the study of cache efficient algorithms has been very successful in identifying algorithms that are fast in practice, not surprisingly, designing and programming algorithms for these models requires a careful layout of memory and careful management of space. Both temporal and spatial locality are critical in achieving good bounds. Spatial locality is important since memory is moved in blocks of size $B$, corresponding to either cache lines or memory pages. For example, although merging two arrays of integers is reasonably efficient, the cost of merging two linked lists will depend on how the links are laid out in memory and needs to be considered with care. Care is also needed when allocating and freeing memory since touching unused memory incurs a cache miss. It is therefore important to reuse freed space immediately rather than returning it to a pool which might be evicted by the time it is reused; a generic memory allocator or garbage collection scheme will likely not do the right thing. To properly manage this problem, memory is typically preallocated and fully managed by the user/algorithm designer.

Needless to say, this form of programming is inconsistent with functional programming, especially when using recursive data types such as lists or trees. However, it is known experimentally that by using certain standard memory allocation schemes purely functional programs (no side effects) can be reasonably cache efficient with regard to both spatial and temporal locality [7, 8, 12, 23]. We give two examples.

Firstly, consider applying map with some simple function (e.g. increment) over a list of $n$ integers, and then applying the same map to the output. If the allocator keeps a pointer that gets incremented on each allocation, then after the first map all the cells of the list will be allocated adjacently. On the second map, since the allocations are adjacent, reading the whole list will only incur $O(n/B)$ cache misses, where $B$ is the block size, and evicting the newly generated blocks will also incur only $O(n/B)$ misses. This gives $O(n/B)$ cost, which asymptotically matches the cost of an optimal array version in an imperative setting. If the list were in an arbitrary order, the cost would be $O(n)$. All we have done is noted that the temporal locality of the allocations will lead to spatial locality of how they are laid out in memory.

Secondly, consider a block recursive matrix multiply on two $n \times n$ matrices. Such an algorithm will never require more than $O(n^2)$ live space, but if recursion stops at problems of a constant size it will allocate a total of $O(n^3)$ space. Assuming that the maximum live space fits within the cache, we should be able to run our matrix multiply with only $O(n^2/B)$ cache misses, needed for loading the two matrices and storing the result, but this would require being careful about reusing freed space that is already in cache. Fortunately, generational garbage collectors have approximately this effect [8].
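
The first example can be checked with a small simulation. The sketch below is not taken from the paper: it assumes a bump-pointer allocator that hands out consecutive word addresses, and it uses an LRU cache of `M // B` blocks as a stand-in for the ideal cache. The helper name `misses` and the concrete values of `n`, `M` and `B` are hypothetical choices made only for illustration.

```python
import random
from collections import OrderedDict

def misses(addresses, M, B):
    """Count block misses for a trace of word addresses, using an LRU cache
    that holds M // B blocks of B words each (LRU is an assumption here)."""
    cache = OrderedDict()                  # block id -> None, least recent first
    count = 0
    for a in addresses:
        blk = a // B
        if blk in cache:
            cache.move_to_end(blk)         # hit: refresh recency
        else:
            count += 1                     # miss: load the block
            cache[blk] = None
            if len(cache) > M // B:
                cache.popitem(last=False)  # evict the least recently used block
    return count

n, M, B = 4096, 256, 16

# Bump allocation: the i-th cons cell produced by map gets address i,
# so walking the list visits addresses 0, 1, 2, ...
adjacent = list(range(n))

# The same cells linked in an arbitrary (shuffled) order.
rng = random.Random(0)
scattered = adjacent[:]
rng.shuffle(scattered)

adjacent_misses = misses(adjacent, M, B)    # one miss per block: n // B
scattered_misses = misses(scattered, M, B)  # most accesses miss when n >> M
```

With adjacent cells every block is loaded exactly once, which is the $O(n/B)$ behaviour described above; with the shuffled layout most accesses touch a block that is no longer cached.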
In particular, with a generational collector, if we make the first generation smaller than the size of the cache ($M$) then we will reclaim the memory whenever the allocation area fills, and reuse memory that is already in cache. This does not quite work in general since what is live at the time of the minor collection might get bumped from cache, but it gives some indication that it is not hopeless to make the natural recursive matrix multiply algorithm, as well as similar recursive algorithms, cache efficient.

We show that one can indeed implement cache-efficient algorithms in a call-by-value functional setting using recursive data types, and get provably efficient bounds on cache complexity. In particular we show that one can express algorithms at a high level using standard techniques and achieve optimal asymptotic performance when implemented on the ideal cache. Of course we do not expect the algorithm designer to understand the intricacies of how the garbage collector works in order to analyze their algorithm. Instead our approach consists of providing a reasonably high-level cost semantics that abstracts away from implementation details such as the garbage collection method, but still admits precise analysis of the cost of an algorithm on a two-level memory architecture. We then describe a provably efficient implementation of the language on the ideal-cache model. We show that by using this implementation the costs analyzed in the high-level cost model asymptotically match the number of cache misses in the underlying ideal-cache model. The general idea of using high-level cost models based on a cost semantics along with a provably efficient implementation that maps the cost onto a lower level machine model has previously been used in the context of parallel cost models [5, 11, 13, 21].

Our high-level cost model consists of an operational semantics for a call-by-value variant of PCF in which we make explicit the allocation of and access to data objects. The store consists of three parts: a main memory, an allocation cache and a read cache.
Both caches have size $M$ and the memory is organized in blocks of size $B$ (both measured in terms of abstract data objects). Data can migrate from the allocation cache to memory and from memory to the read cache, always in blocks of size $B$. Allocations are made in the allocation cache, and if the number of live objects in the cache exceeds $M$, then the $B$ oldest locations are evicted to memory as a block, having unit cost. The read cache contains a subset of the memory blocks. A read has no cost if its location is in the read or allocation cache; otherwise it requires loading a block from memory into the read cache, having unit cost, and possibly ejecting an existing block. Hence the only costs are for evicting a block from the allocation cache or loading a block into the read cache. Since we are only concerned with measuring the traffic between main memory and cache memory, garbage collection for main memory is not modeled, but we do account for the detection of live objects, and their migration to main memory, when the cache limit is exceeded.

The provable implementation uses a generational collector to maintain the allocation cache. It uses a nursery of size $2M$ and allocates until the space runs out. It then traces the nursery for the live data. If there is $L > M$ live data, then $L - M$ locations are written to memory in blocks of $B$, leaving the nursery with at most $M$ locations. The implementation allocates the stack in the heap and must amortize the cost of loading old stack frames against other operations since they are not modeled in the high-level cost semantics. We emphasize that the algorithm designer need not know anything about the garbage collector or how the stack is managed to analyze their algorithm; these concepts are only part of the provable implementation. The cost model is described in Section 3 and the provable implementation is described in Sections 4 and 5.
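
The nursery collection step described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of Sections 4 and 5: the nursery is modeled as a dictionary from locations to the locations they point to, the oldest live locations are the ones written out, and the last block may be partial. The function name `collect` and its parameters are hypothetical.

```python
def collect(nursery, roots, M, B):
    """One minor collection over a nursery.

    nursery: dict mapping location -> list of locations it points to,
             inserted in allocation order (oldest first).
    roots:   iterable of locations known to be live.
    Returns (kept, blocks): the live locations that stay in the nursery and
    the blocks (lists of at most B locations) written out to main memory.
    """
    # Trace reachability from the roots, looking only inside the nursery.
    live, stack = set(), [r for r in roots if r in nursery]
    while stack:
        loc = stack.pop()
        if loc in live:
            continue
        live.add(loc)
        stack.extend(p for p in nursery[loc] if p in nursery)

    ordered = [loc for loc in nursery if loc in live]  # live data, oldest first
    excess = max(0, len(ordered) - M)                  # L - M when L > M
    evicted, kept = ordered[:excess], ordered[excess:]
    blocks = [evicted[i:i + B] for i in range(0, excess, B)]
    return kept, blocks

# A 10-cell chain (cell i points to cell i - 1) plus one unreachable cell.
nursery = {i: ([i - 1] if i > 0 else []) for i in range(10)}
nursery[10] = []
kept, blocks = collect(nursery, roots=[9], M=4, B=2)
```

In this example ten cells are live, so the six oldest are written out in three blocks of two, and the four newest remain in the nursery. The unreachable cell is simply dropped.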
To demonstrate the utility of our approach, in Section 6 we describe some general techniques for analyzing the cost of algorithms in our model and show three examples of how to analyze the cost of algorithms in the model: mergesort, k-way mergesort and matrix multiply. Importantly, our results on sorting and matrix multiply match the bounds for algorithms implemented directly in the ideal-cache model ($O\left(\frac{n}{B} \log_{M/B} \frac{n}{B}\right)$ and $O\left(\frac{n^3}{B\sqrt{M}}\right)$ respectively). The bounds for sorting are optimal. Because of our provable implementation bounds, these results imply that on the ideal-cache model our algorithms written in a functional style using lists and trees are asymptotically as efficient as the low-level imperative programs. To analyze the algorithms we introduce the notion of a data structure being compact with respect to a traversal order. This is the way we capture the spatial locality of data structures in a language that has no explicit way to express memory layout.

Related Work

Although there has been a large amount of experimental work on showing how good garbage collection can lead to efficient use of caches and disks ([7, 8, 12, 23] and many references in [14]), we know of none that tries to prove bounds for algorithms for functional programs when manipulating recursive data types such as lists or trees. Abello et al. [1] show how a functional style can be used to design cache efficient graph algorithms. They however assume that data structures are in arrays (called lists), and that primitives for operations such as sorting, map, filter and reductions are supplied and implemented with optimal asymptotic cost (presumably at a lower level using imperative code). Their goal is therefore to design graph algorithms by composing these high-level operations on collections. They do not explain how to deal with garbage collection or memory management.

2. Background

I/O and Caching Models

The two-level I/O model of Aggarwal and Vitter [2] assumes a memory hierarchy consisting of main memory of size $M$ and an unbounded secondary memory.¹ Both memories are partitioned into blocks of size $B$ of consecutive memory locations. All computation must be performed from main memory, which is treated like a standard RAM, but there is an additional instruction for moving a block of memory from secondary memory to main memory and one for moving the other way. The cost of an algorithm is analyzed in terms of the number of block transfers; the cost of operations within the main memory is ignored.

Many algorithms can be analyzed in this model and it is perhaps surprising how accurately it is able to capture the relative performance of algorithms. In their original work, for example, Aggarwal and Vitter showed tight upper and lower bounds for sorting $n$ keys, with I/O cost $\Theta\left(\frac{n}{B} \log_{M/B} \frac{n}{B}\right)$. The two algorithms that match this bound are a multiway mergesort and a distribution sort, which are the standard algorithms used for disk based sorting, and they both perform significantly better than quicksort or standard mergesort. These algorithms are more efficient since they do not need to pass over the data as many times.

The I/O model can capture either the distinction between cache and main memory or between main memory and disk. In the first case the memory size corresponds to the cache size and the block size to the cache-line size, and in the second case the memory size corresponds to the main memory size and the block size to the page size (or whatever the transfer size between the disk and main memory is). One might note, however, that while the I/O model assumes two address spaces and the user explicitly moves data between them, a machine with caches assumes a single address space and makes its own decisions about what gets evicted from cache, e.g., using a least recently used (LRU) policy.

The ideal-cache model [9] can be used to better model a cache. It is similar to the I/O model but assumes the primary memory is treated as a cache with an ideal eviction policy.
In particular, the programmer only accesses one address space and a block is brought into the cache when a memory location is accessed whose block is not already in cache. Bringing in a block might require evicting another block from the cache. The model assumes that the best decision is always made, which is to evict the line used furthest in the future (the optimal off-line replacement policy). Since in practice we don't know the future, this is not possible on-line, but Sleator and Tarjan's seminal work on competitive paging [20] proves that an LRU policy is always competitive with the optimal strategy (within constant factors in time and cache size). Therefore, from a theoretical point of view the models are asymptotically the same. In this paper we will be using the ideal-cache model for simplicity, although the results also apply to the I/O model.

The ideal-cache model is often used in the context of cache-oblivious algorithms. These are simply algorithms that do not make any decisions based on the cache parameters $M$ and $B$, although of course the analysis of cache complexity will depend on $M$ and $B$. The advantage of cache-oblivious algorithms is that, since they are oblivious to the cache parameters, they work across multiple levels of a cache hierarchy simultaneously. Most of the algorithms in this paper are cache oblivious, but our k-way mergesort is not. We leave it as an open question whether it is possible to develop an I/O-efficient cache-oblivious sorting algorithm in our model.

¹ Aggarwal and Vitter also considered a version of the model with parallel disk access, but most interesting results are explained with the single disk version.

Cache Efficient Algorithms

We now review some basic well-known results on cache efficient algorithms in the imperative setting. The functional algorithms we present in Section 6 are based on the algorithms described here, but do not require arrays or explicit memory management. We first consider mergesort.
Throughout our discussion we assume that the elements being sorted each fit in a single machine word. All cache efficient algorithms we know of for sorting store the input and output elements directly in arrays. First consider merging two arrays of keys $A$ and $B$ into an output array $C$ of length $n$ (as usual we assume the inputs are sorted in $A$ and $B$ in increasing order). The standard sequential algorithm for merging starts at the beginning of each array keeping a finger on each, finding the lesser of the two keys at the fingers, copying this key to $C$, and incrementing the appropriate finger. This algorithm has a cache complexity of $O(n/B)$ as long as $M \ge 3B$. This can be seen by noting that at any given time we only need one block from each of $A$, $B$ and $C$ resident in cache, and that we fully process the block before needing the next block. Therefore every block is only needed once.

For mergesort we assume the standard divide-and-conquer version, which recursively sorts each half of the array and then merges the result. Since merging as described cannot be done in place, we have to be specific about how to manage memory. In particular, allocating a new array for the result and then freeing the two old arrays using a general purpose memory allocator will likely not lead to the desired bounds (unless one can ensure special properties of the memory allocator). Instead the algorithm needs to pre-allocate a temporary array of length $n$ and pass parts of this array to all subcalls. In particular, mergesort could take as arguments both the input array and an equal-length temporary array. The result is returned in the input array and the temporary array is used to merge into. Although these optimizations are relatively obvious and standard to programmers of imperative code, we bring them up to emphasize the care that needs to be taken to ensure the cache bounds: it is not simply an issue of reducing the number of calls to the memory allocator; it can actually asymptotically affect the cache bounds.

The cache complexity of this mergesort can be analyzed by considering two cases.
The first is when the full computation fits in cache. In this case the two arrays need only be loaded into cache once and all the work can be done in cache. The problem fits in cache as long as $2n + \log n \le M$, where the $\log n$ accounts for the stack size. The second case is when the problem does not fit in cache. In this case we have to pay for the cache misses on the two recursive calls plus the cache misses of the merge. This gives the following recurrence for the cache complexity $Q(n)$:

$$Q(n) = \begin{cases} 2\,Q\left(\frac{n}{2}\right) + O\left(\frac{n}{B}\right) & \text{if } 2n + \log n > M \\ O\left(\frac{n}{B}\right) & \text{otherwise} \end{cases} \qquad (1)$$

The solution to this recurrence can be derived by noting that the top $\log_2(2n/M)$ levels of the recursion do not fit in memory while the lower levels do. The total cache complexity across each of the upper levels is $O(n/B)$, so the total overall cache complexity is $O\left(\frac{n}{B} \log_2 \frac{n}{M}\right)$. We note that this does not match the optimal cache complexity for sorting but is significantly better than simply assuming every access is a cache miss: specifically, a factor of $B \log n / (\log n - \log M)$ better. For sorting $10^{12}$ words in a memory with $10^9$ words and a block size of $10^3$ words, it is a factor of about 4000 better.

Quicksort has basically the same complexity as mergesort, although in the expected case. This is because scanning the input array to split it into the lesser and larger elements can be done using two fingers as in merging, so again each block only needs to be loaded once.

We now describe a sort that is optimal for the I/O model. The idea is, instead of partitioning the input array into two and recursively calling sort on each, to partition the input array into $k$ parts, sort each part, and then merge all the parts. Since instead of having just two arrays to merge we have $k$ arrays, we require a k-way merge. Without going into too much detail, such a merge can be implemented using one block of memory for each of the inputs needing to be merged as well as one block for the output. We keep a finger on each input and on each step select the minimum key at the fingers, move it to the output buffer and increment that finger. As long as all input blocks, the output block and any data for maintaining the fingers fit in cache, the k-way merge will run with cache complexity $O(n/B)$, which is the same as the binary merge. Since there will be $k$ input blocks and 1 output block, the space needed for the blocks is $(k + 1)B$. Therefore, accounting for overheads, everything will fit in cache as long as $ckB \le M$, or equivalently $k \le M/(cB)$ for some constant $c$. We therefore pick $k$ to be as large as possible, giving $k = M/(cB)$. As in the two-way merge we need to be careful about allocation and preallocate temporary arrays to copy the output. We again can analyze the algorithm by considering the case when the problem fits in memory and when it does not. This gives the recurrence:

$$Q(n) = \begin{cases} \frac{M}{cB}\,Q\left(\frac{n}{M/(cB)}\right) + O\left(\frac{n}{B}\right) & \text{if } c'n > M \\ O\left(\frac{n}{B}\right) & \text{otherwise} \end{cases} \qquad (2)$$

where $c$ and $c'$ are constants, but $n$, $B$ and $M$ are variables. This solves to $O\left(\frac{n}{B} \log_{M/B} \frac{n}{B}\right)$. This bound matches the lower bound for sorting in the I/O model [2] and hence also the ideal-cache model. The k-way mergesort is therefore asymptotically optimal.

3. Cost Semantics

In this section we define an evaluation dynamics that assigns a cost to a complete execution of a program. Following the I/O model, the cost measures the cache complexity, which is defined to be the traffic caused by the transfer of objects between the main memory and the memory cache. Accesses to objects in cache are considered to be cost-free, whereas migration of objects from cache into memory and from memory into the cache are charged unit cost. The dynamics is based on a two-level model of storage that includes a fixed-size allocation cache and a fixed-size read cache together with a main memory of unbounded size. The evaluation dynamics provides the basis for assessing both the correctness and the cache complexity of programs.
It is formulated at a sufficiently abstract level to free the programmer from having to reason directly about the compiler and run-time system, but is sufficiently concrete as to admit an implementation with a provable bound on its cache complexity. Thus, we may achieve the same overall results as are obtained using only low-level machine models in previous work on I/O algorithms, while working at the much more practical level of abstraction offered by functional programming languages.

We give the dynamics of a call-by-value variant of Plotkin's PCF language [18]. The syntax of expressions is summarized by the following grammar:

$$e ::= x \mid \mathtt{z} \mid \mathtt{s}(e) \mid \mathtt{ifz}(e; e_0; x.e_1) \mid \mathtt{fun}(x, y.e) \mid \mathtt{app}(e_1; e_2)$$

The conditional tests whether a number is zero or not, and passes the predecessor to the non-zero case. Functions are equipped with a name for themselves to allow for recursion. The typing rules are standard, and are omitted here for the sake of concision. (See, for example, Chapter 10 of [13].)

For illustrative purposes natural numbers are treated as heap-allocated data structures of unbounded size (as will become evident shortly). It is straightforward to extend the language to account for a richer variety of data structures, including sum, product, finite sequence, and recursive types, and to account for typical hardware-oriented concepts such as machine words and floating point numbers.

Storage Model

Following Morrisett, et al. [16], the dynamics distinguishes large from small values, with large values being allocated in memory and represented by a location, and small values being those that are manipulated directly. In the present case the only small values are locations, but it is also possible to consider, for example, fixed-sized numbers as forms of small value. Correspondingly, all other forms of value (numbers and functions) are large. We also allocate stack frames, which reify the control state of evaluation, in memory. A memory object is either a large value or a stack frame.
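
To make the grammar and the large/small value distinction concrete, here is a minimal sketch of a conventional eager evaluator for this language. It is an illustration only and ignores everything the cost semantics adds (costs, root sets, stack frames, caches). Expressions are encoded as Python tuples, the heap is a dictionary from locations to large values, and the `('loc', l)` form for a substituted location is an assumption of this sketch. All function names are hypothetical.

```python
def alloc(heap, obj):
    """Store a large value in the heap and return its (small) location."""
    loc = len(heap)
    heap[loc] = obj
    return loc

def subst(e, env):
    """Replace free variables of e by the closed terms in env (name -> term)."""
    tag = e[0]
    if tag == 'var':
        return env.get(e[1], e)
    if tag in ('z', 'loc'):
        return e
    if tag == 's':
        return ('s', subst(e[1], env))
    if tag == 'ifz':
        _, e0, e1, x, e2 = e
        inner = {k: v for k, v in env.items() if k != x}   # x is bound in e2
        return ('ifz', subst(e0, env), subst(e1, env), x, subst(e2, inner))
    if tag == 'fun':
        _, f, y, body = e
        inner = {k: v for k, v in env.items() if k not in (f, y)}
        return ('fun', f, y, subst(body, inner))
    if tag == 'app':
        return ('app', subst(e[1], env), subst(e[2], env))
    raise ValueError(tag)

def evaluate(e, heap):
    """Evaluate a closed expression; the result is always a heap location."""
    tag = e[0]
    if tag == 'loc':
        return e[1]
    if tag == 'z':
        return alloc(heap, ('z',))
    if tag == 's':
        return alloc(heap, ('s', evaluate(e[1], heap)))
    if tag == 'ifz':
        _, e0, e1, x, e2 = e
        v = heap[evaluate(e0, heap)]                 # read the number
        if v[0] == 'z':
            return evaluate(e1, heap)
        return evaluate(subst(e2, {x: ('loc', v[1])}), heap)  # pass predecessor
    if tag == 'fun':
        return alloc(heap, e)                        # functions are large values
    if tag == 'app':
        l1 = evaluate(e[1], heap)
        _, f, y, body = heap[l1]                     # read the function
        l2 = evaluate(e[2], heap)
        return evaluate(subst(body, {f: ('loc', l1), y: ('loc', l2)}), heap)
    raise ValueError(tag)

def num(k):
    """Build the numeral k as nested successors of zero."""
    return ('z',) if k == 0 else ('s', num(k - 1))

def to_int(loc, heap):
    """Decode a heap-allocated natural number."""
    k = 0
    while heap[loc][0] == 's':
        loc, k = heap[loc][1], k + 1
    return k

# add = fun(add, m. fun(_, n. ifz(m; n; p. s(app(app(add; p); n)))))
add = ('fun', 'add', 'm', ('fun', '_', 'n',
       ('ifz', ('var', 'm'), ('var', 'n'), 'p',
        ('s', ('app', ('app', ('var', 'add'), ('var', 'p')), ('var', 'n'))))))
heap = {}
result = to_int(evaluate(('app', ('app', add, num(2)), num(3)), heap), heap)
```

Every intermediate number and function ends up as a heap entry, and only locations are passed around, which is the behaviour the storage model describes.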
The two-level memory model is parameterized by two constants, the block size $B$, and the cache size $M = c \times B$ determined by some constant $c$ representing the number of blocks in the cache. A memory $\mu$ is a finite mapping assigning a memory object to each of a finite set $\mathrm{dom}(\mu)$ of abstract locations. The memory may grow without bound. (We do not consider here the separate problem of garbage collection for main memory, for which see Morrisett, et al. [16].) As a technical convenience, we assume that locations are divided into two classes, value locations, $l$, and stack locations, $s$, and require that a memory map value locations to large values and stack locations to stack frames. When the distinction is immaterial, we speak simply of locations and objects in memory.

A memory $\mu$ comes equipped with an equivalence relation $l \sim_\mu l'$ over $\mathrm{dom}(\mu)$ specifying that $l$ and $l'$ are neighbors in $\mu$. Additionally, we require that each equivalence class in the domain of a memory is of size $B$. A memory whose domain consists of a single equivalence class of size $B$ is called a block. The neighborhood $\mathrm{nbhd}(\mu, l)$ of a location $l \in \mathrm{dom}(\mu)$ is the restriction of $\mu$ to the neighbors of $l$ in $\mu$, a single block. The expansion $\mu \otimes \beta$ of a memory $\mu$ by a block $\beta$ such that $\mathrm{dom}(\beta) \cap \mathrm{dom}(\mu) = \emptyset$ is the memory $\mu'$ that agrees with $\mu$ and $\beta$ on their respective domains and for which $l \sim_{\mu'} l'$ iff $l \sim_\mu l'$ or $l \sim_\beta l'$.

There are two forms of cache mediating access to memory. A read cache $\rho$ for a memory $\mu$ is the restriction of $\mu$ to a finite set of locations of size at most $M$. The contraction $\rho \ominus \beta$ of a read cache $\rho$ by a block $\beta \subseteq \rho$ is the read cache $\rho'$ such that $\rho = \rho' \otimes \beta$. A nursery $\nu$ is a finite mapping that associates an object to each of a finite set $\mathrm{dom}(\nu)$ of locations. A nursery comes equipped with a linear ordering $l \le_\nu l'$ of $\mathrm{dom}(\nu)$, called the allocation ordering. If $l \le_\nu l'$ we say that $l$ is older than $l'$ and that $l'$ is newer than $l$ in $\nu$. The extension $\nu[l \mapsto o]$ of a nursery $\nu$ binding a location $l \notin \mathrm{dom}(\mu)$ to an object $o$ is the nursery $\nu'$ such that (1) $\nu'(l) = o$ and $\nu'(l') = \nu(l')$ for each $l' \in \mathrm{dom}(\nu)$, and (2) $l' \le_{\nu'} l$ for every $l' \in \mathrm{dom}(\nu)$.
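The contraction $\nu \ominus \beta$ of a nursery $\nu$ by a block $\beta \subseteq \nu$ is the restriction of $\nu$ to $\mathrm{dom}(\nu) \setminus \mathrm{dom}(\beta)$.

The live locations $\mathrm{live}(R, \nu)$ in a nursery $\nu$ relative to a subset $R \subseteq \mathrm{dom}(\nu)$ consist of those locations in $\mathrm{dom}(\nu)$ that are (transitively) reachable from locations in $R$. The scan $\mathrm{scan}(R, \nu)$ of a nursery $\nu$ with respect to a subset $R \subseteq \mathrm{dom}(\nu)$ is the block $\beta$ consisting of the oldest $B$ live locations in $\mathrm{live}(R, \nu)$. (See Morrisett, et al. [16] for formal definitions of these standard concepts.) It will be an invariant of the dynamics that the nursery contains at most $M$ live objects relative to the roots of the computation.

A store $\sigma$ is a triple $(\mu, \rho, \nu)$ consisting of a memory $\mu$, a read cache $\rho$ for $\mu$, and a nursery $\nu$ such that $\mathrm{dom}(\nu) \cap \mathrm{dom}(\mu) = \emptyset$. The domain of a store $\sigma$ is defined by $\mathrm{dom}(\sigma) = \mathrm{dom}(\mu) \cup \mathrm{dom}(\nu)$. An initial store is a store in which the main memory contains only large values and in which the read cache and allocation area are empty.

Evaluation Dynamics

The overall goal of the evaluation dynamics is to define the evaluation of a closed expression by an inductive definition of a relation between an expression and its value, which is always small, and its cost, a non-negative integer. The cost is computed by tracking the movement of objects among the components of the store, charging one unit of cost whenever a block of objects must be moved to or copied from main memory, and charging zero cost otherwise. (So, for example, a computation that runs entirely in cache will be assigned zero cost, consistently with the I/O model.)

To account for the memory traffic involving values, the dynamics makes explicit the allocation of objects in the nursery, their eviction to main memory when the capacity of the nursery is exceeded, and their movement into the read cache as they are required by the computation. To account for the memory traffic attributable to the implicit control stack, the dynamics also allocates (but does not otherwise use) stack frames, and ensures that any data that would appear in the stack is kept live by the dynamics.

In the rules below, $\Downarrow$ is evaluation, $\uparrow$ is allocation and $\downarrow$ is reading; the superscript is the cost and the subscript is the root set.

$$\frac{\sigma @ \mathtt{z} \uparrow^{n}_{R} \sigma' @ l}{\sigma @ \mathtt{z} \Downarrow^{n}_{R} \sigma' @ l} \quad (3a)$$

$$\frac{\sigma @ \mathtt{s}({-}) \uparrow^{n_1}_{R \cup \mathrm{locs}(e)} \sigma_1 @ s_1 \qquad \sigma_1 @ e \Downarrow^{n_2}_{R \cup \{s_1\}} \sigma_2 @ l \qquad \sigma_2 @ \mathtt{s}(l) \uparrow^{n_3}_{R} \sigma' @ l'}{\sigma @ \mathtt{s}(e) \Downarrow^{n_1 + n_2 + n_3}_{R} \sigma' @ l'} \quad (3b)$$

$$\frac{\begin{array}{c} \sigma @ \mathtt{ifz}({-}; e_2; x.e_3) \uparrow^{n_1}_{R \cup \mathrm{locs}(e_1)} \sigma_1 @ s_1 \qquad \sigma_1 @ e_1 \Downarrow^{n_2}_{R \cup \{s_1\}} \sigma_2 @ l_1 \\ \sigma_2 @ l_1 \downarrow^{n_3} \sigma_3 @ \mathtt{z} \qquad \sigma_3 @ e_2 \Downarrow^{n_4}_{R} \sigma' @ l \end{array}}{\sigma @ \mathtt{ifz}(e_1; e_2; x.e_3) \Downarrow^{n_1 + n_2 + n_3 + n_4}_{R} \sigma' @ l} \quad (3c)$$

$$\frac{\begin{array}{c} \sigma @ \mathtt{ifz}({-}; e_2; x.e_3) \uparrow^{n_1}_{R \cup \mathrm{locs}(e_1)} \sigma_1 @ s_1 \qquad \sigma_1 @ e_1 \Downarrow^{n_2}_{R \cup \{s_1\}} \sigma_2 @ l_1 \\ \sigma_2 @ l_1 \downarrow^{n_3} \sigma_3 @ \mathtt{s}(l_1') \qquad \sigma_3 @ [l_1'/x]e_3 \Downarrow^{n_4}_{R} \sigma' @ l \end{array}}{\sigma @ \mathtt{ifz}(e_1; e_2; x.e_3) \Downarrow^{n_1 + n_2 + n_3 + n_4}_{R} \sigma' @ l} \quad (3d)$$

$$\frac{\sigma @ \mathtt{fun}(x, y.e) \uparrow^{n}_{R} \sigma' @ l}{\sigma @ \mathtt{fun}(x, y.e) \Downarrow^{n}_{R} \sigma' @ l} \quad (3e)$$

$$\frac{\begin{array}{c} \sigma @ \mathtt{app}({-}; e_2) \uparrow^{n_1}_{R \cup \mathrm{locs}(e_1)} \sigma_1 @ s_1 \qquad \sigma_1 @ e_1 \Downarrow^{n_2}_{R \cup \{s_1\}} \sigma_2 @ l_1 \qquad \sigma_2 @ l_1 \downarrow^{n_3} \sigma_3 @ \mathtt{fun}(x, y.e) \\ \sigma_3 @ \mathtt{app}(l_1; {-}) \uparrow^{n_4}_{R} \sigma_4 @ s_2 \qquad \sigma_4 @ e_2 \Downarrow^{n_5}_{R \cup \{s_2\}} \sigma_5 @ l_2 \qquad \sigma_5 @ [l_1, l_2/x, y]e \Downarrow^{n_6}_{R} \sigma' @ l \end{array}}{\sigma @ \mathtt{app}(e_1; e_2) \Downarrow^{n_1 + n_2 + n_3 + n_4 + n_5 + n_6}_{R} \sigma' @ l} \quad (3f)$$

Figure 1. Cost Dynamics

$$\frac{l \in \mathrm{dom}(\rho)}{(\mu, \rho, \nu) @ l \downarrow^{0} (\mu, \rho, \nu) @ \rho(l)} \quad (4a)$$

$$\frac{l \in \mathrm{dom}(\nu)}{(\mu, \rho, \nu) @ l \downarrow^{0} (\mu, \rho, \nu) @ \nu(l)} \quad (4b)$$

$$\frac{l \notin \mathrm{dom}(\rho) \cup \mathrm{dom}(\nu) \qquad |\mathrm{dom}(\rho)| \le M - B}{(\mu, \rho, \nu) @ l \downarrow^{1} (\mu, \rho \otimes \mathrm{nbhd}(\mu, l), \nu) @ \mu(l)} \quad (4c)$$

$$\frac{l \notin \mathrm{dom}(\rho) \cup \mathrm{dom}(\nu) \qquad |\mathrm{dom}(\rho)| = M \qquad \beta \subseteq \rho}{(\mu, \rho, \nu) @ l \downarrow^{1} (\mu, (\rho \ominus \beta) \otimes \mathrm{nbhd}(\mu, l), \nu) @ \mu(l)} \quad (4d)$$

$$\frac{|\mathrm{live}(R \cup \mathrm{locs}(o), \nu)| < M \qquad l \notin \mathrm{dom}(\nu)}{(\mu, \rho, \nu) @ o \uparrow^{0}_{R} (\mu, \rho, \nu[l \mapsto o]) @ l} \quad (5a)$$

$$\frac{|\mathrm{live}(R \cup \mathrm{locs}(o), \nu)| = M \qquad \beta = \mathrm{scan}(R \cup \mathrm{locs}(o), \nu) \qquad l \notin \mathrm{dom}(\nu)}{(\mu, \rho, \nu) @ o \uparrow^{1}_{R} (\mu \otimes \beta, \rho, (\nu \ominus \beta)[l \mapsto o]) @ l} \quad (5b)$$

Figure 2. Reading and Allocation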
These considerations lead to the evaluation judgment $\sigma @ e \Downarrow^{n}_{R} \sigma' @ l$ stating that the expression $e$, when evaluated with respect to a store $\sigma$ such that $\mathrm{dom}(\sigma) \supseteq \mathrm{locs}(e)$ and to roots $R \subseteq \mathrm{dom}(\sigma)$, results in a modified store $\sigma'$, a location $l$ representing the (large) value of the expression, and a cost $n$ representing the cache complexity of the execution. The modifications to the store consist of allocations in the nursery, migrations of objects from the nursery to the main memory, and copying of objects from the main memory to the read cache. All memory traffic occurs in blocks of $B$ objects, corresponding to loading a cache line or reading a block from disk. The roots $R$ represent locations that are to be kept live by virtue of their being present in the implicit control stack or expression under evaluation.

The evaluation judgment is defined by the rules in Figure 1, making use of two auxiliary judgments for reading and allocating objects defined in Figure 2. It may be helpful to read through the rules once while ignoring all but the evaluation judgments to see that the rules define a conventional eager dynamics for a functional language. On such a reading the root set plays no role, and can be ignored. Moreover, the cost assignment has no significance under such a simplification.

Next, let us consider the roles of the read judgments $\sigma @ l \downarrow^{n} \sigma' @ o$ and the allocate judgments $\sigma @ v \uparrow^{n}_{R} \sigma' @ l$, where $v$ is a value, in the dynamics. The read judgment states that reading location $l$ in store $\sigma$ results in the object $o$ and the modified store $\sigma'$, and has cost $n = 0$ or $n = 1$. The cost is non-zero only if the read causes a block to be loaded into the read cache. The modified store represents the possible effect of loading a block into the read cache. The allocate judgment states that allocating the large value $v$ in store $\sigma$ results in a modified store $\sigma'$ and location $l \in \mathrm{dom}(\sigma')$, and has cost $n = 0$ or $n = 1$. The cost is non-zero only if the allocation causes the eviction of a block from the nursery in order to maintain the live-size invariant.
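
The read and allocate judgments can be sketched as operations on a small simulated store. The code below is a simplification made for illustration, not the paper's dynamics: it treats every object in the nursery as live (so no root set is needed), represents objects only by their locations, and uses LRU to pick the block replaced in the read cache, where the model leaves that choice open. The class name `Store` and its methods are hypothetical.

```python
from collections import OrderedDict, deque

class Store:
    """Simplified two-cache store; every nursery object is treated as live."""

    def __init__(self, M, B):
        assert M % B == 0
        self.M, self.B = M, B
        self.block_of = {}               # main memory: location -> block id
        self.num_blocks = 0
        self.read_cache = OrderedDict()  # cached block ids, least recent first
        self.nursery = deque()           # nursery locations, oldest first
        self.next_loc = 0
        self.cost = 0                    # number of block transfers so far

    def allocate(self):
        """Allocate one object and return its location."""
        if len(self.nursery) == self.M:          # nursery full: evict a block
            for _ in range(self.B):              # the B oldest objects
                self.block_of[self.nursery.popleft()] = self.num_blocks
            self.num_blocks += 1
            self.cost += 1                       # unit cost for the eviction
        loc = self.next_loc                      # otherwise allocation is free
        self.next_loc += 1
        self.nursery.append(loc)
        return loc

    def read(self, loc):
        """Read a location, charging one unit if a block must be loaded."""
        if loc not in self.block_of:             # still in the nursery: free
            return
        blk = self.block_of[loc]
        if blk in self.read_cache:               # already in the read cache: free
            self.read_cache.move_to_end(blk)
            return
        if len(self.read_cache) == self.M // self.B:
            self.read_cache.popitem(last=False)  # cache full: replace a block (LRU assumed)
        self.read_cache[blk] = None              # load the neighborhood of loc
        self.cost += 1

n, M, B = 1024, 64, 8
store = Store(M, B)
cells = [store.allocate() for _ in range(n)]     # build an n-cell list
alloc_cost = store.cost
for loc in cells:                                # traverse it in allocation order
    store.read(loc)
read_cost = store.cost - alloc_cost
```

In this run the last `M` cells never leave the nursery, so both building and traversing the list cost `(n - M) // B` block transfers, about `1/B` per operation.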
The read and allocate operations in the dynamics record the memory traffic engendered by the creation and examination of values during computation.

It remains to consider the role of the allocation judgments of the form $\sigma @ f \uparrow^{n}_{R} \sigma' @ s$, which represent the allocation of a stack frame $f$ in the store at stack location $s$. The purpose of allocating these frames is purely to ensure that the cost assigned to a computation is accurate with respect to the underlying implementation. Although an evaluation semantics has no explicit control stack, it is nevertheless the case that an implementation must allocate space for the representation of the control state, and this space allocation does influence the cache behavior of the computation. It may not, therefore, be ignored. Our method for accounting for the memory effects of the control stack is to allocate explicitly frames that would appear in the control stack to ensure that space usage is properly accounted for, and that required liveness information (to be detailed shortly) is properly maintained. The frames are denoted as $\mathtt{app}({-}; e_2)$ and $\mathtt{app}(l_1; {-})$ in the cost dynamics.

With this in mind, let us examine in detail Rule 3f in Figure 1. We are to evaluate and determine the cost of $\mathtt{app}(e_1; e_2)$ in store $\sigma$ with given roots $R$. First, we allocate a stack frame $s_1$ representing the pending evaluation of $e_2$ during the evaluation of $e_1$. This frame is now considered live, even though it does not appear in any expression under consideration. Accordingly, we evaluate $e_1$ relative to the store containing this frame, treating the just-allocated stack pointer as live (as indicated by $R \cup \{s_1\}$). This results in a location $l_1$, which we then read from the store to obtain a function abstraction (as would be guaranteed by the static type discipline omitted here). We then create another frame $s_2$ corresponding to the suspended application of the function at location $l_1$, and evaluate $e_2$ with this stack pointer considered live (as indicated by $R \cup \{s_2\}$) to obtain location $l_2$. Finally, we evaluate the function body, replacing the self variable by $l_1$ and the argument by $l_2$. The overall cost of the computation is the sum of the costs of each of these steps, which are given either inductively or by the uses of the read and allocate judgments. Observe that this rule properly accounts for tail recursion in that no extra space is held during tail recursive calls (as indicated by the root set $R$).

It remains to explain the read and allocate judgments defined in Figure 2. The read judgment assigns cost zero to any read from a location in either the nursery or the read cache (Rules 4b and 4a). Such reads have no effects, and hence induce no cache traffic. A read of a location that is only in main memory induces a load of the neighborhood of that location (a block of memory) into the read cache. If there is sufficient room for it in the read cache, the block is added to the cache and the contents are returned, at a cost of one unit (Rule 4c).
If there is insufficient room in the read cache, a block is selected non-deterministically to be replaced by the required block, and once again a unit cost is charged to the read (Rule 4d). At the end of the section we discuss the use of non-deterministic eviction.

The allocate judgment defines the procedure for creating new objects in the store. Of course, new objects are considered newer in the allocation ordering than the objects already present in the nursery. If the new object fits within the nursery, it is allocated there at zero cost (Rule 5a). If the new object will not fit within the nursery, then the block consisting of the oldest B live objects in the nursery is evicted to main memory, making room for the newly allocated object; such an allocation is charged unit cost (Rule 5b). It is important to our method that the oldest objects be evicted from the cache as a block forming the neighborhood of each of its locations.

Whether an object fits within the nursery is determined as follows. The nursery is full if the number of live objects in it is exactly M. (It is for the sake of assessing liveness that the allocation judgment is parameterized by a root set.) Eviction of a block reduces this to at most M − B objects, so that the next B − 1 allocations will not cause an eviction. Thus we are, in effect, charging at most 1/B units of cost to each allocation (less if objects die before needing to be evicted).

It is essential to our results that the liveness of objects in the nursery may be assessed without accessing main memory. Given roots R we need only trace objects in the nursery itself, and need never consider locations lying outside of it. This is ensured by two properties of the dynamics. First, since the model is purely functional, the dependency graph of objects in the nursery is acyclic; an object may only refer to objects allocated earlier in the computation as defined by the allocation ordering.
Second, implicit stack frames are explicitly allocated in the nursery to ensure that liveness may be assessed solely by examining the nursery itself, starting from the root set. Put another way, an object in the nursery cannot be live solely because of a pointer from main memory back to the nursery. This property is a consequence of immutability and the explicit allocation of stack frames in the semantics.

In Section 6 we will make use of a deep copy operation on values of certain types. In the illustrative language considered here this operation is definable on natural numbers as follows: fun(copy, x. ifz(x; z; x′. s(app(copy; x′)))). Calling this function on a number n has the effect of creating a fresh copy of n in the heap. No such operation is definable, or required, for function types. Deep copying is easily extended to product, sum, and inductive types, but would need to be provided as a primitive for base types such as fixed precision integers or floating point numbers.

Discussion

We briefly discuss some of the motivation for the decisions we made in formulating the dynamic semantics. The overall goal is to allow a simple analysis for the algorithm developer while capturing all the costs needed to prove asymptotic implementation bounds.

The separate allocation cache is important both for convenience of analysis and for properly accounting for costs. It ensures that short lived allocations never need to be allocated to memory. For a subcomputation in which the maximum footprint of live data allocated fits in the allocation cache, the user need not worry about any costs for any temporary memory. In a block matrix multiply on n × n matrices, for example, once kn² ≤ M for some small constant k, the only cost that needs be considered is the cost of reading the input and evicting the output. This is the case even though the multiply will allocate a total of Θ(n³) space. It is also important that the partitioning of locations into blocks is not decided until locations are evicted from the allocation cache, which ensures that only live data is ever migrated to memory.
If blocking were to be decided on allocation, for example, then by the time the objects are evicted most of the objects in a block may no longer be live. This would break the bounds we give in Section 6.

The cost semantics accounts for the allocation of stack frames in order to account for the space required to manage the control state of evaluation. This is particularly important in the case that no allocation is associated with the creation of a frame, for then there is no possibility to amortize the space required for the frame against the allocated object. Note that the semantics only models the space taken by the frames in the allocation cache and the cost of evicting them. It does not model any costs associated with reloading them into the read cache. As described in the next section, in a lower level model this can be amortized against the cost of evicting the frames in the first place.

It is important that the stack frames be heap allocated. A crucial invariant we require is that all live data in the allocation cache can be determined solely through the caches. If we had a separate stack cache it could allow for the eviction of a stack frame that references data in the allocation cache, breaking the required invariant. There are other techniques to handle this problem but we found that allocating the stack frames in the heap is the easiest.

Our model is non-deterministic in the choice of what block is evicted from the read cache in the case of a read miss. In our provable implementation bounds we show that if there is a (non-deterministic) execution that gives a certain cache complexity then we can guarantee those bounds on the ideal cache model (within constant factors). When analyzing an algorithm this allows one to consider any policy for eviction. This is possible because the ideal cache makes the optimal decisions and will therefore be at least as good as the policy the user assumes. The justification for the ideal cache model is given in Section 2.
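
The read and allocate judgments of Figure 2 can be made concrete with a small simulation. The following Python sketch is only an illustration under stated assumptions, not the semantics itself: the class name `Store`, the tuple encoding of objects (a tag followed only by child locations), and the FIFO replacement in the read cache are all hypothetical choices made here. The semantics leaves the read-cache replacement non-deterministic, and the sketch assumes that M is a positive multiple of B.

```python
from collections import OrderedDict


class Store:
    """Toy store: a nursery, main-memory blocks, and a read cache."""

    def __init__(self, M, B):
        self.M, self.B = M, B            # cache size (objects), block size
        self.objs = {}                   # location -> (tag, child locations...)
        self.nursery = []                # nursery locations, oldest first
        self.block_of = {}               # evicted location -> block id
        self.read_cache = OrderedDict()  # cached block ids, in load order
        self.next_loc = 0
        self.next_block = 0

    def live(self, roots):
        """Nursery locations reachable from roots, tracing only the nursery."""
        inside = set(self.nursery)
        seen, todo = set(), [r for r in roots if r in inside]
        while todo:
            l = todo.pop()
            if l not in seen:
                seen.add(l)
                todo.extend(c for c in self.objs[l][1:] if c in inside)
        return seen

    def alloc(self, obj, roots):
        """Allocate obj; return (location, cost), cost in {0, 1} (Rules 5a, 5b)."""
        live = self.live(set(roots) | set(obj[1:]))
        # Dead objects are dropped here and never reach main memory.
        self.nursery = [l for l in self.nursery if l in live]
        cost = 0
        if len(self.nursery) >= self.M:
            # Nursery full: evict the oldest B live objects as one block.
            for l in self.nursery[:self.B]:
                self.block_of[l] = self.next_block
            self.nursery = self.nursery[self.B:]
            self.next_block += 1
            cost = 1
        loc = self.next_loc
        self.next_loc += 1
        self.objs[loc] = obj
        self.nursery.append(loc)
        return loc, cost

    def read(self, l):
        """Read l; return (object, cost), cost in {0, 1} (Rules 4a-4d)."""
        if l not in self.block_of:
            return self.objs[l], 0       # still in the nursery
        b = self.block_of[l]
        if b in self.read_cache:
            return self.objs[l], 0       # block already in the read cache
        if len(self.read_cache) >= self.M // self.B:
            self.read_cache.popitem(last=False)  # assumed FIFO choice
        self.read_cache[b] = None
        return self.objs[l], 1
```

With M = 4 and B = 2, building a ten-cell list whose head is the only root incurs three evictions, and traversing it afterwards incurs three block loads, while a run of allocations that are never rooted incurs no cost at all, matching the observation above that short lived data never reaches memory.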

4. Abstract Machine

The abstract cost of a computation assigned by the evaluation dynamics given in the preceding section is validated in two stages. First, in this section we define an abstract machine with an explicit control stack, and show that the evaluation dynamics accurately predicts the behavior of the abstract machine with respect to both the outcome and the cost of the computation. Second, in Section 5 we show how to implement the basic operations of the abstract machine with only a small overhead. Taken together these two arguments demonstrate that the evaluation semantics provides an accurate model of the cache complexity of a program when implemented as described in these two steps.

The abstract machine takes the form of a labeled transition system between states of two different forms:

1. Evaluation state: $\sigma \mathbin{@} k \triangleright e$, where k ∈ dom(σ) is a stack pointer, and locs(e) ⊆ dom(σ), stating that e is to be evaluated on stack k relative to store σ.
2. Return state: $\sigma \mathbin{@} k \triangleleft l$, where k, l ∈ dom(σ), stating that small value l is to be returned to stack k relative to store σ.

The control stack is represented by a stack location, k, that refers to a linked list of frames, either the empty stack, written •, or a frame together with another stack location, written f;k. The label on a transition is either 0, 1, or 2, and specifies the amount of work to be charged for that transition.

The rules given in Figure 3 define the abstract machine. Their overall form is standard (see, for example, Chapter 27 of [13]), with the main differences being that allocation and reading of values is made explicit, just as in the evaluation dynamics, and that the stack is explicitly represented as a linked data structure in the store. The multistep transition judgment $s \longmapsto^{n}_{*} s'$ means that there is a finite, possibly empty, sequence of transitions from s to s′ whose labels sum to n.

THEOREM 4.1 (Correctness of Evaluation Dynamics). Let σ₀ be an initial store, and let e₀ be a closed expression such that locs(e₀) ⊆ dom(σ₀).
Let the abstract machine be equipped with one additional block in the read cache, and let k₀ be a reserved stack location not used in the evaluation dynamics. If

$$\sigma_0 \mathbin{@} e_0 \Downarrow^{n}_{\emptyset} \sigma \mathbin{@} l,$$

then there is an evaluation

$$\sigma_0[k_0 \mapsto \bullet] \mathbin{@} k_0 \triangleright e_0 \longmapsto^{m}_{*} \sigma'[k_0 \mapsto \bullet] \mathbin{@} k_0 \triangleleft l'$$

such that

1. the results are isomorphic, l ≅ l′, and
2. the cost m is at most 3n.

The relation l ≅ l′ states that the reachable graph from l in σ is isomorphic to the reachable graph from l′ in σ′.

[Figure 3. Abstract Machine: transition rules (6a) through (6k).]

Theorem 4.1 states that the outcome of a computation on the abstract machine is the same, up to choice of locations, as the outcome of the same computation according to the evaluation dynamics. Moreover, the total cost of the machine execution (measured in accordance with the I/O model described earlier) is at most a small constant factor larger than the cost assigned by the evaluation dynamics. The content of the theorem amounts to a proof that the space required by the control stack in a computation may be managed so as not to interfere with space usage of the computation itself.

The correctness proof may be decomposed into three major components. The first obligation is to relate the outcome of the evaluation dynamics to that of the abstract machine, disregarding, for the moment, the cost. The required correspondence is proved by induction on the derivation of the evaluation judgment. Specifically, we prove that if $\sigma \mathbin{@} e \Downarrow^{n}_{R} \sigma' \mathbin{@} l$, then for any stack pointer k, $\sigma \mathbin{@} k \triangleright e \longmapsto_{*} \sigma' \mathbin{@} k \triangleleft l$.
The proof proceeds along standard lines, as described, for example, in Chapter 27 of the second author's textbook [13]. The same choice of locations may be made in the machine derivation as were made in the evaluation derivation, because the sequence of value allocations is precisely the same in both forms of dynamics.

The next step of the proof is to show that the abstract machine performs the same sequence of value reads in the same order as specified by the evaluation dynamics. This may be proved along with the correspondence described in the preceding paragraph. The argument relies on two important properties of the evaluation dynamics:

1. Any read of a value location is either a read of a location in the initial store, or a location that was allocated earlier in the evaluation.
2. Stack frames are allocated, but never read, in order to ensure that eviction of blocks from the nursery occurs in exactly the order imposed by the stack-based abstract machine.
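
To make the shape of the machine concrete, the following Python sketch runs the two kinds of states over a store in which the control stack is a linked list of frames. It is a simplified illustration under stated assumptions: the tuple encoding of expressions, the frame tags, and the names `subst`, `run`, `num` and `COPY` are hypothetical, the rule labels in the comments follow the order in which the rules appear in Figure 3, and the sketch counts allocations and reads rather than charging cache costs.

```python
import itertools


def subst(e, x, l):
    """Substitute location l for the free variable x in expression e."""
    tag = e[0]
    if tag == "var":
        return ("loc", l) if e[1] == x else e
    if tag in ("z", "loc"):
        return e
    if tag == "s":
        return ("s", subst(e[1], x, l))
    if tag == "app":
        return ("app", subst(e[1], x, l), subst(e[2], x, l))
    if tag == "ifz":
        _, e1, e2, y, e3 = e
        return ("ifz", subst(e1, x, l), subst(e2, x, l), y,
                e3 if y == x else subst(e3, x, l))
    if tag == "fun":
        _, f, y, body = e
        return e if x in (f, y) else ("fun", f, y, subst(body, x, l))
    raise ValueError(tag)


def run(e):
    """Run closed expression e; return (store, result location, counts)."""
    store, fresh = {}, itertools.count()
    counts = {"alloc": 0, "read": 0}

    def alloc(obj):
        loc = next(fresh)
        store[loc] = obj
        counts["alloc"] += 1
        return loc

    def read(loc):
        counts["read"] += 1
        return store[loc]

    k0 = alloc(("empty",))          # reserved location for the empty stack
    k, evaluating, c = k0, True, e  # c: expression if evaluating, else location
    while True:
        if evaluating:
            tag = c[0]
            if tag == "loc":                                   # (6a)
                evaluating, c = False, c[1]
            elif tag == "z":                                   # (6b)
                evaluating, c = False, alloc(("z",))
            elif tag == "fun":                                 # (6h)
                evaluating, c = False, alloc(c)
            elif tag == "s":                                   # (6c)
                k, c = alloc(("s_frame", k)), c[1]
            elif tag == "ifz":                                 # (6e)
                k, c = alloc(("ifz_frame", c[2], c[3], c[4], k)), c[1]
            elif tag == "app":                                 # (6i)
                k, c = alloc(("arg_frame", c[2], k)), c[1]
            else:
                raise ValueError(tag)
        elif k == k0:
            return store, c, counts                            # final state
        else:
            frame = read(k)
            k = frame[-1]                                      # pop the frame
            if frame[0] == "s_frame":                          # (6d)
                c = alloc(("s", c))
            elif frame[0] == "ifz_frame":                      # (6f), (6g)
                _, e2, x, e3, _ = frame
                v = read(c)
                c = e2 if v[0] == "z" else subst(e3, x, v[1])
                evaluating = True
            elif frame[0] == "arg_frame":                      # (6j)
                k, c = alloc(("call_frame", c, k)), frame[1]
                evaluating = True
            else:                                              # (6k)
                l1 = frame[1]
                _, f, x, body = read(l1)
                c = subst(subst(body, f, l1), x, c)
                evaluating = True


def num(n):
    """The numeral n as a nested expression s(s(...z))."""
    e = ("z",)
    for _ in range(n):
        e = ("s", e)
    return e


# The deep copy function of Section 3 in this encoding.
COPY = ("fun", "copy", "x",
        ("ifz", ("var", "x"), ("z",), "y",
         ("s", ("app", ("var", "copy"), ("var", "y")))))
```

Calling `run(("app", COPY, num(3)))` returns a store in which the result location heads a freshly allocated chain of three successor objects. Each frame is read exactly once, when control returns to it, which is the property the amortization argument below relies on.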

A deterministic nursery eviction policy is required to ensure that the memory reads correspond exactly between the evaluation dynamics and the abstract machine. We can assume that whatever policy is used in the dynamic semantics is also used by the abstract machine.

It remains to show that the stack reads employed by the abstract machine do not impose an asymptotically significant cost beyond what is predicted by the evaluation dynamics. Without special provision, access to the control stack would interfere with the allocation of data in the read cache, invalidating the cost given to the computation by the evaluation dynamics. To avoid this we make use of a dedicated read cache block in the abstract machine, which we will call the stack cache block, and explicitly manage this cache block as follows. Whenever a stack location is read from main memory, its neighborhood is loaded into the stack cache block, evicting the block that currently occupies it. We will argue that the cost of loading the stack cache can be amortized across the execution sequence, even if the same block is loaded into the stack cache more than once, a possibility that will be detailed shortly. The validity of the argument depends on two special properties of the run-time stack, namely that each allocated frame is read exactly once in a complete computation, and that the preceding stack frame is always older than the current one. Given such an amortization, it is then clear that the overall cost of execution on the abstract machine is bounded by a small constant factor of the cost ascribed to it by the evaluation dynamics, establishing the theorem.

To complete the proof, we describe the amortization of the cost of stack management in more detail. As an invariant we put a dollar on every memory block that contains a stack frame, except if it is the youngest such block and resides in the stack cache block, in which case it has no money associated with it.
When the abstract machine evicts a block containing a stack frame from the allocation cache we spend three dollars: one for the eviction itself, one to put a dollar on the evicted block, and one to put a dollar on the block that is in the stack cache block. This third dollar might be needed to maintain our invariant since that block, if there is one, will no longer be the youngest memory block containing a stack frame. Now when the abstract machine loads a block into the stack cache block from memory we spend its dollar for the load. All blocks with older frames have a dollar on them already by the invariant, so the invariant is maintained. In summary we spend 3 block transfers (worst case) per block that is evicted from the allocation cache.

We finally note that there is no need to explicitly maintain the stack cache block in the abstract machine semantics since we are assuming an ideal cache. Therefore as long as the cache has an extra block available, the cache policy will do at least as well as the one we described.

5. Provable Implementation

In this section we describe an implementation of the abstract machine given in Section 4 in the ideal cache model with the same asymptotic cost. The efficiency proof for the implementation takes account of two issues that are treated abstractly in the evaluation semantics and in the abstract machine. The main issue is how to implement the allocation judgment defined in Figure 2. Rules 5a and 5b make reference to liveness of the data, and evict a block consisting of the oldest B live objects in the nursery. To ensure that the predicted costs are realized in practice we must argue that these conditions can be met by an implementation. The second issue is that we must account for the size of the stored objects (values and frames) that may appear in a computation, and account for the cost of handling these objects in an implementation.

Define the size of a machine state $\sigma_0 \mathbin{@} k_0 \triangleright e_0$ to be the sum of the size of e₀ and the size of any function in σ₀.
This may be thought of as the size of the program, including any λ-abstractions that may be present in the initial store.

THEOREM 5.1. Fix an initial state $\sigma_0 \mathbin{@} k_0 \triangleright e_0$ of size s₀, and consider a complete computation $\sigma_0 \mathbin{@} k_0 \triangleright e_0 \longmapsto^{m}_{*} \sigma' \mathbin{@} k_0 \triangleleft l$ with s₁ objects in the final store, σ′. This computation can be simulated in the ideal cache model with cache complexity c · m for some constant c, provided that words are of size at least d · log(max(s₀, s₁)) for some d > 0 and that the cache has at least (4M + B) × s₀ words. (The constants c and d are independent of σ₀, k₀ and e₀.)

Theorem 5.1 states that the implementation asymptotically realizes the work attributed to the computation by the evaluation semantics, and hence validates the algorithm analysis performed using that semantics. The requirement on the word size in Theorem 5.1 ensures that all objects are addressable by a word-sized pointer, and accounts for the sizes of the objects themselves in storage. (A closure can be as large as the initial program.) The requirement on the cache size in Theorem 5.1 ensures that we may implement the abstract memory hierarchy with no more than a small constant factor of overhead in a manner that we now describe. (The B × s₀ additional words account for the stack cache described in Section 4; it remains to discuss the implementation of allocation.)

The allocation judgment defined in Figure 2 relies on an assessment of the live size of the nursery, and on the eviction of blocks from the nursery to ensure that the nursery contains no more than M live objects. As we noted earlier, the liveness of data in the nursery may be assessed without reference to the main memory; the liveness computation takes place entirely within the cache. Rather than assess liveness, possibly evicting a block, on each allocation, we instead amortize these costs across multiple allocations according to the following strategy. We reserve 2M × s₀ words of cache memory for the allocation area to accommodate at least 2M objects.
Objects are allocated by maintaining a pointer into the nursery area, incrementing it on each allocation until 2M objects have been allocated, at which point the nursery space is exhausted. When this occurs, we perform a compacting garbage collection that preserves the allocation order of objects, simultaneously evicting as many blocks as necessary to obtain a live size of M objects in the nursery. After compaction, allocation continues as before until the nursery is again exhausted. As long as there is sufficient space, allocation takes constant time. When a garbage collection is required, the cost may be attributed to the allocations of the live data in the nursery, so that in an amortized sense garbage collection is cost-free [3].

It is easy to see that no object is evicted to main memory using this implementation that would not have been evicted in the abstract sense. However, the evictions will, in general, happen later than predicted by the semantics. As a result, fewer objects may be live at the time of eviction, and so fewer blocks overall may be moved to main memory. As a result of this compression effect, two locations that were neighbors in the evaluation semantics may be in two different blocks in the implementation. To account for this, two blocks must be loaded to ensure that neighboring objects in the semantics are loaded into the read cache together. Thus we require 2M × s₀ words of cache in the ideal cache model to account for the M objects in the read cache.

With regard to the eviction policy from the read cache, we note that an ideal cache will always choose an optimal policy (evicting the block used furthest in the future). It will therefore do at least as well as any policy assumed by the abstract machine. This completes the proof of the implementation bound stated in the theorem.
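
The amortized allocation strategy described in this section can be sketched in a few lines of Python. This is a rough illustration rather than the implementation analyzed in the proof: the class name `BumpNursery`, its fields, and the representation of an object as a location paired with its child locations are assumptions made for the example, and object sizes (the factor s₀) are ignored.

```python
class BumpNursery:
    """Nursery with room for 2M objects, collected only when it fills up."""

    def __init__(self, M, B):
        self.M, self.B = M, B
        self.slots = []      # (location, child locations), in allocation order
        self.evicted = []    # blocks moved to main memory, oldest first
        self.collections = 0
        self.next_loc = 0

    def alloc(self, children, roots):
        """Bump-allocate an object; collect first if all 2M slots are used."""
        if len(self.slots) == 2 * self.M:
            self.collect(set(roots) | set(children))
        loc = self.next_loc
        self.next_loc += 1
        self.slots.append((loc, tuple(children)))
        return loc

    def collect(self, roots):
        """Compact live objects in allocation order, then evict oldest blocks."""
        self.collections += 1
        here = dict(self.slots)
        # Liveness is traced inside the nursery only, starting from the roots.
        live, todo = set(), [r for r in roots if r in here]
        while todo:
            l = todo.pop()
            if l not in live:
                live.add(l)
                todo.extend(c for c in here[l] if c in here)
        self.slots = [(l, ch) for l, ch in self.slots if l in live]
        # Evict blocks of the B oldest objects until at most M remain.
        while len(self.slots) > self.M:
            self.evicted.append([l for l, _ in self.slots[:self.B]])
            self.slots = self.slots[self.B:]
```

With M = 4 and B = 2, allocating a fully live chain of twenty objects triggers three collections and evicts six blocks of two objects each, whereas a long run of unrooted allocations triggers collections but evicts nothing, so the collection work is paid for by the allocations made since the previous collection.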


More information

CSE 417: Algorithms and Computational Complexity

CSE 417: Algorithms and Computational Complexity Time CSE 47: Algorithms ad Computatioal Readig assigmet Read Chapter of The ALGORITHM Desig Maual Aalysis & Sortig Autum 00 Paul Beame aalysis Problem size Worst-case complexity: max # steps algorithm

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists

More information

Massachusetts Institute of Technology Lecture : Theory of Parallel Systems Feb. 25, Lecture 6: List contraction, tree contraction, and

Massachusetts Institute of Technology Lecture : Theory of Parallel Systems Feb. 25, Lecture 6: List contraction, tree contraction, and Massachusetts Istitute of Techology Lecture.89: Theory of Parallel Systems Feb. 5, 997 Professor Charles E. Leiserso Scribe: Guag-Ie Cheg Lecture : List cotractio, tree cotractio, ad symmetry breakig Work-eciet

More information

Analysis of Algorithms

Analysis of Algorithms Presetatio for use with the textbook, Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Aalysis of Algorithms Iput 2015 Goodrich ad Tamassia Algorithm Aalysis of Algorithms

More information

Homework 1 Solutions MA 522 Fall 2017

Homework 1 Solutions MA 522 Fall 2017 Homework 1 Solutios MA 5 Fall 017 1. Cosider the searchig problem: Iput A sequece of umbers A = [a 1,..., a ] ad a value v. Output A idex i such that v = A[i] or the special value NIL if v does ot appear

More information

Greedy Algorithms. Interval Scheduling. Greedy Algorithms. Interval scheduling. Greedy Algorithms. Interval Scheduling

Greedy Algorithms. Interval Scheduling. Greedy Algorithms. Interval scheduling. Greedy Algorithms. Interval Scheduling Greedy Algorithms Greedy Algorithms Witer Paul Beame Hard to defie exactly but ca give geeral properties Solutio is built i small steps Decisios o how to build the solutio are made to maximize some criterio

More information

Last class. n Scheme. n Equality testing. n eq? vs. equal? n Higher-order functions. n map, foldr, foldl. n Tail recursion

Last class. n Scheme. n Equality testing. n eq? vs. equal? n Higher-order functions. n map, foldr, foldl. n Tail recursion Aoucemets HW6 due today HW7 is out A team assigmet Submitty page will be up toight Fuctioal correctess: 75%, Commets : 25% Last class Equality testig eq? vs. equal? Higher-order fuctios map, foldr, foldl

More information

Counting the Number of Minimum Roman Dominating Functions of a Graph

Counting the Number of Minimum Roman Dominating Functions of a Graph Coutig the Number of Miimum Roma Domiatig Fuctios of a Graph SHI ZHENG ad KOH KHEE MENG, Natioal Uiversity of Sigapore We provide two algorithms coutig the umber of miimum Roma domiatig fuctios of a graph

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple

More information

Exact Minimum Lower Bound Algorithm for Traveling Salesman Problem

Exact Minimum Lower Bound Algorithm for Traveling Salesman Problem Exact Miimum Lower Boud Algorithm for Travelig Salesma Problem Mohamed Eleiche GeoTiba Systems mohamed.eleiche@gmail.com Abstract The miimum-travel-cost algorithm is a dyamic programmig algorithm to compute

More information

Solution printed. Do not start the test until instructed to do so! CS 2604 Data Structures Midterm Spring, Instructions:

Solution printed. Do not start the test until instructed to do so! CS 2604 Data Structures Midterm Spring, Instructions: CS 604 Data Structures Midterm Sprig, 00 VIRG INIA POLYTECHNIC INSTITUTE AND STATE U T PROSI M UNI VERSI TY Istructios: Prit your ame i the space provided below. This examiatio is closed book ad closed

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

1.2 Binomial Coefficients and Subsets

1.2 Binomial Coefficients and Subsets 1.2. BINOMIAL COEFFICIENTS AND SUBSETS 13 1.2 Biomial Coefficiets ad Subsets 1.2-1 The loop below is part of a program to determie the umber of triagles formed by poits i the plae. for i =1 to for j =

More information

Data Structures Week #9. Sorting

Data Structures Week #9. Sorting Data Structures Week #9 Sortig Outlie Motivatio Types of Sortig Elemetary (O( 2 )) Sortig Techiques Other (O(*log())) Sortig Techiques 21.Aralık.2010 Boraha Tümer, Ph.D. 2 Sortig 21.Aralık.2010 Boraha

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access

More information

One advantage that SONAR has over any other music-sequencing product I ve worked

One advantage that SONAR has over any other music-sequencing product I ve worked *gajedra* D:/Thomso_Learig_Projects/Garrigus_163132/z_productio/z_3B2_3D_files/Garrigus_163132_ch17.3d, 14/11/08/16:26:39, 16:26, page: 647 17 CAL 101 Oe advatage that SONAR has over ay other music-sequecig

More information

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,

More information

Chapter 9. Pointers and Dynamic Arrays. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 9. Pointers and Dynamic Arrays. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 9 Poiters ad Dyamic Arrays Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 9.1 Poiters 9.2 Dyamic Arrays Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Slide 9-3

More information

Random Graphs and Complex Networks T

Random Graphs and Complex Networks T Radom Graphs ad Complex Networks T-79.7003 Charalampos E. Tsourakakis Aalto Uiversity Lecture 3 7 September 013 Aoucemet Homework 1 is out, due i two weeks from ow. Exercises: Probabilistic iequalities

More information

Review: The ACID properties

Review: The ACID properties Recovery Review: The ACID properties A tomicity: All actios i the Xactio happe, or oe happe. C osistecy: If each Xactio is cosistet, ad the DB starts cosistet, it eds up cosistet. I solatio: Executio of

More information

On Infinite Groups that are Isomorphic to its Proper Infinite Subgroup. Jaymar Talledo Balihon. Abstract

On Infinite Groups that are Isomorphic to its Proper Infinite Subgroup. Jaymar Talledo Balihon. Abstract O Ifiite Groups that are Isomorphic to its Proper Ifiite Subgroup Jaymar Talledo Baliho Abstract Two groups are isomorphic if there exists a isomorphism betwee them Lagrage Theorem states that the order

More information

A Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions

A Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions Proceedigs of the 10th WSEAS Iteratioal Coferece o APPLIED MATHEMATICS, Dallas, Texas, USA, November 1-3, 2006 316 A Geeralized Set Theoretic Approach for Time ad Space Complexity Aalysis of Algorithms

More information

Chapter 10. Defining Classes. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 10. Defining Classes. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 10 Defiig Classes Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 10.1 Structures 10.2 Classes 10.3 Abstract Data Types 10.4 Itroductio to Iheritace Copyright 2015 Pearso Educatio,

More information

Image Segmentation EEE 508

Image Segmentation EEE 508 Image Segmetatio Objective: to determie (etract) object boudaries. It is a process of partitioig a image ito distict regios by groupig together eighborig piels based o some predefied similarity criterio.

More information

Throughput-Delay Scaling in Wireless Networks with Constant-Size Packets

Throughput-Delay Scaling in Wireless Networks with Constant-Size Packets Throughput-Delay Scalig i Wireless Networks with Costat-Size Packets Abbas El Gamal, James Mamme, Balaji Prabhakar, Devavrat Shah Departmets of EE ad CS Staford Uiversity, CA 94305 Email: {abbas, jmamme,

More information

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures COMP 633 - Parallel Computig Lecture 2 August 24, 2017 : The PRAM model ad complexity measures 1 First class summary This course is about parallel computig to achieve high-er performace o idividual problems

More information

Chapter 24. Sorting. Objectives. 1. To study and analyze time efficiency of various sorting algorithms

Chapter 24. Sorting. Objectives. 1. To study and analyze time efficiency of various sorting algorithms Chapter 4 Sortig 1 Objectives 1. o study ad aalyze time efficiecy of various sortig algorithms 4. 4.7.. o desig, implemet, ad aalyze bubble sort 4.. 3. o desig, implemet, ad aalyze merge sort 4.3. 4. o

More information

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers *

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers * Load balaced Parallel Prime umber Geerator with Sieve of Eratosthees o luster omputers * Soowook Hwag*, Kyusik hug**, ad Dogseug Kim* *Departmet of Electrical Egieerig Korea Uiversity Seoul, -, Rep. of

More information

CSE 2320 Notes 8: Sorting. (Last updated 10/3/18 7:16 PM) Idea: Take an unsorted (sub)array and partition into two subarrays such that.

CSE 2320 Notes 8: Sorting. (Last updated 10/3/18 7:16 PM) Idea: Take an unsorted (sub)array and partition into two subarrays such that. CSE Notes 8: Sortig (Last updated //8 7:6 PM) CLRS 7.-7., 9., 8.-8. 8.A. QUICKSORT Cocepts Idea: Take a usorted (sub)array ad partitio ito two subarrays such that p q r x y z x y y z Pivot Customarily,

More information

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1 CS200: Hash Tables Prichard Ch. 13.2 CS200 - Hash Tables 1 Table Implemetatios: average cases Search Add Remove Sorted array-based Usorted array-based Balaced Search Trees O(log ) O() O() O() O(1) O()

More information

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8) CIS 11 Data Structures ad Algorithms with Java Fall 017 Big-Oh Notatio Tuesday, September 5 (Make-up Friday, September 8) Learig Goals Review Big-Oh ad lear big/small omega/theta otatios Practice solvig

More information

WYSE Academic Challenge Sectional Computer Science 2005 SOLUTION SET

WYSE Academic Challenge Sectional Computer Science 2005 SOLUTION SET WYSE Academic Challege Sectioal Computer Sciece 2005 SOLUTION SET 1. Correct aswer: a. Hz = cycle / secod. CPI = 2, therefore, CPI*I = 2 * 28 X 10 8 istructios = 56 X 10 8 cycles. The clock rate is 56

More information

Lecturers: Sanjam Garg and Prasad Raghavendra Feb 21, Midterm 1 Solutions

Lecturers: Sanjam Garg and Prasad Raghavendra Feb 21, Midterm 1 Solutions U.C. Berkeley CS170 : Algorithms Midterm 1 Solutios Lecturers: Sajam Garg ad Prasad Raghavedra Feb 1, 017 Midterm 1 Solutios 1. (4 poits) For the directed graph below, fid all the strogly coected compoets

More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

A Parallel DFA Minimization Algorithm

A Parallel DFA Minimization Algorithm A Parallel DFA Miimizatio Algorithm Ambuj Tewari, Utkarsh Srivastava, ad P. Gupta Departmet of Computer Sciece & Egieerig Idia Istitute of Techology Kapur Kapur 208 016,INDIA pg@iitk.ac.i Abstract. I this

More information

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein 068.670 Subliear Time Algorithms November, 0 Lecture 6 Lecturer: Roitt Rubifeld Scribes: Che Ziv, Eliav Buchik, Ophir Arie, Joatha Gradstei Lesso overview. Usig the oracle reductio framework for approximatig

More information

Lecture 2: Spectra of Graphs

Lecture 2: Spectra of Graphs Spectral Graph Theory ad Applicatios WS 20/202 Lecture 2: Spectra of Graphs Lecturer: Thomas Sauerwald & He Su Our goal is to use the properties of the adjacecy/laplacia matrix of graphs to first uderstad

More information

COP4020 Programming Languages. Functional Programming Prof. Robert van Engelen

COP4020 Programming Languages. Functional Programming Prof. Robert van Engelen COP4020 Programmig Laguages Fuctioal Programmig Prof. Robert va Egele Overview What is fuctioal programmig? Historical origis of fuctioal programmig Fuctioal programmig today Cocepts of fuctioal programmig

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

UNIVERSITY OF MORATUWA

UNIVERSITY OF MORATUWA UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016

More information

A Study on the Performance of Cholesky-Factorization using MPI

A Study on the Performance of Cholesky-Factorization using MPI A Study o the Performace of Cholesky-Factorizatio usig MPI Ha S. Kim Scott B. Bade Departmet of Computer Sciece ad Egieerig Uiversity of Califoria Sa Diego {hskim, bade}@cs.ucsd.edu Abstract Cholesky-factorizatio

More information

Fast Fourier Transform (FFT) Algorithms

Fast Fourier Transform (FFT) Algorithms Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform

More information

Symbolic Execution with Abstraction

Symbolic Execution with Abstraction Software Tools for Techology Trasfer mauscript No. (will be iserted by the editor) Symbolic Executio with Abstractio Saswat Aad 1, Coria S. Păsăreau 2, Willem Visser 3 1 College of Computig, Georgia Istitute

More information

Threads and Concurrency in Java: Part 1

Threads and Concurrency in Java: Part 1 Cocurrecy Threads ad Cocurrecy i Java: Part 1 What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.

More information

Threads and Concurrency in Java: Part 1

Threads and Concurrency in Java: Part 1 Threads ad Cocurrecy i Java: Part 1 1 Cocurrecy What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.

More information

Σ P(i) ( depth T (K i ) + 1),

Σ P(i) ( depth T (K i ) + 1), EECS 3101 York Uiversity Istructor: Ady Mirzaia DYNAMIC PROGRAMMING: OPIMAL SAIC BINARY SEARCH REES his lecture ote describes a applicatio of the dyamic programmig paradigm o computig the optimal static

More information

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence _9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to

More information

Introduction to SWARM Software and Algorithms for Running on Multicore Processors

Introduction to SWARM Software and Algorithms for Running on Multicore Processors Itroductio to SWARM Software ad Algorithms for Ruig o Multicore Processors David A. Bader Georgia Istitute of Techology http://www.cc.gatech.edu/~bader Tutorial compiled by Rucheek H. Sagai M.S. Studet,

More information

Speeding-up dynamic programming in sequence alignment

Speeding-up dynamic programming in sequence alignment Departmet of Computer Sciece Aarhus Uiversity Demark Speedig-up dyamic programmig i sequece aligmet Master s Thesis Dug My Hoa - 443 December, Supervisor: Christia Nørgaard Storm Pederse Implemetatio code

More information

condition w i B i S maximum u i

condition w i B i S maximum u i ecture 10 Dyamic Programmig 10.1 Kapsack Problem November 1, 2004 ecturer: Kamal Jai Notes: Tobias Holgers We are give a set of items U = {a 1, a 2,..., a }. Each item has a weight w i Z + ad a utility

More information

why study sorting? Sorting is a classic subject in computer science. There are three reasons for studying sorting algorithms.

why study sorting? Sorting is a classic subject in computer science. There are three reasons for studying sorting algorithms. Chapter 5 Sortig IST311 - CIS65/506 Clevelad State Uiversity Prof. Victor Matos Adapted from: Itroductio to Java Programmig: Comprehesive Versio, Eighth Editio by Y. Daiel Liag why study sortig? Sortig

More information

top() Applications of Stacks

top() Applications of Stacks CS22 Algorithms ad Data Structures MW :00 am - 2: pm, MSEC 0 Istructor: Xiao Qi Lecture 6: Stacks ad Queues Aoucemets Quiz results Homework 2 is available Due o September 29 th, 2004 www.cs.mt.edu~xqicoursescs22

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 20 Itroductio to Trasactio Processig Cocepts ad Theory Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Trasactio Describes local

More information

Assignment 5; Due Friday, February 10

Assignment 5; Due Friday, February 10 Assigmet 5; Due Friday, February 10 17.9b The set X is just two circles joied at a poit, ad the set X is a grid i the plae, without the iteriors of the small squares. The picture below shows that the iteriors

More information

IMP: Superposer Integrated Morphometrics Package Superposition Tool

IMP: Superposer Integrated Morphometrics Package Superposition Tool IMP: Superposer Itegrated Morphometrics Package Superpositio Tool Programmig by: David Lieber ( 03) Caisius College 200 Mai St. Buffalo, NY 4208 Cocept by: H. David Sheets, Dept. of Physics, Caisius College

More information

CHAPTER IV: GRAPH THEORY. Section 1: Introduction to Graphs

CHAPTER IV: GRAPH THEORY. Section 1: Introduction to Graphs CHAPTER IV: GRAPH THEORY Sectio : Itroductio to Graphs Sice this class is called Number-Theoretic ad Discrete Structures, it would be a crime to oly focus o umber theory regardless how woderful those topics

More information

arxiv: v2 [cs.ds] 24 Mar 2018

arxiv: v2 [cs.ds] 24 Mar 2018 Similar Elemets ad Metric Labelig o Complete Graphs arxiv:1803.08037v [cs.ds] 4 Mar 018 Pedro F. Felzeszwalb Brow Uiversity Providece, RI, USA pff@brow.edu March 8, 018 We cosider a problem that ivolves

More information