A Holistic View of Stream Partitioning Costs

Nikos R. Katsipoulakis, Alexandros Labrinidis, Panos K. Chrysanthis
University of Pittsburgh
Pittsburgh, Pennsylvania, USA
{katsip, labrinid,

ABSTRACT
Stream processing has become the dominant processing model for monitoring and real-time analytics. Modern Parallel Stream Processing Engines (pSPEs) have made it feasible to increase the performance of both monitoring and analytical queries by parallelizing a query's execution and distributing the load on multiple workers. A determining factor for the performance of a pSPE is the partitioning algorithm used to disseminate tuples to workers. Until now, partitioning methods in pSPEs have been similar to the ones used in parallel databases, and only recently have load-aware algorithms been employed to improve the effectiveness of parallel execution. We identify and demonstrate the need to incorporate aggregation costs in the partitioning model when executing stateful operations in parallel, in order to minimize the overall latency and/or maximize throughput. Towards this, we propose new stream partitioning algorithms that consider both tuple imbalance and aggregation cost. We evaluate our proposed algorithms and show that they can achieve up to an order of magnitude better performance compared to the current state of the art.

1. INTRODUCTION
Monitoring and real-time temporal analytic queries are being widely used in a variety of services, whose quality relies on successfully capturing topic drift or trend fluctuation. Examples of such services include high-frequency algorithmic stock trading, social network analysis, targeted advertising, and click stream analysis. In order to match the speeds of data production, stream processing is deemed the most promising model. At a high level, it requires (a) one pass over the data, (b) constant processing time, and (c) continuous execution.
After many flavors of single-thread stream processing engines [2, 6, 7, 11], Parallel Stream Processing Engines (pSPEs) have been introduced as a solution for processing high-volume data streams, in both single-node multi-threaded (scale-up) [1, 24], and in multiple-node (scale-out) [1, 35, 27, 3, 37, 34, 25, 8, 3] environments. pSPEs have dominated the streaming landscape because of their ability to scale processing capability by dividing load into parallel data-flows, handled by many workers. At their core, pSPEs are critically affected by the partitioning algorithm of the sub data-flows delegated to workers: the more evenly the load is partitioned, the more scalable a pSPE is. Therefore, partitioning is paramount, especially in stateful operations, which involve windows of computation, complex delivery semantics (e.g., exactly-once), and window synchronization [8].

The focal point of our work is on stateful operations, which require the collocation of tuples with similar characteristics and produce an aggregated result for each user-defined logical window. A window contains tuples based on count, time, or even session. The need for strict delivery semantics (e.g., exactly-once) imposes additional overheads for guaranteeing correctness and timely delivery of results, which are generated by checkpointing, out-of-order window alignment, window barriers, etc. Often, pSPEs rely on dynamic re-partitioning of tuples to achieve better load balance [32, 38, 17, 13, 18, 35, 9, 15]. However, re-partitioning comes with the additional burden of state migration, which is a heavyweight task that involves complex synchronization protocols and state integration policies (subject to window semantics), and can potentially lead to delayed tuple delivery.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Proceedings of the VLDB Endowment, Vol. 10, No. 11. Copyright 2017 VLDB Endowment.
Therefore, in our study we chose to take a step back and focus on the partitioning algorithm to make it more efficient, so that the need for re-partitioning materializes less often. Of course, re-partitioning solutions, such as Flux [32], require partitioning¹, and can complement our work to enhance a pSPE's performance further.

Until recently, pSPEs adopted partition algorithms used in Massively Parallel Processing Database Management Systems. The most popular among them are the shuffle (or round-robin) (SH) and field (or hash) (FLD) algorithms [8, 34]. The former blindly sends tuples to workers in a circular fashion, akin to shuffling a deck of cards; the latter exploits a random process, usually a hash function, to disperse tuples to workers. Each partition algorithm has its merits and its drawbacks: SH manages to balance load evenly, but forces a computationally heavy aggregation step (Fig. 1a); FLD underperforms on skewed streams (i.e., when some keys appear more often than others) but does not require an aggregation step (Fig. 1b). The state of the art partition algorithm is partial key grouping (PK) [28]. It focuses on improving performance by keeping track of the number of tuples sent to each worker in an online fashion. PK leverages the idea of key splitting [5], which dictates that tuples with the same attribute(s) can be split among two workers for the benefit of overall performance (Fig. 1c). Recently, an extension of PK that uses more than two choices was proposed [29], and was shown to further balance load among workers. The decision about which worker will receive a tuple is determined by the total number of tuples already sent to each one of them at the time of the decision. This way, the merits of FLD and SH are combined by overcoming skewness through the use of multiple choices and, at the same time, reducing aggregation cost.

¹ Flux uses FLD as its partition algorithm.

Figure 1: Existing stream partitioning algorithms lack a unified model that limits imbalance while keeping aggregation cost low. (a) SH's aggregation runtime is proportional to V times the number of distinct groups. (b) FLD doesn't require aggregation, but fails to balance load under skewed input. (c) PK's aggregation runtime is proportional to M times the number of distinct groups (M is the number of choices).

Partition algorithms like PK (and its multi-choice variant [29]) focus on the aspect of imbalance, in terms of tuples sent to each worker on the parallel step of a stateful operation. Nevertheless, every stateful operation requires a step in which partial results are combined (Figs. 1a and 1c). In our work, we argue that an important factor for performance is the aggregation cost required to produce the final result, which is not considered by any other partitioning algorithm. In fact, to the extent of our knowledge, no other stream partitioning algorithm incorporates both imbalance and aggregation cost.

In this paper we propose that tracking the aggregation cost of a stateful operation reduces to counting the number of distinct keys sent to each worker on every window. Hence, we introduce a new class of partitioning algorithms, which leverage such information during the decision process. Our contributions are:

- Introduce a novel cost model for stream partitioning that considers both load imbalance and aggregation cost on every window of a stateful operation.
- Propose novel stream partitioning algorithms that incorporate our cost model to improve performance.
- Demonstrate the benefits of our cost model in real world benchmarks and present an empirical rule for choosing a partitioning algorithm for a stateful query.
Section 2 presents our model and existing partitioning algorithms. Section 3 shows mechanisms for keeping track of cardinality, and Section 4 presents our proposed algorithms. Sections 5 and 6 demonstrate the details of our experiments, followed by Section 7, which offers a discussion on picking a partition algorithm. Finally, Section 8 presents related work, and our work concludes in Section 9.

2. PROPOSED MODEL
We focus on pSPEs for either scale-up or scale-out architectures. A scale-up architecture is a single multi-core machine, in which multiple cores are used to accommodate concurrent threads. A scale-out architecture is a multi-node environment in which a cluster of machines is at the disposal of a central managing authority of the pSPE.

2.1 Preliminaries
A query Q is submitted to the pSPE in either declarative or imperative form. For the rest of this section we are going to use the example query depicted in Fig. 2a, using CQL [4].

    SELECT R.a, COUNT(*)
    FROM R JOIN S ON R.a = S.b [Range 5 minutes slide 3 seconds]
    WHERE S.c < 1
    GROUP BY R.a

Figure 2: Query submission and evaluation on a pSPE. (a) Input query Q. (b) Evaluation tree, with output at the root, the group-by, map, join, and filter operators as internal nodes (each marked stateful or stateless), and the streams S and R as leaves.

Table 1: Model Symbol Overview

    Symbol                                           Overview
    V                                                # of workers
    S_i, 1 ≤ i ≤ N                                   streams
    X_i                                              schema of S_i
    e_{X_i}                                          tuple of S_i
    W : S_i → {S_i^1, ..., S_i^w}                    window for S_i
    P : S_i^w → {L^1_{S_i^w}, ..., L^V_{S_i^w}}      partition function
    L^j_{S_i^w}                                      window load of worker j
    f : L^j_{S_i^w} → {..., (k_x, v_x), ...}         partial evaluation function
    Γ : {..., f(L^j_{S_i^w}), ...} → R               aggregation function

The pSPE transforms Q into a logical plan, which is often modeled as an evaluation tree (Fig. 2b). The root of the tree represents output, which can either be an external system consuming the result or external storage. The leaves of the tree are streams, each one represented by $S_i$, where $1 \le i \le N$ (N is the number of input streams). Each $S_i$ is abstracted as a sequence of tuples $e_{X_i}$ with a predefined schema $X_i$. From this point on we are going to describe our model based on a single input stream S.
However, without any loss of generality, this model is capable of accommodating multiple streams as well. An $e_X$'s attributes can be represented as a triplet $(\tau_X, k_X, p_X)$. $\tau_X$ is the attribute responsible for ordering tuples in S and is used to assign each $e_X$ to a logical window (either time- or count-based). A logical window is abstracted as a function $W : S \to \{S^1, \dots, S^w\}$, where w indexes the windows. Each $S^w$ represents the tuples of S that belong to window w according to W. $k_X \subseteq \{X - \tau_X\}$ are the attributes which identify a tuple, and $p_X \subseteq \{X - (\tau_X + k_X)\}$ are the remaining attributes, which comprise $e_X$'s payload. Often, those appear in predicates, projection lists, or are used by aggregate functions. In our example query, $S_1 = R$ and $S_2 = S$.

Their schemas are $X_1 = (t, a)$ and $X_2 = (t, b, c)$. Each tuple from R is modeled as a triplet where $\tau_{X_1} = \{R.t\}$, $k_{X_1} = \{R.a\}$, and $p_{X_1} = \emptyset$. Similarly, $\tau_{X_2} = \{S.t\}$, $k_{X_2} = \{S.b\}$, and $p_{X_2} = \{S.c\}$.

Turning to the evaluation tree, internal nodes represent algebraic operations, which work as transformations of an input stream S to another stream S'. Each operation can be either stateless or stateful. The former are pure functions (as defined by functional programming principles) and are easily parallelized by arbitrarily partitioning their input stream. The latter can be either a relational algebra operation or any user-defined function that produces a result on every window $S^w$. Our work focuses only on partitioning tuples for stateful operations. In the tree illustrated in Fig. 2b, map and filter are stateless, whereas join and group-by are stateful operations.

2.2 A new formulation for Parallel Stateful Operations
By the time a stateful operation is scheduled to execute in parallel, it transforms into a 3-stage process for each window $S^w$. Its input consists of $S_1^w, \dots, S_N^w$ and the 3 stages are, in order: (i) partition, (ii) partial evaluation, and (iii) aggregation. Figure 3 depicts the windowed group-by count between streams R and S of the sample query as the 3-stage process.

Figure 3: The windowed group-by count of the sample query (Fig. 2) as a 3-stage process: partition $P((R \bowtie S)^w)$, partial evaluation $f(L^1_{(R \bowtie S)^w}), \dots, f(L^V_{(R \bowtie S)^w})$, and aggregation $\Gamma(\{f(L^1), \dots, f(L^V)\})$.

Partition can be modeled as a function that takes a subsequence $S^w$ and produces another sequence of equal length that indicates the worker to which each $e_{X^w}$ is going to be sent. In other words, partition is a function $P : S^w \to \{o_1, \dots, o_{|S^w|}\}$, where $1 \le o_l \le V$ ($|x|$ represents the length of a stream/sequence x). The resulting sequence consists of elements $o_l$, where $1 \le l \le |S^w|$, each one mapping the $e_{X^w}$ indexed by l to a number in [1, V]. V represents the pSPE's parallelism degree for a particular stateful operation and is materialized by V workers, which are responsible for processing the partial result in window w.
Each worker is either a thread or a process. $L^o_{S^w} = \{e \mid e \in S^w \wedge P(S^w)[e] = o\}$ denotes the sequence of tuples from $S^w$ that will be sent to worker o by the partition process.

Partial evaluation is executed by V workers in parallel. Each worker receives its corresponding $L^o_{S^w}$ sequence and applies the user-defined transformation f. f produces a set of key-value pairs, $f : L^o_{S^w} \to \{(k_1, v_1), \dots, (k_m, v_m)\}$, of arbitrary size m. m is naturally bounded by the cardinality of $S^w$, which is defined as the number of distinct values of $k_S$ in $S^w$. For the rest of this paper, the cardinality of a stream/sequence x will be represented by $\|x\|$, which is used to refer to the number of distinct keys (groups) held by a worker.

Finally, aggregation combines all the key-value pairs $f(L^o_{S^w})$ produced by each worker o, $o \in [1, V]$, into a final result, using an aggregation function $\Gamma(\{f(L^1_{S^w}), \dots, f(L^V_{S^w})\})$.

Going back to the query shown in Fig. 3, partition would be a function $P((R \bowtie S)^w)$ that partitions each windowed stream based on R.a. Partial evaluation would be the partial count of the group-by operator, and each worker would produce a sequence of key-value pairs, in which the keys would consist of distinct R.a values and the values would be the number of tuples for each corresponding R.a key. Finally, the aggregation stage would combine partial results by adding partial counts for every matching R.a key. In essence, if 2 workers are used, with worker #1 producing {(x, 12), (y, 123)} and worker #2 producing {(x, 43), (y, 1), (z, 4)}, then aggregation (Γ) produces the result {(x, 55), (y, 124), (z, 4)} (similar to the processing model of [33]).

2.3 Proposed partitioning cost model
Partition aims to: (i) divide $S^w$ as evenly as possible among V workers, while (ii) aggregation load (Γ) remains low. This way, execution can benefit from employing multiple workers: the more $\max_{1 \le j \le V}(|L^j_{S^w}|)$ gets reduced, the faster the partial evaluation step is going to progress.
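The 3-stage process of Section 2.2 can be sketched for the group-by count example as follows. This is a minimal illustration, not the authors' library: the function names (`partition`, `partial_eval`, `aggregate`), the representation of a tuple as a `(key, payload)` pair, and the use of `collections.Counter` are all assumptions made here.

```python
from collections import Counter

def partition(window, num_workers, choose_worker):
    """Stage 1 (P): split one window into per-worker tuple sequences."""
    loads = [[] for _ in range(num_workers)]
    for tup in window:
        loads[choose_worker(tup)].append(tup)
    return loads

def partial_eval(load):
    """Stage 2 (f): per-worker partial count for each key."""
    return Counter(key for key, _payload in load)

def aggregate(partials):
    """Stage 3 (Gamma): add partial counts for every matching key."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds counts
    return dict(result)

# The two partial results from the example in Section 2.2.
worker_1 = Counter({"x": 12, "y": 123})
worker_2 = Counter({"x": 43, "y": 1, "z": 4})
final = aggregate([worker_1, worker_2])

# A tiny window routed by a hypothetical rule: key "a" to worker 0, the rest to worker 1.
loads = partition([("a", 1), ("b", 2), ("a", 3)], 2,
                  lambda tup: 0 if tup[0] == "a" else 1)
partials = [partial_eval(load) for load in loads]
```

With the partial results quoted in the text, `final` holds the combined counts for x, y, and z.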
In our work, we adopt the assumption that there exists a monotonic relation between the number of tuples and load increase (similar to previous work on stream partitioning [28]). This entails that when a tuple is assigned to a worker, its load will either increase or stay the same. Previous work [28] introduced tuple imbalance as a metric for quantifying a partitioner's efficiency in terms of balancing load among workers. However, [28] expressed imbalance on the entire stream (i.e., counting from the beginning of time), which we believe is limited, given the dynamic nature and characteristics of data streams. In this work, we extend imbalance to cover the window aspect of a streaming query:

$$I(P(S^w)) = \max_{j}\left(|L^j_{S^w}|\right) - \operatorname{avg}_{j}\left(|L^j_{S^w}|\right), \quad j = 1, \dots, V \tag{1}$$

Equation 1 defines tuple imbalance as the difference of the maximum $|L^j_{S^w}|$ minus the average $|L^j_{S^w}|$, as partitioned by a partition algorithm P. The lower the tuple imbalance achieved by P, the lower the maximum runtime of each worker will be.

We propose a new model for measuring the effectiveness of a partitioner by incorporating aggregation cost, which has been ignored in the past. As we discussed in Sec. 2.2, the aggregation stage will have to ingest all $f(L^j_{S^w})$ and combine every pair of (k, v) tuples with a matching key k. Hence, the number of operations for processing partial results is proportional to the sum of the sizes of all partial aggregations $|f(L^j_{S^w})|$:

$$\Gamma(S^w) = O\left(\sum_{o=1}^{V} |f(L^o_{S^w})|\right) \tag{2}$$

Equation 2 captures both the processing and the memory cost of the aggregation, since partial results need to be stored until they are processed. In fact, the larger $\Gamma(S^w)$ is, the more memory is required to accommodate partial results. Therefore, we model stream partitioning as the following minimization problem:

$$\text{minimize } I(P(S^w)) \quad \text{while} \quad \Gamma(S^w) \le \max_{1 \le j \le V}\left(|L^j_{S^w}|\right) \tag{3}$$

The reason $\Gamma(S^w)$ should be less than or equal to the maximum $|L^j_{S^w}|$ is so that execution benefits from parallelizing the workload and aggregation does not become more expensive than the maximum partial processing cost.
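As a rough illustration of Eqs. 1 to 3, the quantities can be computed for one window once the per-worker sequences are known. The sketch below assumes that f emits exactly one key-value pair per distinct key (as the group-by count does), so that $|f(L^j_{S^w})|$ equals the number of distinct keys held by worker j; the function names and the sample data are hypothetical.

```python
def tuple_imbalance(loads):
    """Eq. 1: maximum minus average number of tuples per worker."""
    sizes = [len(load) for load in loads]
    return max(sizes) - sum(sizes) / len(sizes)

def aggregation_cost(loads, key_of=lambda tup: tup[0]):
    """Eq. 2 without the O(): sum over workers of |f(L^j)|.

    Assumes f produces one pair per distinct key on each worker.
    """
    return sum(len({key_of(tup) for tup in load}) for load in loads)

def satisfies_constraint(loads):
    """Constraint of Eq. 3: aggregation cost <= largest per-worker load."""
    return aggregation_cost(loads) <= max(len(load) for load in loads)

# Hypothetical window: worker 0 holds 4 "x" and 2 "y" tuples, worker 1 holds 2 "x" tuples.
loads = [[("x", 1)] * 4 + [("y", 1)] * 2, [("x", 1)] * 2]
imbalance = tuple_imbalance(loads)   # 6 - (6 + 2) / 2
cost = aggregation_cost(loads)       # {x, y} on worker 0, {x} on worker 1
ok = satisfies_constraint(loads)
```

Here key x is split across both workers, so it is counted twice in the aggregation cost, which is the effect the model is meant to expose.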
Finally, in scale-out architectures, workers' load might diverge due to external factors (e.g., communication, multi-tenancy, etc.). Our model (Eq. 1) focuses on identifying load generated by the stateful operation and acting accordingly to balance it. To the extent of our knowledge, any method for broader load monitoring in a pSPE involves architectural interventions, such as monitoring modules and feedback loops [17, 38, 13, 18, 23, 31]. If a pSPE features the aforementioned components to detect load divergence caused by external factors, our cost model (Eq. 3) can be extended to incorporate that information as well.²

2.4 The pitfall of ignoring aggregation costs
To better understand inherent trade-offs among existing partition algorithms, we present Fig. 4, which illustrates the two dimensions with which each algorithm is measured. The horizontal axis represents the ability of an algorithm to balance the load among workers, and the vertical axis represents an algorithm's ability to keep the aggregation cost low, based on our model (Eq. 3). In Fig. 4 we have placed previously proposed partition algorithms based on their expected behavior in terms of imbalance and aggregation cost.

Figure 4: Stream partitioning algorithms' expected performance. Horizontal axis: imbalance (low to high). Vertical axis: aggregation cost (low to high). The plot places shuffle (SH), partial-key (PK), and field (FLD), and marks the best and worst regions.

As indicated by Eq. 3, partitioning becomes a trade-off between tuple imbalance and aggregation cost: the more tuples are spread, the more aggregation time increases. Consider S to be an input stream with schema X = (t, a, b), where $\tau_X = \{t\}$, $k_X = \{a\}$, and $p_X = \{b\}$. In a stateful operation, a partitioning algorithm has to make a choice of where all tuples with a particular key a will be sent. Partitioning algorithms can be categorized based on how many worker options are presented for a given $k_X$.

A 1-choice partitioner offers no mechanism to balance skewness in the input data. As a result, the workers that happen to be assigned the most frequent part of the data will always have more work compared to the others. That leads to higher imbalance (Eq. 1). On the other hand, when a single option for each $k_X$ is presented, aggregation cost (Eq. 2) is minimal, because each worker will produce a disjoint subset of the full result.
In contrast, an M-choice partitioner ($M \le V$) presents M candidate workers for each $k_X$. Thus, the load for $k_X$ is divided into M equal parts and handled by M workers. As a result, imbalance (Eq. 1) is reduced, and the pSPE takes better advantage of parallelism. Unfortunately, partial results produced by the M workers handling a particular $k_X$ have to be gathered and combined. That entails an inflated aggregation cost, which is expected to increase by a factor of M. For example, in a single window $S^w$, if tuples with $k_X = a_x$ are assigned to 4 workers, then the aggregation stage will process 4 partial results (i.e., one from each worker).

Shuffle Partitioning (round-robin) - SH blindly sends tuples to workers, without inspecting worker load or attempting to collocate keys (Fig. 1a). Therefore, SH is categorized as an M-choice partitioner, because an aggregation stage is required to produce the final result. SH manages to minimize tuple imbalance (Eq. 1), since each worker receives the same number of tuples in a given window $S^w$: if V workers exist, each one will receive $|S^w| / V$ tuples. Turning to aggregation cost $\Gamma(S^w)$ (Eq. 2), SH becomes computationally expensive, because tuples are partitioned without an attempt to collocate keys. Therefore, in a worst case scenario, each worker will produce a partial result ($f(L^j_{S^w})$) with all the keys that exist in $S^w$ (illustrated in Fig. 1a). In that case, $\Gamma(S^w)$ will become equal to M times $\|S^w\|$ (with M = V for SH). As far as our cost model is concerned (Eq. 3), SH minimizes imbalance, but does not act to limit the aggregation cost.

² By changing Eq. 1 to multiply $|L^j_{S^w}|$ with load-divergence coefficients, produced periodically by monitoring components.

Hash Partitioning (field) - FLD follows a different approach than SH, by collocating tuples with the same $k_X$ on the same worker (Fig. 1b). FLD feeds $k_X$ to a hash function and selects a worker based on the result. It guarantees that keys from the same group will be collocated, resulting in minimal aggregation cost (Eq. 2). Hence, FLD is characterized as a 1-choice partitioner.
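The contrast between SH and FLD can be made concrete with a small experiment on a skewed window. The sketch below is illustrative only: `zlib.crc32` stands in for whatever hash function a pSPE would use, the helper names are hypothetical, and aggregation cost is taken as the sum of distinct keys per worker (one partial result per key per worker).

```python
import zlib
from itertools import count

def make_shuffle(num_workers):
    """SH: ignore the key and cycle through workers round-robin."""
    ticket = count()
    return lambda key: next(ticket) % num_workers

def make_field(num_workers):
    """FLD: hash the key so that equal keys always reach the same worker."""
    return lambda key: zlib.crc32(key.encode("utf-8")) % num_workers

def route(keys, choose_worker, num_workers):
    """Send each key to the worker picked by choose_worker."""
    loads = [[] for _ in range(num_workers)]
    for key in keys:
        loads[choose_worker(key)].append(key)
    return loads

def tuple_imbalance(loads):
    sizes = [len(load) for load in loads]
    return max(sizes) - sum(sizes) / len(sizes)

def aggregation_cost(loads):
    return sum(len(set(load)) for load in loads)

V = 4
# Skewed window: one key covers 90 of the 100 tuples.
window = ["hot"] * 90 + ["k%d" % i for i in range(10)]
sh_loads = route(window, make_shuffle(V), V)
fld_loads = route(window, make_field(V), V)
```

On this input SH spreads the tuples evenly but leaves a partial result for the frequent key on every worker, while FLD keeps one partial result per key and concentrates at least 90 tuples on a single worker.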
Nevertheless, FLD fails to balance the load effectively when the input is skewed and some keys appear more often than others (i.e., there is tuple imbalance, Eq. 1). Matters can get exacerbated if initial expectations (or assumptions) on input load do not hold true over time. Under such circumstances, struggling workers with excess load will hinder the progress of a query and even compromise the correctness of the result. In conclusion, FLD imposes minimal aggregation cost but does not act on limiting tuple imbalance (based on the cost model, Eq. 3).

Partial Key Partitioning - PK is the current state of the art algorithm [28]. It adopted the idea of key splitting [5] to alleviate the load of processing keys that are part of the skew. PK was the first to incorporate load in terms of the number of tuples assigned to each worker (i.e., $|L^j_{S^w}|$). Key splitting is materialized by using a pair of independent hash functions (i.e., $H_1$, $H_2$) and feeding $k_X$ to both. Also, PK maintains an array of size V with the total tuple count sent to each worker. Every time a tuple arrives, its $k_X$ is fed to $H_1$ and $H_2$ to identify 2 candidate workers. The partitioner will forward the tuple to the candidate that has received the least number of tuples up to that point. PK was extended to more than two candidates [29], for when two are not sufficient to handle skew. Even though PK succeeded in improving imbalance (Eq. 1) compared to FLD, it did so by adding an essential aggregation step (Eq. 2). Therefore, PK is expected to incur aggregation cost proportional to the number of candidates. Turning to our cost model (Eq. 3), PK can potentially violate the aggregation cost constraint, when $\Gamma(S^w)$ exceeds the maximum workload experienced by each worker.

Summary: Our goal is to propose partitioning algorithms that belong to the best quadrant (Fig. 4) and use our cost model (Eq. 3). To achieve this, we have to keep the aggregation cost low and achieve better imbalance.

3. MINIMIZING IMBALANCE WITH LOW AGGREGATION COST
Designing a partitioning algorithm that achieves low aggregation cost entails keeping track of the number of keys produced by each worker on every window $S^w$ (i.e., $|f(L^j_{S^w})|$, for $1 \le j \le V$). Equation 2 indicates that if the sum of $|f(L^j_{S^w})|$ is reduced, then the aggregation cost gets reduced as well. However, the boundaries of the aggregation cost need to be identified first.

PROPOSITION 1. For a given stream $S^w$, a stateful operation f, and V workers, $\Gamma(S^w)$ is bounded by: $\|S^w\| \le \Gamma(S^w) \le V \|S^w\|$.

PROOF. $\Gamma(S^w)$ will always be greater than or equal to $\|S^w\|$, with equality when the partitioning algorithm sends each key to a single worker only. In this case, no key is shared between workers: $L^i_{S^w} \cap L^j_{S^w} = \emptyset$ for all $i \ne j$, $1 \le i, j \le V$. Hence, $\|L^1_{S^w}\| + \dots + \|L^V_{S^w}\| = \|S^w\|$. Similarly, if the partition algorithm sends at least one tuple for each key to every worker (i.e., $k \in L^j_{S^w}$, $\forall k \in S^w$ and $1 \le j \le V$), then

$$\|L^1_{S^w}\| + \dots + \|L^V_{S^w}\| = \underbrace{\|S^w\| + \dots + \|S^w\|}_{V} = V \|S^w\|$$

A mechanism for monitoring the value of $\Gamma(S^w)$ has to be established. Eq. 2 can be expanded to the sum of its operands as:

$$\Gamma(S^w) = |f(L^1_{S^w})| + \dots + |f(L^V_{S^w})| \tag{4}$$

Hence, in order to monitor aggregation cost, the partition algorithm has to keep track of the number of distinct keys sent to each worker, for each $S^w$.

3.1 Incorporating Cardinality in Partitioning
Assuming a mechanism for keeping track of workers' cardinalities has been established, the cost model (Eq. 3) can be extended to incorporate the knowledge of the number of distinct keys sent to each worker. As indicated by Eq. 3, the information about workers' cardinalities can be used in two places: (a) imbalance (Eq. 1), and (b) aggregation cost (Eq. 2).

3.1.1 Cardinality in imbalance
The load of each worker has been modeled in terms of number of tuples. In the same manner, a worker's load can be expressed in terms of cardinality using the following formula:

$$CL^j_{S^w} = \|L^j_{S^w}\|, \quad 1 \le j \le V \tag{5}$$

Equation 5 depicts the load of a worker in terms of the number of distinct keys sent to it. Therefore, cardinality imbalance can be expressed as the difference between the maximum and the mean cardinality of all workers for a given window $S^w$, as a result of a partitioning algorithm P:

$$CI(P(S^w)) = \max_{j}\left(CL^j_{S^w}\right) - \operatorname{avg}_{j}\left(CL^j_{S^w}\right), \quad 1 \le j \le V \tag{6}$$

At this point, imbalance is determined by tuple count and cardinality. However, different stateful operations are affected by each metric differently. Hence, there is a need for a more diverse load estimation formula, which combines tuple count and cardinality.
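The bounds of Proposition 1 and the cardinality-based metrics of Eqs. 4 to 6 can be checked numerically on a small window. This is a sketch under the assumption that each worker's sequence is given as a list of keys and that f emits one pair per distinct key; the function names are hypothetical.

```python
def cardinality_loads(loads):
    """Eq. 5: CL^j, the number of distinct keys sent to each worker."""
    return [len(set(load)) for load in loads]

def cardinality_imbalance(loads):
    """Eq. 6: maximum minus average cardinality over all workers."""
    cl = cardinality_loads(loads)
    return max(cl) - sum(cl) / len(cl)

def aggregation_cost(loads):
    """Eq. 4, assuming |f(L^j)| equals the distinct keys on worker j."""
    return sum(cardinality_loads(loads))

def within_bounds(loads):
    """Proposition 1: ||S^w|| <= Gamma(S^w) <= V * ||S^w||."""
    distinct = len(set().union(*loads))
    return distinct <= aggregation_cost(loads) <= len(loads) * distinct

# Hypothetical window over keys a, b, c spread across V = 3 workers.
loads = [["a", "a", "b"], ["b", "c"], ["c"]]
cl = cardinality_loads(loads)
ci = cardinality_imbalance(loads)
gamma = aggregation_cost(loads)
bounded = within_bounds(loads)
```

Keys b and c each appear on two workers, so the aggregation cost exceeds the lower bound of 3 while staying below the upper bound of 9.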
In order to avoid one metric dominating the other, the initial values should be scaled accordingly:

$$\bar{L}^j_{S^w} = \frac{|L^j_{S^w}| - \min_{1 \le k \le V}(|L^k_{S^w}|)}{\max_{1 \le k \le V}(|L^k_{S^w}|) - \min_{1 \le k \le V}(|L^k_{S^w}|)} \tag{7}$$

$$\overline{CL}^j_{S^w} = \frac{CL^j_{S^w} - \min_{1 \le k \le V}(CL^k_{S^w})}{\max_{1 \le k \le V}(CL^k_{S^w}) - \min_{1 \le k \le V}(CL^k_{S^w})} \tag{8}$$

$$H^j_{S^w} = p \, \bar{L}^j_{S^w} + (1 - p) \, \overline{CL}^j_{S^w}, \quad \text{where } 1 \le j \le V \tag{9}$$

Equation 9 combines the normalized loads both in terms of tuples (Eq. 7) and distinct keys (Eq. 8) in a unified score. That score is adjustable based on a user's (or query optimizer's) parameter p, which controls the bias for each score accordingly: the smaller the p, the less the load in terms of tuples affects Eq. 9; whereas the higher the p, the less the load in terms of distinct keys affects Eq. 9. Finally, imbalance can be expanded to a hybrid form that incorporates load in terms of both tuple count and cardinality:

$$HI(P(S^w)) = \max_{j}\left(H^j_{S^w}\right) - \operatorname{avg}_{j}\left(H^j_{S^w}\right), \quad 1 \le j \le V \tag{10}$$

3.1.2 Cardinality in aggregation
Aggregation cost is determined by $\Gamma(S^w)$ (Eq. 2) and reducing it emanates from reducing the sum of distinct keys sent to each worker. Its minimum value can be $\|S^w\|$, when each key is sent to only a single worker. This behavior resembles FLD and it might result in imbalance on workers. To avoid this, we employ key splitting for keys that have not been sent to a worker before in a particular window $S^w$. By sending each newly encountered key to the worker with either the least keys or the least number of tuples up to that point, the aggregation cost remains low. Also, imbalance is expected to be lower compared to the one achieved by FLD.

3.2 Cardinality Estimation data structures
The partitioner needs to keep track of each worker's cardinality, every time a new tuple arrives. Hence, it should maintain an array of V cardinality estimation structures (C), which will offer two methods: (i) update($k_X$): for updating the count of distinct keys; and (ii) estimate(): for returning the count of distinct keys.

3.2.1 Naive
The naive approach for estimating a worker's cardinality involves keeping track of the exact number of distinct keys.
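A minimal sketch of such an exact structure is shown below, assuming a hash set per worker. The class name `NaiveCardinality` is hypothetical; `update` and `estimate` follow the two methods listed in Section 3.2, while `contains` and `reset` are extra helpers added here because the algorithms of Section 4 check membership and reset state on window expiration.

```python
class NaiveCardinality:
    """Exact distinct-key tracker for one worker, backed by a hash set."""

    def __init__(self):
        self._keys = set()

    def update(self, key):
        """Record that a tuple with this key was sent to the worker."""
        self._keys.add(key)

    def estimate(self):
        """Return the exact number of distinct keys seen so far."""
        return len(self._keys)

    def contains(self, key):
        """Report whether this key was already sent to the worker."""
        return key in self._keys

    def reset(self):
        """Forget all keys, e.g. when the current window expires."""
        self._keys.clear()

# The array C for V = 3 downstream workers.
C = [NaiveCardinality() for _ in range(3)]
for key in ["a", "a", "b"]:
    C[0].update(key)
```

Repeated keys do not change the estimate, which is what lets the partitioner distinguish a worker's tuple count from its cardinality.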
Therefore, a partitioner responsible for V downstream workers needs V unordered set structures. This way, the update and the estimate methods will offer constant expected execution time (O(1)). One caveat of using an unordered set structure for each worker is the memory overhead. Depending on the algorithm used, a key can end up in multiple workers (e.g., SH). This way, the memory required for maintaining the number of keys on each worker can become $O(V \|S^w\|)$, since all unordered sets can end up having each key. The memory cost of a naive cardinality estimation structure is related to the cardinality of $S^w$ and the choice of the partitioning policy: if $\|S^w\|$ remains low and the partitioner does not send the same keys to multiple workers, the memory requirements for C will remain low. However, if $\|S^w\|$ is high and the partitioner tends to send tuples with the same key to multiple workers, then memory load can hinder the partition process.

3.2.2 HyperLogLog
HyperLogLog (HLL), introduced by Flajolet et al. [14], is an algorithm for estimating the number of distinct elements in databases with a bounded error. HLL requires $O(\log_2 \log_2 N)$ memory for a relation expected to have N distinct elements. Every time a new tuple arrives, its $k_X$ is extracted and fed through a hash function. HLL extracts the m most significant bits of the hash result, and uses them to identify which register (out of $2^m$) to update. Each register is $\log_2 \log_2 N$ bits long, and its value is updated depending on the position of the left-most 1-bit in the remaining bits of the hashed value. HLL has been shown to have a relative standard error of about $1.04 / \sqrt{2^m}$, i.e., 1.04 divided by the square root of the number of registers. Recent work from Heule et al. [2] presented a number of improvements that need to take place so that cardinalities in the order of billions can be estimated efficiently. For cardinality estimation, a partition algorithm is required to use V HLLs to measure the number of distinct keys sent to each of the V workers. HLL can be used to limit the memory cost, but it uses irreversible operations to update its internal buckets.
That renders it unable to check whether a key has been previously sent to a worker. We introduce an optimistic mechanism for checking if a key has been forwarded to a particular worker before: upon the arrival of a key that hashes to a worker, its cardinality c is estimated (call to estimate). A trial update of c is performed and the new cardinality c' is estimated. If the cardinality estimate difference $(c' - c) = 0$, then our mechanism optimistically assumes that the key has already been sent to the corresponding worker. HLL is expected to make wrong decisions at the benefit of a constant memory cost.

4. PROPOSED CARDINALITY-AWARE PARTITIONING ALGORITHMS
PK [28, 29] has motivated the merits of key splitting [5] for reducing imbalance among workers. Therefore, all the variations of our proposed algorithms leverage key splitting, which materializes with the use of multiple hash functions for identifying candidate workers. For simplicity, our algorithms are presented with only two candidates, but they can be extended to accommodate more.

Algorithm 1: Partition
    input : e_X, t_w, C, L
    output: worker to which e_X will be sent
    1  k = GetKey(e_X);
    2  t = GetTimestamp(e_X);
    3  if t ≥ t_w then
    4      Reset(C);
    5      Reset(L);
    6  c_1 = H_1(k);
    7  c_2 = H_2(k);
    8  return decide(C, L, k, c_1, c_2);

The basis of our algorithms is presented in Alg. 1 and is called by the pSPE's partitioner when a new tuple arrives. The partitioner maintains two arrays of size V: one with cardinality estimation structures (C), and one with tuple counters (L). L is identical to the one used by PK and gets updated the same way in all our proposed algorithms. $e_X$'s key is extracted and fed to the two hash functions, $H_1$ and $H_2$. If the partitioner uses more than two candidates (i.e., M > 2), then an equal number of hash functions are used in the decision process. The resulting choices ($c_1$ and $c_2$) along with the arrays C and L are passed to decide().

During query execution, a pSPE might have multiple instances of partitioners running on different machines (especially in a scale-out setting, where thousands of threads are involved in a query). The advantage of using hash functions is that no exchange of information is required among different instances of partitioners.
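Algorithm 1 can be sketched in a few lines. The version below is an illustration under several assumptions made here, not the authors' C++ implementation: C is an array of plain Python sets (the naive structure), `zlib.crc32` and `zlib.adler32` stand in for $H_1$ and $H_2$, the caller is responsible for advancing `window_end` after an expiration, and `decide_by_tuple_count` is a hypothetical decide() in the spirit of PK.

```python
import zlib

def partition(key, timestamp, window_end, C, L, decide):
    """Sketch of Alg. 1: reset state on window expiration, then pick a worker."""
    if timestamp >= window_end:      # Alg. 1, line 3
        for keys in C:               # Reset(C)
            keys.clear()
        for j in range(len(L)):      # Reset(L)
            L[j] = 0
    num_workers = len(L)
    raw = key.encode("utf-8")
    c1 = zlib.crc32(raw) % num_workers     # stand-in for H_1
    c2 = zlib.adler32(raw) % num_workers   # stand-in for H_2
    return decide(C, L, key, c1, c2)

def decide_by_tuple_count(C, L, key, c1, c2):
    """Hypothetical decide(): send to the candidate with fewer tuples so far."""
    chosen = c1 if L[c1] <= L[c2] else c2
    C[chosen].add(key)
    L[chosen] += 1
    return chosen

V = 4
C = [set() for _ in range(V)]
L = [0] * V
# 100 tuples with the same key, all inside the current window (t = 1 < 10).
sent = [partition("hot", 1, 10, C, L, decide_by_tuple_count) for _ in range(100)]
```

Because the candidates depend only on the key's hashes, every partitioner instance computes the same pair of workers for a key without coordination; only the counters in C and L are local.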
On top of that, C and L have their counts monotonically increasing on each window. Therefore, if each of the partitioners tries to reduce imbalance and/or aggregation cost, then (through the additive property) the overall imbalance and/or aggregation cost are reduced. Finally, C and L need to be reset when a window expires (Alg. 1 line 3). This guarantees that decisions reflect the temporal nature of stream processing. Algorithm 1 receives $t_w$ as an argument, which is the expiration timestamp of the current window. In the following sections we go over our variations for decide(): (i) Cardinality imbalance Minimization (CM), (ii) Group Affinity with imbalance Minimization (AM & cAM), and (iii) Hybrid imbalance Minimization (LM).

4.1 Cardinality Imbalance Minimization (CM)
The first partitioning algorithm aims at limiting cardinality imbalance (Eq. 6), and the decision is made based on the cardinality estimate retrieved from the C array structure (Eq. 5). The newly arrived tuple $e_X$ is sent to the candidate worker that has the least cardinality. Algorithm 2 illustrates the cardinality imbalance minimization algorithm (CM), which works as a counterpart to PK. CM can have its cardinality estimation structure be either the Naive (Section 3.2.1) or the HLL with our optimistic mechanism (Section 3.2.2).

Algorithm 2: Cardinality imbalance minimization (CM)
    input : C, L, k, c_1, c_2
    output: worker to which the tuple is going to be sent
    l_1 = C[c_1].estimate();
    l_2 = C[c_2].estimate();
    if l_1 ≤ l_2 then
        C[c_1].update(k);
        L[c_1] += 1;
        return c_1;
    else
        C[c_2].update(k);
        L[c_2] += 1;
        return c_2;

Algorithm 3: Group affinity combined with cardinality imbalance minimization (AM)
    input : C, L, k, c_1, c_2
    output: worker to which the tuple is going to be sent
    if C[c_1].contains(k) then
        L[c_1] += 1;
        return c_1;
    else if C[c_2].contains(k) then
        L[c_2] += 1;
        return c_2;
    else
        l_1 = C[c_1].estimate();
        l_2 = C[c_2].estimate();
        if l_1 ≤ l_2 then
            C[c_1].update(k);
            L[c_1] += 1;
            return c_1;
        else
            C[c_2].update(k);
            L[c_2] += 1;
            return c_2;
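Algorithms 2 and 3 translate almost line by line into decide() functions. The sketch below assumes the naive structure, with each entry of C being a plain Python set, so that `len(...)` plays the role of estimate() and `in` plays the role of contains(); the function names are hypothetical.

```python
def decide_cm(C, L, key, c1, c2):
    """Alg. 2 (CM): send to the candidate with the lower cardinality."""
    chosen = c1 if len(C[c1]) <= len(C[c2]) else c2
    C[chosen].add(key)
    L[chosen] += 1
    return chosen

def decide_am(C, L, key, c1, c2):
    """Alg. 3 (AM): keep a known key on its worker, else balance cardinality."""
    if key in C[c1]:
        chosen = c1
    elif key in C[c2]:
        chosen = c2
    else:
        chosen = c1 if len(C[c1]) <= len(C[c2]) else c2
        C[chosen].add(key)
    L[chosen] += 1
    return chosen

# CM: the same key arriving twice, with candidate workers 0 and 1.
C_cm, L_cm = [set(), set()], [0, 0]
cm_choices = [decide_cm(C_cm, L_cm, "a", 0, 1) for _ in range(2)]

# AM: keys a, b, a with the same two candidates.
C_am, L_am = [set(), set()], [0, 0]
am_choices = [decide_am(C_am, L_am, key, 0, 1) for key in ["a", "b", "a"]]
```

In this toy run CM places the repeated key on both workers, which adds a partial result to the aggregation stage, whereas AM returns the repeated key to the worker that already holds it.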
CM is expected to be used in operations in which processing cost is dominated by the number of distinct keys. This way, imbalance in terms of cardinality will be minimal. However, imbalance in terms of tuple counts will increase, since CM is tuple-count agnostic and makes no effort to limit aggregation cost.

4.2 Group Affinity and Imbalance Minimization (AM & cam)

Group Affinity algorithms try to impose no additional aggregation cost, while balancing load with the use of key splitting. The name affinity comes from keeping track of whether a key has been encountered before in S_w; if it has, it is forwarded to the worker that received it previously. The first variation of affinity-based algorithms is AM, which focuses on cardinality imbalance (Alg. 3). AM tries to minimize aggregation cost by not splitting keys among workers. First, it checks if one of the candidate workers has encountered key k previously. If one of them did, then the tuple is forwarded to that worker; otherwise, it is sent to the worker with the least cardinality up to that point. A different variation of AM, named cam (Alg. 4), behaves similarly, but it forwards the tuple to the worker with the least tuple count up to that point. This way, both aggregation cost and imbalance are considered during partitioning. Despite the fact that AM and cam resemble FLD, they are expected to perform better because of the multiple choices that are presented to them.

Algorithm 4: Group affinity with imbalance minimization (cam)
input : C, L, k, c_1, c_2
output: worker to which the tuple is going to be sent to
if C[c_1].contains(k) then
    L[c_1] += 1;
    return c_1;
else if C[c_2].contains(k) then
    L[c_2] += 1;
    return c_2;
else
    l_1 = L[c_1];
    l_2 = L[c_2];
    if l_1 ≤ l_2 then
        C[c_1].update(k);
        L[c_1] += 1;
        return c_1;
    else
        C[c_2].update(k);
        L[c_2] += 1;
        return c_2;

4.3 Hybrid Imbalance Minimization (LM)

For stateful operations equally affected by tuple count and cardinality, we propose the hybrid load imbalance minimization algorithm (LM). It combines a worker's tuple count with cardinality and calculates hybrid load as indicated in Eq. 9. A tuple is forwarded to the worker with the least load, and LM's main goal is to minimize hybrid load imbalance (Eq. 10). LM is depicted in Algorithm 5.

Algorithm 5: Hybrid imbalance minimization (LM)
input : C, L, k, c_1, c_2
output: worker to which the tuple is going to be sent to
hl_1 = p · l_{c_1}(S_w) + (1 − p) · cl_{c_1}(S_w);
hl_2 = p · l_{c_2}(S_w) + (1 − p) · cl_{c_2}(S_w);
if hl_1 ≤ hl_2 then
    C[c_1].update(k);
    L[c_1] += 1;
    return c_1;
else
    C[c_2].update(k);
    L[c_2] += 1;
    return c_2;

5. EXPERIMENTAL SETUP

Our experiments were conducted on an AWS c4.8xlarge instance, running Ubuntu v14.04. For all experiments, we used our own multi-threaded stream partitioning library, developed in C++11 and compiled with GCC v[...]. Our performance analysis involved varying numbers of concurrent worker threads (8 up to 32) and data partitions (from 8 to 256). We did not experiment with more threads because we did not want to pollute results with context-switching overheads. All reported runtimes are the averages of 7 runs, after removing minimum and maximum reported times, to compensate for anomalies related to running concurrent processes.

5.1 Stream Partitioning Algorithms

We evaluated algorithms shuffle (SH), field (FLD), and partial key (PK) [28] (with 2 and 5 candidate workers), along with different variations of our proposed algorithms: Cardinality Imbalance Minimization (CM), Group Affinity with Cardinality Imbalance Minimization (AM), Group Affinity with Imbalance Minimization (cam) and Hybrid Imbalance Minimization (LM). For all variations of LM we set p to 0.5 to achieve unbiased load estimation. As reference implementations for SH, FLD and PK we used the ones found in Apache Storm. In addition, we used the open source implementation of MurmurHash v3. All our proposed algorithms appear in two versions: one with naive and one with HLL as the cardinality estimation structure. For the former, we used C++ STL's implementation of unordered set, and for HLL, we implemented our version with 4096 (= 2^12) registers and a register size of 5 bits. The number and size of registers were chosen to accommodate up to 10^7 distinct keys of 32 bits, with an estimation error below 2%, as instructed in [14]. Table 2 explains the algorithm symbols we use in our graphs.

Table 2: Stream partitioning algorithms. w is the total number of workers.
Symbol   Algorithm          Choices   Cardinality Estimation Structure used
SH       shuffle            w         None
FLD      field              1         None
[...]    partial-key [28]   2         None
PK-5     partial-key [29]   5         None
[...]    Alg. 2             2         Naive, Sec. 3.2.1
[...]    Alg. 3             2         Naive, Sec. 3.2.1
AM-5     Alg. 3             5         Naive, Sec. 3.2.1
c[...]   Alg. 4             2         Naive, Sec. 3.2.1
[...]    Alg. 4             5         Naive, Sec. 3.2.1
[...]    Alg. 5             2         Naive, Sec. 3.2.1
[...]H   Alg. 2             2         HLL, Sec. 3.2.2
[...]H   Alg. 3             2         HLL, Sec. 3.2.2
[...]    Alg. 5             2         HLL, Sec. 3.2.2

5.2 Data sets and Workloads

Table 3 summarizes the characteristics of each data set/benchmark used in our experiments. Below, we go over each data set and the queries we used in our study.
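As a side note on the p = 0.5 setting of Sec. 5.1, the effect of p on LM (Alg. 5) can be sketched in Python. The form of the hybrid load below is an assumption drawn from the textual description (p weights a worker's tuple count, 1 − p weights its cardinality), since Eq. 9 is not reproduced here; `hybrid_load`, `decide_lm`, and `fresh_state` are illustrative names, and plain sets stand in for the cardinality structures.

```python
def hybrid_load(C, L, c, p):
    """Assumed form of Eq. 9: p weights tuple count, (1 - p) weights cardinality."""
    return p * L[c] + (1 - p) * len(C[c])

def decide_lm(C, L, k, c1, c2, p=0.5):
    """Alg. 5 (LM): send the tuple to the candidate with the lower hybrid load."""
    chosen = c1 if hybrid_load(C, L, c1, p) <= hybrid_load(C, L, c2, p) else c2
    C[chosen].add(k)
    L[chosen] += 1
    return chosen

def fresh_state():
    # Worker 0: many tuples but one distinct key; worker 1: few tuples, three keys.
    return [{"a"}, {"b", "c", "d"}], [10, 3]

choices = {}
for p in (0.0, 0.5, 1.0):
    C, L = fresh_state()
    choices[p] = decide_lm(C, L, "e", 0, 1, p=p)
```

With p = 1 only tuple counts matter, so the decision matches a PK-style choice of the less loaded worker; with p = 0 only cardinality matters, as in CM; p = 0.5 weights both equally.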
TPC-H (TPCH): TPCH has been extensively used for throughput-oriented streaming scenarios [13, 3, 1, 8, 12]. Out of 22 TPCH queries, 16 feature a grouping statement: half maintain a constant number of groups, and half a number of groups that increases with the scale factor. Because our work addresses stateful operations, we focused on grouping TPCH queries and picked Query 1 (as a constant grouping query) and Query 3 (as a scaling grouping query). Those two differ significantly in the number of resulting groups, and this enabled us to document the performance of different partition algorithms when the aggregation cost varies in terms of size. As indicated in Table 3, Query 1 presents 4 and Query 3 presents up to 11 resulting groups. Data were generated using the dbgen tool (v2.17) with a scale factor of 1.

Table 3: Summary of data characteristics.
Dataset   Size   Groups             Window    Metric
TPC-H     1GB    4 up to 1k         N/A       throughput
DEBS      32GB   62.5K up to 8.1M   sliding   latency
GCM       16GB   4 to 67K           sliding   latency

ACM DEBS 2015 Grand Challenge (DEBS): DEBS totals 32 GBs in raw size, and comes with two sliding window queries [21]: (i) the Top-10 most frequent routes (Query 1), and (ii) the Top-10 most profitable areas (Query 2). DEBS presents a per-window latency-oriented data set, and its two queries offer group numbers that can potentially range from 62.5 thousand to 8.1 million.

Google Cluster Monitoring (GCM): This data set contains execution traces from one of Google's cluster management software systems, and for it we used two sliding window queries, which, like DEBS, are per-window latency oriented. GCM Query 1 was taken from [24], and scans the task event table to calculate every 60 seconds (with a slide of 1 second) the total CPU cores requested by each scheduling class. In addition, we introduced GCM Query 2, which calculates every 45 minutes (with a slide of 1 second) the average CPU cores requested by each job ID. There are more than 60 thousand job IDs in the whole data set.

6. EXPERIMENTAL RESULTS

Our experiments evaluate the impact of a partition algorithm on performance (Sec. 6.1), in terms of throughput (using the TPCH data set) and window latency (using the DEBS data set). Moreover, we evaluate the scalability (Sec. 6.2) of our algorithms compared to the state of the art (using the GCM data set). For all experiments, data were loaded in main memory before execution. The time to load data and write output to storage was not included in the reported times. Finally, for the experiments of Sec. 6.1 and 6.2, the time it takes to partition tuples is not included, because it is analyzed in Sec. 6.3 in terms of both processing and memory costs.

6.1 Performance

In this set of experiments, we used the TPCH and DEBS benchmarks to evaluate performance.

6.1.1 TPCH Query 1 (Fig. 5)

Figure 5: TPCH Query 1 performance (throughput). Axes: number of workers (x), throughput in GB/s (y).

Figure 5 indicates that for TPCH Query 1, SH performs the best. This behavior is expected since there are only 4 groups for Query 1. Therefore, aggregation cost is negligible and performance is affected only by tuple imbalance. SH is expected to offer optimal tuple imbalance ([...] 1), which is reflected in the results shown in Fig. 5. Those agree with our model (Eq. 3), which identifies that SH offers minimal imbalance with constant aggregation runtime of O(4V) (V is the number of workers). In addition, PK-5 offers the next best throughput, since it reduces tuple imbalance, compared to all other algorithms with 2 and 5 alternative choices per group. CM and LM do not scale well with two choices, since they are affected by cardinality imbalance. LM is expected to perform similarly to PK if p takes a value of 1 (as indicated in Eq. 9). Turning to 1-choice partitioners (i.e., FLD, AM and cam), they present constant performance and do not scale when the number of workers increases. This happens because each group is presented to a single candidate worker.

Take-away: If the number of groups is constant and much smaller than the size of the aggregation, SH performs the best.

6.1.2 TPCH Query 3 (Figs. 6-7)

The query plan of TPCH Query 3 consists of a parallel hash join for the customer and orders tables, followed by a broadcast join with the lineitem table. Then, a parallel computation of the group by follows, and execution concludes with a final aggregation step to materialize the result. Figure 6 illustrates the performance of 1-choice partitioners (i.e., FLD, AM, and cam), [...], [...], and PK-5. M-choice partitioners performed from 2.5x up to an order of magnitude worse (LM and CM offered similar performance to [...]).
As shown in Table 3, TPCH Query 3 involves 11 thousand groups (before applying the limit statement), and aggregation can take up to 6% of total execution time for M-choice partitioners. As a result, M-choice partitioners (i.e., SH, PK, CM, and LM) experience a substantial performance overhead on the final aggregation step. Figure 7 illustrates the tuple imbalance, relative to FLD, achieved by different variations of AM and cam. Even though [...]-H achieves better tuple imbalance compared to [...], it fails to perform at the same level as [...]. Tuple imbalance results justify the throughput shown in Fig. 6, in which (apart from [...]-H) all variations of our proposed algorithms perform significantly better than [...]. By adopting key splitting, throughput increases with the use of multiple candidate workers. AM-5 and [...] offer improved throughput up to 47% compared to FLD.

Take-away: For throughput-oriented queries with a large number of groups, cam and AM perform the best. They achieve up to an order of magnitude better throughput compared to PK, and outperform FLD by up to 47%.

6.1.3 DEBS Query 1 (Figs. 8a - 8c)

Turning to DEBS, both queries involve window semantics and the performance is measured in window latency. Figure 8 shows the mean and 99 percentile window latency achieved by each partition algorithm. It is clear that in all worker settings, [...], [...], c[...], [...]-H, AM-5, and [...] perform the best. This emanates from a lack of aggregation overhead, which makes those algorithms scalable when the number of workers increases. In fact, aggregation cost accounts for more than 7%, 84%, and 88% of total runtime for [...], [...], PK-5, [...], and [...]. Finally, [...]-H achieves identical performance with [...], which leads us to believe that our optimistic mechanism for cardinality estimation maintains a low error.

Take-away: For latency-oriented stateful queries, AM and cam perform from 4.5x up to 11.6x better compared to PK.
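The aggregation overhead discussed in Secs. 6.1.1 to 6.1.3 can be illustrated with a toy count of the partial results that the final aggregation step has to merge. The sketch below is only a proxy under stated assumptions: shuffle is modeled as round-robin, field grouping as a single CRC32 hash, and aggregation cost as the number of distinct (worker, key) pairs. None of these are the paper's measured costs.

```python
import zlib

def shuffle(keys, V):
    """SH modeled as round-robin: the key plays no role in the assignment."""
    return [(i % V, k) for i, k in enumerate(keys)]

def field(keys, V):
    """FLD modeled with one hash: a key always lands on the same worker."""
    return [(zlib.crc32(k.encode("utf-8")) % V, k) for k in keys]

def partial_results(assignment):
    """Distinct (worker, key) pairs, i.e. partial aggregates to merge at the end."""
    return len(set(assignment))

V = 8
# Four groups, 100 tuples each, arriving group by group.
keys = [g for g in ("A", "B", "C", "D") for _ in range(100)]

sh_partials = partial_results(shuffle(keys, V))
fld_partials = partial_results(field(keys, V))
```

With 4 groups and 8 workers, the shuffle model yields 4·V = 32 partial results, in line with the O(4V) term quoted for TPCH Query 1, while field grouping yields one partial result per group. When the number of groups grows to thousands or millions, the same multiplication is what makes the final aggregation step dominate for partitioners that split keys freely.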

Figure 6: TPCH Query 3 performance (throughput). Axes: number of workers (x), throughput in GB/s (y).

6.1.4 DEBS Query 2 (Figs. 9a - 9c)

The execution plan starts by partitioning incoming tuples based on the medallion of each ride, and each worker has to create two local indices: one for accumulating fares for each pickup cell, and one for keeping track of the latest drop-off cell. Then, an aggregation step follows, which gathers each pickup cell's fares and determines the latest cell for each medallion. The two resulting streams are partitioned based on pickup and drop-off cell IDs. Next, a gather step is executed, in which the median fare and the number of vacant taxis are processed to calculate the profit of each cell. Finally, partial results are merged and ordered to produce the Top-10 most profitable cells. This query represents a problematic case for our model (Eq. 3), because partitioning does not necessarily affect the workload imposed on each worker (i.e., the partitioning key is the medallion but each worker's state is affected by the number of distinct cells).

Fig. 9 depicts the window latency achieved by each algorithm, and it is apparent that [...], [...], AM-5, c[...], [...], and [...]-H offer the best performance. AM and cam in all their variations outperform FLD in both mean (from 1.2x up to 1.5x) and 99 percentile (from 1.3x to 1.9x) latency. This is justified by AM's and cam's ability to partition data more evenly compared to FLD. M-choice partitioners underperform because they do not act on limiting aggregation overhead. In comparison with PK (in both [...] and PK-5), AM and cam perform up to 5.7x faster. In order to examine AM's and cam's scalability, we also ran DEBS Query 2 with 64 and 128 workers. They performed up to 6.2x better than PK and up to 2.3x better than FLD.

Take-away: For latency-oriented complex queries with more than one stateful operation, AM and cam have window latency between 1.2x and 1.9x lower than FLD, and up to 5.7x lower than PK.

6.2 Scalability

We used the GCM dataset to measure the scalability of AM and cam compared to SH and PK.
GCM was picked for the scalability experiments because it presents a conventional monitoring scenario, in which groups are not significantly more than the number of tuples in a window (like in TPCH and DEBS), and the queries consist of a single stateful operation. This way, M-choice partitioners would not be impeded by the aggregation cost.

Figure 7: TPCH Query 3 relative imbalance to FLD. Axes: number of workers (x), relative imbalance (y).

6.2.1 GCM Query 1 (Figs. 10 & 11)

GCM Query 1 features up to 4 groups and differs from TPCH Query 1 because the number of tuples in every window is comparable to the number of groups (the average window size is 42 groups). For this query, we measured SH's, AM's and cam's scalability compared to PK, which is the current state of the art and is expected to be scalable due to the small number of groups (as in Sec. 6.1.1). Fig. 10 presents the percentage improvement in window latency of SH, AM, and cam compared to PK. SH has its latency improvement decrease, because aggregation cost increases when more workers are employed. In contrast, AM and cam have their latency decrease when the number of workers increases, and they exhibit lower latency than PK. As Fig. 11 indicates, AM's and cam's scalability results from their constant aggregation cost while the partial evaluation latency decreases. The former is not the case with SH and PK, which have the aggregation cost percentage increase with the number of workers.

Take-away: AM and cam are scalable, maintain a constant aggregation cost, and outperform PK by up to 1.3x.

6.2.2 GCM Query 2 (Fig. 12)

Fig. 12 illustrates the percentage improvement in window latency of SH, AM, and cam over PK. Even though this query contains a large number of groups (Table 3), its average window size is only 181 tuples and group repetition is scarce. Therefore, M-choice partitioners will not have their performance deteriorate due to an overwhelming aggregation cost (the case in TPCH Query 3, Sec. 6.1.2). However, [...] is not scalable because when additional workers are employed its aggregation cost becomes higher.
Turning to [...], it manages to be scalable, but it underperforms compared to AM and cam in all worker settings.

Take-away: AM and cam are scalable and present more than 1.4x better latency compared to PK.

Figure 8: DEBS Query 1 performance (window latency in msec, mean and 99 percentile). Panels: (a) 8 workers, (b) 16 workers, (c) 32 workers.

Figure 9: DEBS Query 2 performance (window latency in msec, mean and 99 percentile). Panels: (a) 8 workers, (b) 16 workers, (c) 32 workers.

Figure 10: Latency percentage improvement over PK for GCM Query 1. Axes: number of workers (x), latency improvement over PK in % (y).

Figure 11: Aggregation percentage of runtime for GCM Query 1. Axes: number of workers (x), aggregation share of runtime in % (y).

6.3 Partition algorithm cost

In this set of experiments, we measure the overhead imposed by each algorithm in terms of processing and memory cost. To that end, we picked DEBS Q1, because it features the longest group identifier (15 bytes), and the number of groups can reach up to 8.1M.

6.3.1 Partition latency (Figs. 13a - 13c)

To measure processing time, we marshaled DEBS data to each algorithm and measured partition latency on each window. As described in Sec. 3.2, the cardinality estimation structure size relies on the number of workers. Therefore, we measured partition latency for 8, 16, and 32 workers. Figure 13 illustrates the total time spent on each window with each partition algorithm. We included 9 and 99 percentile window latency. Most of the algorithms present constant values for 8, 16, and 32 workers. However, a noticeable difference can be seen with 32 workers (Fig. 13c) for the 99 percentile window latency of [...], [...]-H, [...], and [...]. The increase is a result of the additional processing required for those algorithms.

Take-away: Using our proposed algorithms does not incur any noticeable overhead in latency.

6.3.2 Partition Memory (Table 4)

Our proposed algorithms make use of cardinality estimation structures. Hence, we ran a micro-benchmark, in which we produced each possible key and replicated it to both available candidate workers. This experiment aims at examining an extreme scenario, in which all of the 8.1M groups appear in a single window. We measured memory consumed in MBs (Table 4). The naive cardinality estimation structure size quickly increases with the number of keys. Since each key is sent to both of the two candidates, the naive cardinality estimation structure's size increases further. Conversely, when HLL is used, memory consumption increases only with the number of workers, and its size is affected by neither the number of keys nor the number of candidates. However, if the expected cardinality of the input stream is more than 10 million, then each HLL structure needs to double its number of buckets.

Take-away: Memory requirements of the cardinality estimation structure can be significantly limited with the use of HLL.

7. DISCUSSION

In conclusion, a pspe's performance can be affected by both imbalance and aggregation cost. According to our experimental results, the state of the art solution (i.e., PK) fails to perform well when a large number of groups appears, and 1-choice partitioners like FLD can make use of key splitting [5] to achieve better performance. Maintaining low imbalance does not necessarily lead to limiting aggregation cost. Even if an improved and diverse load metric is used (i.e., CM with Eq. 6 and LM with Eq. 10), performance will degrade when the number of groups increases. In fact, M-choice partitioners underperform when a large number of keys appear, because they focus solely on minimizing imbalance. After conducting a sensitivity analysis on M-choice partitioners and their


More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

An Efficient Garbage Collection for Flash Memory-Based Virtual Memory Systems

An Efficient Garbage Collection for Flash Memory-Based Virtual Memory Systems S. J and D. Shn: An Effcent Garbage Collecton for Flash Memory-Based Vrtual Memory Systems 2355 An Effcent Garbage Collecton for Flash Memory-Based Vrtual Memory Systems Seunggu J and Dongkun Shn, Member,

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Insertion Sort. Divide and Conquer Sorting. Divide and Conquer. Mergesort. Mergesort Example. Auxiliary Array

Insertion Sort. Divide and Conquer Sorting. Divide and Conquer. Mergesort. Mergesort Example. Auxiliary Array Inserton Sort Dvde and Conquer Sortng CSE 6 Data Structures Lecture 18 What f frst k elements of array are already sorted? 4, 7, 1, 5, 1, 16 We can shft the tal of the sorted elements lst down and then

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems:

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems: Speed/RAP/CODA Presented by Octav Chpara Real-tme Systems Many wreless sensor network applcatons requre real-tme support Survellance and trackng Border patrol Fre fghtng Real-tme systems: Hard real-tme:

More information

arxiv: v3 [cs.ds] 7 Feb 2017

arxiv: v3 [cs.ds] 7 Feb 2017 : A Two-stage Sketch for Data Streams Tong Yang 1, Lngtong Lu 2, Ybo Yan 1, Muhammad Shahzad 3, Yulong Shen 2 Xaomng L 1, Bn Cu 1, Gaogang Xe 4 1 Pekng Unversty, Chna. 2 Xdan Unversty, Chna. 3 North Carolna

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams Self-Tunng, Bandwdth-Aware Montorng for Dynamc Data Streams Navendu Jan, Praveen Yalagandula, Mke Dahln, Yn Zhang Mcrosoft Research HP Labs The Unversty of Texas at Austn Abstract We present, a self-tunng,

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

ARTICLE IN PRESS. Signal Processing: Image Communication

ARTICLE IN PRESS. Signal Processing: Image Communication Sgnal Processng: Image Communcaton 23 (2008) 754 768 Contents lsts avalable at ScenceDrect Sgnal Processng: Image Communcaton journal homepage: www.elsever.com/locate/mage Dstrbuted meda rate allocaton

More information

Avoiding congestion through dynamic load control

Avoiding congestion through dynamic load control Avodng congeston through dynamc load control Vasl Hnatyshn, Adarshpal S. Seth Department of Computer and Informaton Scences, Unversty of Delaware, Newark, DE 976 ABSTRACT The current best effort approach

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Adaptive Load Shedding for Windowed Stream Joins

Adaptive Load Shedding for Windowed Stream Joins Adaptve Load Sheddng for Wndowed Stream Jons Bu gra Gedk College of Computng, GaTech bgedk@cc.gatech.edu Kun-Lung Wu, Phlp Yu T.J. Watson Research, IBM {klwu,psyu}@us.bm.com Lng Lu College of Computng,

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Space-Optimal, Wait-Free Real-Time Synchronization

Space-Optimal, Wait-Free Real-Time Synchronization 1 Space-Optmal, Wat-Free Real-Tme Synchronzaton Hyeonjoong Cho, Bnoy Ravndran ECE Dept., Vrgna Tech Blacksburg, VA 24061, USA {hjcho,bnoy}@vt.edu E. Douglas Jensen The MITRE Corporaton Bedford, MA 01730,

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Partial Restreaming Approach for Massive Graph Partitioning.

Partial Restreaming Approach for Massive Graph Partitioning. Partal Restreamng Approach for Massve Graph Parttonng. Ghzlane Echbarth, Hamamache Kheddouc To cte ths verson: Ghzlane Echbarth, Hamamache Kheddouc. Partal Restreamng Approach for Massve Graph Parttonng..

More information

Time- and Space-Efficient Sliding Window Top-k Query Processing

Time- and Space-Efficient Sliding Window Top-k Query Processing Tme- and Space-Effcent Sldng Wndow Top-k Query Processng KREŠIMIR PRIPUŽIĆ and IVANA PODNAR ŽARKO, Unversty of Zagreb KARL ABERER, École Polytechnque FédéraledeLausanne 1 A sldng wndow top-k (top-k/w)

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Routing in Degree-constrained FSO Mesh Networks

Routing in Degree-constrained FSO Mesh Networks Internatonal Journal of Hybrd Informaton Technology Vol., No., Aprl, 009 Routng n Degree-constraned FSO Mesh Networks Zpng Hu, Pramode Verma, and James Sluss Jr. School of Electrcal & Computer Engneerng

More information

Adaptive Load Shedding for Windowed Stream Joins

Adaptive Load Shedding for Windowed Stream Joins Adaptve Load Sheddng for Wndowed Stream Jons Buğra Gedk, Kun-Lung Wu, Phlp S. Yu, Lng Lu College of Computng, Georga Tech Atlanta GA 333 {bgedk,lnglu}@cc.gatech.edu IBM T. J. Watson Research Center Yorktown

More information

Estimating Costs of Path Expression Evaluation in Distributed Object Databases

Estimating Costs of Path Expression Evaluation in Distributed Object Databases Estmatng Costs of Path Expresson Evaluaton n Dstrbuted Obect Databases Gabrela Ruberg, Fernanda Baão, and Marta Mattoso Department of Computer Scence COPPE/UFRJ P.O.Box 685, Ro de Janero, RJ, 2945-970

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

On the Fairness-Efficiency Tradeoff for Packet Processing with Multiple Resources

On the Fairness-Efficiency Tradeoff for Packet Processing with Multiple Resources On the Farness-Effcency Tradeoff for Packet Processng wth Multple Resources We Wang, Chen Feng, Baochun L, and Ben Lang Department of Electrcal and Computer Engneerng, Unversty of Toronto {wewang, cfeng,

More information

Oracle Database: SQL and PL/SQL Fundamentals Certification Course

Oracle Database: SQL and PL/SQL Fundamentals Certification Course Oracle Database: SQL and PL/SQL Fundamentals Certfcaton Course 1 Duraton: 5 Days (30 hours) What you wll learn: Ths Oracle Database: SQL and PL/SQL Fundamentals tranng delvers the fundamentals of SQL and

More information

Optimization of decentralized multi-way join queries over pipelined filtering services

Optimization of decentralized multi-way join queries over pipelined filtering services Computng (2012) 94:939 972 DOI 10.1007/s00607-012-0209-9 Optmzaton of decentralzed mult-way jon queres over ppelned flterng servces Efthyma Tsamoura Anastasos Gounars Yanns Manolopoulos Receved: 23 March

More information