arXiv: v3 [cs.DS] 7 Feb 2017


SF-sketch: A Two-stage Sketch for Data Streams

Tong Yang¹, Lingtong Liu², Yibo Yan¹, Muhammad Shahzad³, Yulong Shen², Xiaoming Li¹, Bin Cui¹, Gaogang Xie⁴
¹ Peking University, China. ² Xidian University, China. ³ North Carolina State University, USA. ⁴ ICT, CAS, China.

Abstract
A sketch is a probabilistic data structure used to record frequencies of items in a multi-set. Sketches are widely used in various fields, especially those that involve processing and storing data streams. In streaming applications with high data rates, a sketch fills up very quickly. Thus, its contents are periodically transferred to the remote collector, which is responsible for answering queries. In this paper, we propose a new sketch, called Slim-Fat (SF) sketch. Compared to prior art, it has significantly higher accuracy and a much smaller memory footprint, and it achieves the same speed as the best prior sketch. The key idea behind our proposed SF-sketch is to maintain two separate sketches: a small sketch called Slim-subsketch and a large sketch called Fat-subsketch. The Slim-subsketch is periodically transferred to the remote collector for answering queries quickly and accurately. The Fat-subsketch, however, is not transferred to the remote collector. It only assists the Slim-subsketch during insertions and deletions and is never used to answer queries. We implemented and extensively evaluated SF-sketch along with several prior sketches and compared them side by side. Our experimental results show that SF-sketch outperforms the most widely used CM-sketch by up to 33.1 times in terms of accuracy. We have released the source code of our proposed sketch as well as of existing sketches on Github [2]. The short version of this paper will appear in ICDE 2017 [32].

1 Introduction

1.1 Background and Motivation
A sketch is a probabilistic data structure that is used to record frequencies of distinct items in a multi-set.
Sketches have small memory footprints, high accuracy, and fast queries, insertions, and deletions. For these reasons they are used extensively in data stream processing [3], [6], [9], [10], [23], [26], [29], [31], [33]. Sketches are also being applied in many other fields, such as NLP [15, 17, 18], sparse approximation in compressed sensing [14], and more [12, 16, 20]. Data streams are generated and used in many scenarios, such as network traffic, graph streams, and multimedia streams. For example, a typical application is to measure the number of packets for each flow in the traffic of a network. In most of these applications, data arrives in the stream at a very high rate, so a sketch that records information from the stream fills up very quickly. Consequently, each monitoring node (such as a router or switch) that populates a sketch has to periodically transfer the contents of the filled-up sketch to a remote collector [19, 21, 34]. The collector stores the contents of these sketches and answers queries. Existing sketches need only a few (e.g., 4) memory accesses per insertion, so their insertion speed is often fast enough to keep up with data streams. However, the bottleneck lies in the bandwidth and high-speed memory of the remote collector, because it receives filled-up sketches from a large number of monitoring nodes. Consequently, the sketch that each monitoring node sends to the collector must be very small. It must still contain enough information for the remote collector to answer queries accurately. The accuracy of a sketch in answering queries quantifies how close the frequency estimated from the sketch is to the actual frequency. This paper focuses on the design of a new scheme with two properties. First, it significantly reduces the size of the sketch sent to the collector compared with existing sketches. Second, it is more accurate while achieving the same query speed as the best prior sketches.
1.2 Limitations of Prior Art
Here we only introduce four typical sketches: C-sketch, CM-sketch, CU-sketch, and CML-sketch. Charikar et al. proposed the Count sketch (C-sketch) [5]. C-sketch experiences two types of errors:
- over-estimation error, where the result of a query is a value larger than the true value;
- under-estimation error, where the result of a query is a value smaller than the true value.

Improving on the C-sketch, Cormode and Muthukrishnan proposed the Count-min (CM) sketch [11], which suffers only from over-estimation error and never from under-estimation error. The authors argued that such one-sided error has many benefits [11]. In a further enhancement, Cormode et al. proposed the conservative update (CU) sketch [13], which improves the accuracy at the cost of not supporting item deletions. CML-sketch [25] further improves the accuracy, but at the cost of suffering from both over-estimation and under-estimation errors. CM-sketch supports deletions and has no under-estimation errors, so it is still the most popular sketch in practice. As mentioned above, for a given desired accuracy and transmission period, a smaller sketch needs less bandwidth and less expensive fast memory, and it gives faster queries at the collector.

1.3 Proposed Approach
In this paper, we present a novel two-stage sketch, called Slim-Fat (SF) sketch. The key idea behind our proposed SF-sketch is to maintain two separate sketches: a small sketch called Slim-subsketch and a large sketch called Fat-subsketch. Both subsketches have similar accuracy, so we only need to send the Slim-subsketch to the remote collector. Compared to the state of the art, the Slim-subsketch achieves significantly higher accuracy and a significantly smaller memory footprint. It also supports deletions and achieves the same query speed.

Before describing our approach, we briefly describe how the conventional CM-sketch works, because several design choices in SF-sketch build on it. As shown in Figure 1, a CM-sketch consists of $d$ arrays, where we represent the $i$-th array with $A_i$. Each array consists of $w$ buckets and each bucket contains one counter. We represent the counter in the $j$-th bucket of the $i$-th array with $A_i[j]$. Each array $A_i$ ($1 \le i \le d$) is associated with an independent hash function $h_i(\cdot)$, whose output is uniformly distributed in the range $[1, w]$.
In CM-sketch, initialization simply sets all $d \times w$ counters to zero. To insert an item $e$ (i.e., to increment its stored frequency by 1), the CM-sketch computes the $d$ hash functions $h_1(e), h_2(e), \dots, h_d(e)$. It then increments the $d$ counters $A_1[h_1(e)], A_2[h_2(e)], \dots, A_d[h_d(e)]$ by 1. For convenience, we call them the $d$ hashed counters. To delete an item $e$, the CM-sketch computes the $d$ hash functions and decrements the $d$ hashed counters by 1. To query the frequency of an item $e$, the CM-sketch computes the hash functions and returns the smallest of the $d$ hashed counters. The value returned for an item is never smaller than the item's true frequency. Consequently, CM-sketch suffers only from over-estimation error, never from under-estimation error [11].

[Figure 1: The Count-min sketch architecture. Legend: e, an item; h, a hash function; arrays A_1 ... A_d, each with buckets 1 ... w, and each bucket holding one counter.]

In SF-sketch, the Slim-subsketch, as the name suggests, has significantly fewer counters than the Fat-subsketch. The small Slim-subsketch lets us use less memory while keeping accuracy high. The relatively large Fat-subsketch assists the Slim-subsketch during updates, so that the accuracy of the Slim-subsketch is as high as possible. The Fat-subsketch uses many more counters than the Slim-subsketch. Even so, it needs only tens of megabytes, which is very small compared to the large but cheap memory of current computers.

In designing SF-sketch, we start with a bare-bones version and improve it step by step until we reach the final design. In this process, we present five versions: SF_1-sketch through SF_4-sketch, and SF_F-sketch. Each version improves on the previous ones by addressing some of their limitations. SF_F-sketch is the final design of our SF-sketch. In SF_1-sketch, the Slim-subsketch consists of $d \times w$ counters and the Fat-subsketch consists of $d \times w'$ counters, where $w' > w$.
The Fat-subsketch in SF_1-sketch is the same as a standard CM-sketch. The key idea behind SF_1-sketch concerns counters that already exceed the true frequency. When inserting an item $e$, suppose the counter that $h_i(e)$ points to is already greater than the real frequency of $e$. Then incrementing that counter only degrades accuracy. Specifically, when inserting an item $e$, we first insert it into the Fat-subsketch using the standard CM-sketch insertion. We then query its current frequency from the Fat-subsketch using the standard CM-sketch query. Suppose the Fat-subsketch estimates the current frequency of $e$ to be $c$. Next, we compute the $d$ hash functions for the $d$ arrays of the Slim-subsketch and retrieve the $d$ hashed counters. If the smallest of these $d$ counters is less than $c$, we increment only the smallest counter(s) by 1 in the Slim-subsketch; otherwise, we do nothing. We increment either no counter or only the smallest counter(s) in the Slim-subsketch for two reasons. First, incrementing fewer counters reduces the over-estimation error. Second, because fewer counters are incremented, the counters can be made smaller, which reduces the footprint of the Slim-subsketch. Unfortunately, SF_1-sketch does not support deletions. To support deletions well, we propose several optimization techniques, which lead to the final version, SF_F-sketch.

1.4 Technical Challenges
The first technical challenge is to achieve significantly higher accuracy than the CM-sketch, currently the most widely used sketch. To address this challenge, we leverage a novel insight: reducing the number of counters incremented per insertion improves accuracy, because it shrinks the over-estimation error. When inserting a new item, our sketch does not always increment $d$ counters in the Slim-subsketch. Instead, it increments either no counter or only the smallest counter(s), the latter to avoid under-estimation error. Queries are processed using only the information stored in the Slim-subsketch. This is why we minimize the number of counter increments per insertion only in the Slim-subsketch. To decide exactly which counters to increment in the Slim-subsketch, SF-sketch uses the Fat-subsketch to estimate how many times the item has already been inserted. If the smallest counters in the Slim-subsketch are below this estimate, SF-sketch increments only those counters. Otherwise, it increments no counter at all.

The second technical challenge is to make the Slim-subsketch support deletions. Accurate deletions from the Slim-subsketch are very difficult, because they require tracking exactly which counters were incremented when each item was inserted. This information is needed to identify the counters to decrement when deleting an item, and to identify how those decrements affect other items. Tracking such information is very expensive, both in memory overhead and in computational cost.
To address this challenge, we do not attempt accurate deletions, i.e., decrementing all counters that were incremented when the given item was inserted. Instead, we perform approximate deletions: we decrement as many counters in the Slim-subsketch as possible without causing any under-estimation errors.

1.5 Key Contributions
1. We propose a new sketch, namely the SF-sketch, which is more accurate than prior art while supporting deletions and keeping the query speed unchanged.
2. We implemented C-sketch, CM-sketch, CU-sketch, CML-sketch, and SF-sketch on GPU and multi-core CPU platforms. We carried out extensive experiments on these two platforms to evaluate and compare the performance of all these sketches. Experimental results show that SF-sketch outperforms CM-sketch by up to 33.1 times in terms of average relative error.

2 Related Work
The structure of the Count sketch (C-sketch) [5], proposed by Charikar et al., is the same as that of the CM-sketch [11] described earlier, with one exception. Each array $A_i$ is associated with two hash functions, $h_i(\cdot)$ and $\delta_i(\cdot)$. Each hash function $\delta_i(\cdot)$ evaluates to $-1$ or $+1$ with equal probability. To insert an item $e$, for each $i \in [1, d]$, C-sketch calculates $h_i(e)$ and $\delta_i(e)$ and adds $\delta_i(e)$ to the counter $A_i[h_i(e)]$. To query the frequency of item $e$, C-sketch reports the median of $\{A_1[h_1(e)]\,\delta_1(e), A_2[h_2(e)]\,\delta_2(e), \dots, A_d[h_d(e)]\,\delta_d(e)\}$. Unfortunately, C-sketch suffers from both over-estimation and under-estimation errors. Several improvements have therefore been proposed, such as the CM-sketch [11], CU-sketch [13]¹, and Count-Min-Log (CML) sketch [25]. Of these, CM-sketch and CU-sketch do not suffer from under-estimation errors. All three sketches have $d$ arrays of $w$ counters each. To insert an item $e$, CU-sketch [13] increments only the smallest counter(s) among the $d$ hashed counters. CU-sketch improves query accuracy significantly. However, its fundamental limitation is that it does not support deletions, so it has not gained as wide acceptance in practice as the CM-sketch.
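The three classical update rules discussed above differ mainly in which hashed counters change on an insertion and in how a query combines them. The following minimal Python sketch is for illustration only and rests on stated assumptions. SHA-256-derived indices stand in for the independent hash functions $h_i(\cdot)$ and $\delta_i(\cdot)$. Indices are 0-based, whereas the text uses 1-based indices. The names `CountMin`, `CountSketch`, and `_hash` are hypothetical and are not taken from the released code [2]. The `conservative=True` flag switches the CM-sketch update to the CU-sketch rule.

```python
import hashlib
import statistics


def _hash(row: int, item: str, salt: str = "h") -> int:
    """Deterministic 64-bit hash of (salt, row, item).

    Assumption: stands in for the independent, uniformly distributed
    hash functions used in the text.
    """
    digest = hashlib.sha256(f"{salt}:{row}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class CountMin:
    """CM-sketch with d arrays of w counters (0-based indices)."""

    def __init__(self, d: int, w: int, conservative: bool = False):
        self.d, self.w = d, w
        self.conservative = conservative  # True: CU-sketch update rule
        self.A = [[0] * w for _ in range(d)]

    def _cells(self, item: str):
        # One hashed counter (array i, bucket j) per array.
        return [(i, _hash(i, item) % self.w) for i in range(self.d)]

    def insert(self, item: str) -> None:
        cells = self._cells(item)
        if self.conservative:
            # CU-sketch: increment only the smallest hashed counter(s).
            low = min(self.A[i][j] for i, j in cells)
            for i, j in cells:
                if self.A[i][j] == low:
                    self.A[i][j] += 1
        else:
            # CM-sketch: increment all d hashed counters.
            for i, j in cells:
                self.A[i][j] += 1

    def delete(self, item: str) -> None:
        if self.conservative:
            raise NotImplementedError("CU-sketch does not support deletions")
        for i, j in self._cells(item):
            self.A[i][j] -= 1

    def query(self, item: str) -> int:
        # Smallest hashed counter; never below the true frequency as long
        # as only previously inserted items are deleted.
        return min(self.A[i][j] for i, j in self._cells(item))


class CountSketch:
    """C-sketch: signed updates and a median query (two-sided error)."""

    def __init__(self, d: int, w: int):
        self.d, self.w = d, w
        self.A = [[0] * w for _ in range(d)]

    def _sign(self, i: int, item: str) -> int:
        # delta_i(e) in {-1, +1}, taken from a separately salted hash.
        return 1 if _hash(i, item, salt="s") & 1 else -1

    def insert(self, item: str) -> None:
        for i in range(self.d):
            self.A[i][_hash(i, item) % self.w] += self._sign(i, item)

    def query(self, item: str):
        # With an odd d the median is one of the d signed counters.
        return statistics.median(
            self.A[i][_hash(i, item) % self.w] * self._sign(i, item)
            for i in range(self.d)
        )
```

With the same hash functions and the same stream, every CU counter stays at or below the corresponding CM counter, because CU increments a subset of the cells that CM increments. Since both answers are at least the true frequency, the CU answer is never less accurate. The price is the missing `delete`.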
CML-sketch is another variant of the CM-sketch that uses logarithm-based approximate counters instead of linear counters [25]. It does not increment one counter per array on every insertion. Instead, each time it decides whether to increment the counters with logarithmic probabilities. This reduces the number of bits per counter, so the sketch can hold more counters in the same amount of memory and thus achieve better accuracy. Unfortunately, CML-sketch suffers from both over-estimation and under-estimation errors, and its final version does NOT support deletions. A thorough statistical analysis of various sketches is provided in [27, 28].

A recent work presented the Augmented sketch (A-sketch), a universal framework that can be applied to many existing sketches, especially those with low accuracy [26]. A-sketch uses a filter to catch heavy hitters (high-frequency items) early and uses a classical sketch to store and query the remaining items. In this way, the accuracy can be improved. However, always keeping the most frequent items in the first filter without incurring additional errors is a challenging issue. A complex design and frequent communication between the two filters are unavoidable, which makes the implementation complicated. A-sketch can be applied to our SF-sketch as well. However, our test results show that SF-sketch is already very accurate, so combining it with A-sketch adds little accuracy but does add complexity.

Another class of data structures that can store item frequencies is the enhanced Bloom filter, such as Spectral Bloom Filters (SBF) [7], Dynamic Count Filters (DCF) [4], and others [31]. SBF replaces each bit in the conventional Bloom filter with a counter [7]. To insert an item, the basic version of SBF simply increments all the counters that the item maps to. To query the frequency of an item, SBF looks at all the counters that the hash functions map the item to. It returns the value of the smallest of these counters as the estimated frequency of that item in the multi-set. DCF extends SBF and improves its memory efficiency by using two separate filters [4]. The first filter consists of fixed-size counters, while the size of the counters in the second filter is adjusted dynamically. Unfortunately, the use of two filters increases the complexity of DCF, which degrades its query and update speeds.

¹ Estan and Varghese proposed the CU-sketch, which can be combined with other sketches. For convenience, CU-sketch means CM-CU-sketch when it is combined with the CM-sketch.

3 The Slim-Fat Sketch
In this section, we present the details of our SF-sketch. We start with a basic version and improve it incrementally until we reach the final design. This lets us explain the intuition behind SF-sketch and justify our design choices. For each intermediate version, we first describe its insertion, query, and deletion operations. We then discuss its limitations, which guide our design choices for the next version.
In this process, we present five versions of SF-sketch: SF_1-sketch through SF_4-sketch, and finally SF_F-sketch, which is our final design. Each version is developed by studying the limitations of its predecessor and addressing them.

Rationale: Our slim-fat architecture (shown in Figure 2) has two sets of arrays. The Slim-subsketch has fewer counters per array, and the Fat-subsketch has comparatively more counters per array. When inserting or deleting an item, we first update the Fat-subsketch. We then update the Slim-subsketch based on what we observe in the Fat-subsketch.

[Figure 2: The Slim-Fat sketch architecture. Slim-subsketch: arrays A_1 ... A_d with buckets 1 ... w, sent to the collector for fast query. Fat-subsketch: arrays B_1 ... B_d with buckets 1 ... w', stored only in monitors.]

The key insight behind our scheme concerns the $d$ hashed counters of an item being inserted. If the smallest of them is already bigger than the item's current true frequency, incrementing any counter only degrades accuracy. The true frequency is not easy to obtain with small memory, so we use the Fat-subsketch to give a relatively accurate estimate of it. Next, we describe the first version of our slim-fat sketch, the SF_1-sketch, and discuss its operations and limitations. These pave the way for the design of SF_2-sketch and the subsequent versions. Table 1 summarizes the symbols and abbreviations used in this paper.
Table 1: Symbols & abbreviations used in the paper

Symbol          Description
$e$             Any item that can be handled by SF-sketch
$d$             # of arrays in Slim-subsketch and Fat-subsketch
$z$             # of counters in each bucket of the Fat-subsketch of SF_4- and SF_F-sketch
$w$ / $w'$      # of counters or buckets in each array of Slim- / Fat-subsketch
$A_i$           the i-th array in the Slim-subsketch of SF-sketch
$B_i$           the i-th array in the Fat-subsketch of SF-sketch
$C_i$           the i-th array in the Deletion-subsketch used in SF_2-sketch
$h_i(e)$        the i-th hash function used in Slim- and Deletion-subsketch
$g_i(e)$        the i-th hash function used in Fat-subsketch
$B_{min}^e$     the minimum value among the counters that e maps to in {B_1, ..., B_d}
%               mod operation

3.1 SF_1: Optimizing Accuracy Using One Fat-subsketch
As shown in Figure 2, SF_1-sketch consists of $d$ arrays in both the Slim-subsketch and the Fat-subsketch. The Fat-subsketch is a standard CM-sketch with many more counters than the Slim-subsketch. We represent the $i$-th array in the Slim-subsketch with $A_i$ and the $i$-th array in the Fat-subsketch with $B_i$. Each array in the Slim-subsketch consists of $w$ buckets, while each array in the Fat-subsketch consists of $w'$ buckets, where $w' > w$. Each bucket in both the Slim- and Fat-subsketches contains one counter. We represent the counter in the $j$-th bucket of the $i$-th array in the Slim-subsketch with $A_i[j]$, where $1 \le i \le d$ and $1 \le j \le w$. Similarly, we represent the counter in the $k$-th bucket of the $i$-th array in the Fat-subsketch with $B_i[k]$, where $1 \le i \le d$ and $1 \le k \le w'$. Each array $A_i$ is associated with a uniformly distributed, independent hash function $h_i(\cdot)$, whose output lies in the range $[1, w]$. Similarly, each array $B_i$ is associated with a uniformly distributed, independent hash function $g_i(\cdot)$, whose output lies in the range $[1, w']$. The structure of the SF_1-sketch is shown in Figure 2. Initialization for the SF_1-sketch simply sets all counters $A_i[j]$ and $B_i[k]$ to zero, where $1 \le i \le d$, $1 \le j \le w$, and $1 \le k \le w'$.

Insertion: To insert an item, the SF_1-sketch first inserts it into the Fat-subsketch. Based on what it observes in the Fat-subsketch, it then increments the appropriate counters in the Slim-subsketch. Insertion into the Fat-subsketch is the same as in the conventional CM-sketch. To insert an item $e$ into the Fat-subsketch, we compute the $d$ hash functions $g_1(e), g_2(e), \dots, g_d(e)$. We then increment the $d$ hashed counters $B_1[g_1(e)], B_2[g_2(e)], \dots, B_d[g_d(e)]$ by 1. After this, we estimate the current frequency of $e$ as the minimum of the $d$ hashed counters we just incremented, denoted $B_{min}^e$. To insert the item $e$ into the Slim-subsketch, we compute the $d$ hash functions and identify the smallest counter(s) among $A_1[h_1(e)], A_2[h_2(e)], \dots, A_d[h_d(e)]$. If the smallest counter(s) are not smaller than $B_{min}^e$, the insertion ends. Otherwise, we increment the smallest counter(s) by 1. Note that CU-sketch always increments the smallest counter(s). SF_1-sketch skips the increment whenever the smallest counter has already reached $B_{min}^e$, which is why SF_1-sketch is much more accurate than CU-sketch. Formally, SF_1-sketch increments by one each counter $A_l[h_l(e)]$, $l \in [1, d]$, that satisfies both of the following conditions: $A_l[h_l(e)] = \min_{1 \le i \le d} A_i[h_i(e)]$ and $A_l[h_l(e)] < B_{min}^e$.
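The SF_1 insertion rule above, together with the Slim-only query described next, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. SHA-256-derived indices stand in for $h_i(\cdot)$ and $g_i(\cdot)$, and indices are 0-based. `SF1Sketch` and `make_hash` are hypothetical names. The optional `h` and `g` arguments accept hand-picked mappings, which lets the two-item scenario of Figure 3 be replayed.

```python
import hashlib
from typing import Callable, Optional

HashFn = Callable[[int, str], int]  # (array index i, item) -> bucket index


def make_hash(salt: str, width: int) -> HashFn:
    """Hypothetical helper: a salted hash family with outputs in [0, width)."""
    def h(i: int, item: str) -> int:
        digest = hashlib.sha256(f"{salt}:{i}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % width
    return h


class SF1Sketch:
    def __init__(self, d: int, w: int, w_fat: int,
                 h: Optional[HashFn] = None, g: Optional[HashFn] = None):
        if w_fat <= w:
            raise ValueError("the Fat-subsketch needs w' > w")
        self.d = d
        self.A = [[0] * w for _ in range(d)]      # Slim-subsketch (d x w)
        self.B = [[0] * w_fat for _ in range(d)]  # Fat-subsketch (d x w'), a CM-sketch
        self.h = h or make_hash("slim", w)
        self.g = g or make_hash("fat", w_fat)

    def insert(self, e: str) -> None:
        # Step 1: ordinary CM insertion into the Fat-subsketch, then B_min^e.
        for i in range(self.d):
            self.B[i][self.g(i, e)] += 1
        b_min = min(self.B[i][self.g(i, e)] for i in range(self.d))
        # Step 2: increment the smallest Slim counter(s) only if below B_min^e.
        slots = [(i, self.h(i, e)) for i in range(self.d)]
        low = min(self.A[i][j] for i, j in slots)
        if low < b_min:
            for i, j in slots:
                if self.A[i][j] == low:
                    self.A[i][j] += 1

    def query(self, e: str) -> int:
        # Answered from the Slim-subsketch only.
        return min(self.A[i][self.h(i, e)] for i in range(self.d))


if __name__ == "__main__":
    # Figure 3 scenario (1-based in the text): e1 -> A_1[1], A_2[1]; e2 -> A_1[1], A_2[2].
    slim_map = {("e1", 0): 0, ("e1", 1): 0, ("e2", 0): 0, ("e2", 1): 1}
    fat_map = {"e1": 0, "e2": 1}  # assumed: no collisions in the Fat-subsketch
    sk = SF1Sketch(2, 2, 4, h=lambda i, e: slim_map[(e, i)],
                   g=lambda i, e: fat_map[e])
    sk.insert("e1")
    sk.insert("e2")
    print(sk.A)  # [[1, 0], [1, 1]]
    # Naively undoing e1 (decrementing both of its counters) under-estimates e2.
    naive = [row[:] for row in sk.A]
    naive[0][0] -= 1
    naive[1][0] -= 1
    print(min(naive[0][0], naive[1][1]))  # 0, although e2 was inserted once
```

The `__main__` part reproduces the counter values stated for the Figure 3 example. It also shows the under-estimation caused by a naive deletion, which is the difficulty Section 3.2 addresses.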
Query: To query the frequency of item $e$, the SF_1-sketch computes the $d$ hash functions $h_1(e), h_2(e), \dots, h_d(e)$. It returns the smallest counter among $A_1[h_1(e)], A_2[h_2(e)], \dots, A_d[h_d(e)]$ as the result. Note that queries are answered only from the Slim-subsketch.

Deletion: SF_1-sketch does not support deletions.

Advantages and Limitations: The key advantage of the SF_1-sketch is that queries access only the Slim-subsketch, never the Fat-subsketch. This keeps its query speed as fast as that of the conventional CM-sketch. Furthermore, during insertion we increment either no counter or only the smallest counter(s) in the Slim-subsketch. The smallest hashed counter in the Fat-subsketch gives an upper bound on the number of times that a given item has already been inserted. This strategy reduces the number of increments in the Slim-subsketch, which has two advantages. First, it reduces the memory footprint of the Slim-subsketch in the expensive and limited fast memory. Second, fewer increments mean a smaller over-estimation error. Unfortunately, the biggest limitation of the SF_1-sketch is that it does not support deletions from the Slim-subsketch. The Fat-subsketch assists the Slim-subsketch during insertion, but it cannot assist in deletion, because the Fat- and Slim-subsketches have different numbers of counters per array. This inability to support deletions from the Slim-subsketch limits the practical usability of the SF_1-sketch. The next version, the SF_2-sketch, addresses this limitation while keeping the advantages of the SF_1-sketch.

3.2 SF_2: Supporting Deletion

[Figure 3: An example of the deletion problem. A Slim-subsketch with arrays A_1 and A_2 of two counters each, shown after "insert e_1" and "insert e_2".]

Difficulties for deletions: Accurate deletions are challenging in SF_1-sketch. To delete items from its Slim-subsketch, one has to track exactly which counters were incremented when each item was inserted. Such tracking is difficult and requires large memory and processing overhead. We explain this with an example. As shown in Figure 3, consider a Slim-subsketch with two arrays and two counters per array, all initialized to 0. We first insert two items $e_1$ and $e_2$ and then delete the item $e_1$. Let $e_1$ map to $A_1[1]$ and $A_2[1]$, and let $e_2$ map to $A_1[1]$ and $A_2[2]$. Inserting $e_1$ increments both $A_1[1]$ and $A_2[1]$ to 1. Next, when inserting $e_2$, $A_1[1]$ is 1 and $A_2[2]$ is 0, so we increment only the smaller of the two, $A_2[2]$, to 1. At this point, $A_1[1] = 1$, $A_1[2] = 0$, $A_2[1] = 1$, and $A_2[2] = 1$. Now consider deleting $e_1$. It maps to both $A_1[1]$ and $A_2[1]$, and both were incremented when $e_1$ was inserted. If we decrement both, the query result of $e_2$ becomes $\min(A_1[1], A_2[2]) = \min(0, 1) = 0$. This is an under-estimation error, which we do not want in SF-sketch.

Deletion-subsketch: To support deletions, the SF_2-sketch keeps one Slim-subsketch and one Fat-subsketch, just like the SF_1-sketch. It also maintains a third sketch, called the Deletion-subsketch. The Deletion-subsketch is essentially a standard CM-sketch. Unlike the Fat-subsketch, it has exactly the same parameters ($d$, $w$, $h_i(\cdot)$) as the Slim-subsketch. We represent the counter in the $j$-th bucket of the $i$-th array of the Deletion-subsketch with $C_i[j]$, where $1 \le i \le d$ and $1 \le j \le w$. The Fat-subsketch helps decide which counters to increment in the Slim-subsketch when inserting an item. The Deletion-subsketch helps decide which counters to decrement in the Slim-subsketch when deleting an item. Initialization for the SF_2-sketch simply sets all counters $A_i[j]$, $B_i[k]$, and $C_i[j]$ to 0 ($1 \le i \le d$, $1 \le j \le w$, and $1 \le k \le w'$).

Insertion: Insertion into the Slim- and Fat-subsketches of the SF_2-sketch is the same as in the SF_1-sketch. In addition, the SF_2-sketch adds the incoming item to the Deletion-subsketch. Specifically, to insert an item $e$ into the Deletion-subsketch, we compute the $d$ hash functions and increment the $d$ hashed counters $C_1[h_1(e)], C_2[h_2(e)], \dots, C_d[h_d(e)]$ by 1.

Query: The query operation of the SF_2-sketch is the same as that of the SF_1-sketch.

Deletion: To delete an item $e$ from the SF_2-sketch, we first delete it from the Fat-subsketch by decrementing the $d$ counters $B_1[g_1(e)], B_2[g_2(e)], \dots, B_d[g_d(e)]$ by 1. We then delete it from the Deletion-subsketch by decrementing the $d$ counters $C_1[h_1(e)], C_2[h_2(e)], \dots, C_d[h_d(e)]$ by 1. Finally, we delete it from the Slim-subsketch. Here we use the fact that, before the item is deleted from the Deletion-subsketch, each counter in the Slim-subsketch is at most the corresponding counter in the Deletion-subsketch. This holds because, on every insertion, the Deletion-subsketch counter is always incremented, even when the corresponding Slim-subsketch counter is not. To delete $e$ from the Slim-subsketch, for each $i \in [1, d]$, we compare $A_i[h_i(e)]$ with $C_i[h_i(e)]$. We decrement $A_i[h_i(e)]$ by 1 only when $A_i[h_i(e)] > C_i[h_i(e)]$.

Advantages and Limitations: The SF_2-sketch is advantageous over the SF_1-sketch because it supports deletions.
However, it is inefficient in memory usage and update speed, because it must maintain an additional sketch, the Deletion-subsketch, to support deletions from the Slim-subsketch. The next version, the SF_3-sketch, addresses this limitation while keeping the advantages of both the SF_1- and SF_2-sketches.

3.3 SF_3: Combining Fat-subsketch with Deletion-subsketch
In SF_3-sketch, we drop the separate Deletion-subsketch. Instead, we modify the Fat-subsketch so that it assists deletions from the Slim-subsketch as well as insertions. The Fat-subsketch in the SF_3-sketch is similar to that in the SF_1- and SF_2-sketches. However, each array of the SF_3 Fat-subsketch has $w' = z \times w$ buckets, where $z$ is a positive integer. In other words, the Fat-subsketch consumes $z$ times as much memory as the Slim-subsketch. The Slim-subsketch of the SF_3-sketch has the same structure as in the SF_1- and SF_2-sketches. However, its hash functions $h_i(\cdot)$, $1 \le i \le d$, are now derived from the hash functions $g_i(\cdot)$, whose output lies in the range $[1, z \times w]$. More specifically,

$$h_i(\cdot) = \big(g_i(\cdot) - 1\big) \,\%\, w + 1 \qquad (1)$$

Consequently, $h_i(\cdot)$ always lies in the range $[1, w]$, where $w$ is the number of buckets per array in the Slim-subsketch. Deriving $h_i(\cdot)$ from $g_i(\cdot)$ with the equation above associates each Slim-subsketch counter $A_i[j]$ with $z$ counters in the Fat-subsketch: $B_i[j], B_i[j + w], B_i[j + 2w], \dots, B_i[j + (z-1)w]$. Whenever a counter in the Slim-subsketch is incremented, one of its $z$ associated Fat-subsketch counters is also incremented. Hence the value of a counter in the Slim-subsketch is always at most the sum of the values of all its associated counters in the Fat-subsketch.

Insertion: When inserting an item $e$, the SF_3-sketch first inserts it into the Fat-subsketch.
To do so, we compute the $d$ hash functions $g_1(e), g_2(e), \dots, g_d(e)$ and increment the $d$ counters $B_1[g_1(e)], B_2[g_2(e)], \dots, B_d[g_d(e)]$ by 1. Next, we find the minimum of these $d$ counters, denoted $B_{min}^e$. To insert $e$ into the Slim-subsketch, we first compute the $d$ hash functions $h_1(e), h_2(e), \dots, h_d(e)$ using Equation (1). We then increment every counter $A_l[h_l(e)]$, $l \in [1, d]$, that satisfies $A_l[h_l(e)] < B_{min}^e$ and $A_l[h_l(e)] = \min_{1 \le i \le d} A_i[h_i(e)]$. If $\min_{1 \le i \le d} A_i[h_i(e)] \ge B_{min}^e$, we do nothing.

Query: The query operation of the SF_3-sketch is the same as that of the SF_1- and SF_2-sketches.

Deletion: To delete an item from the SF_3-sketch, we first delete it from the Fat-subsketch and then from the Slim-subsketch. To delete $e$ from the Fat-subsketch, we calculate the $d$ hash functions $g_1(e), g_2(e), \dots, g_d(e)$ and decrement the $d$ counters $B_1[g_1(e)], B_2[g_2(e)], \dots, B_d[g_d(e)]$ by 1. To delete $e$ from the Slim-subsketch, we use the fact stated earlier. Before $e$ is deleted from the Fat-subsketch, each Slim-subsketch counter is at most the sum of all its associated Fat-subsketch counters. This holds because every insertion increments one of the associated Fat-subsketch counters, even when the Slim-subsketch counter is not incremented. So, after deleting $e$ from the Fat-subsketch, for each $i \in [1, d]$, we compare $A_i[h_i(e)]$ with $\sum_{m=0}^{z-1} B_i[h_i(e) + m \cdot w]$. We decrement $A_i[h_i(e)]$ by 1 if $A_i[h_i(e)] > \sum_{m=0}^{z-1} B_i[h_i(e) + m \cdot w]$. Each $h_i(e)$ is calculated using Equation (1).

Advantages and Limitations: The advantage of the SF_3-sketch over the SF_2-sketch is that it does not need a separate Deletion-subsketch. Unfortunately, its deletions are slow: deleting an item takes $d \times z$ memory accesses to add up the associated counters in each array of the Fat-subsketch. The next version, the SF_4-sketch, addresses this limitation while keeping the advantages of all three previous versions.

3.4 SF_4: Improving Deletion Speed

[Figure 4: The SF_4- and SF_F-sketch architecture. Slim-subsketch: arrays A_1 ... A_d with buckets 1 ... w, sent to the collector for fast query. Fat-subsketch: arrays B_1 ... B_d with buckets 1 ... w, each bucket holding z counters, stored only in monitors.]

In SF_4-sketch, we modify the Fat-subsketch so that each bucket has $z$ counters instead of one. As shown in Figure 4, the Fat-subsketch of the SF_4-sketch has $d$ arrays with $w' = w$ buckets each, and each bucket contains $z$ counters. We represent the $k$-th counter in the $j$-th bucket of the $i$-th array of the Fat-subsketch with $B_i[j][k]$, where $1 \le i \le d$, $1 \le j \le w$, and $1 \le k \le z$. Each array $B_i$ of the Fat-subsketch is associated with two uniformly distributed, independent hash functions:
- $h_i(\cdot)$, with output in the range $[1, w]$, which maps an item to a bucket in the $i$-th array;
- $f_i(\cdot)$, with output in the range $[1, z]$, which maps an item to a counter inside the bucket $B_i[h_i(\cdot)]$ of the $i$-th array.

The Slim-subsketch uses the same hash functions $h_i(\cdot)$ as the Fat-subsketch to map items to buckets. Whenever a counter in the Slim-subsketch is incremented, one of the $z$ counters in the corresponding Fat-subsketch bucket is also incremented.
Hence the value of a counter in the Slim-subsketch is always at most the sum of all counters in the corresponding bucket of the Fat-subsketch.

Insertion: To insert an item, the SF_4-sketch first inserts it into the Fat-subsketch. Based on what it observes there, it then increments the appropriate counters in the Slim-subsketch. Specifically, to insert $e$ into the Fat-subsketch, we compute the $d$ hash functions $h_1(e), h_2(e), \dots, h_d(e)$ and another $d$ hash functions $f_1(e), f_2(e), \dots, f_d(e)$. We then increment the $d$ counters $B_1[h_1(e)][f_1(e)], B_2[h_2(e)][f_2(e)], \dots, B_d[h_d(e)][f_d(e)]$ by 1. Next, we find the minimum of the counters we just incremented, denoted $B_{min}^e$. To insert $e$ into the Slim-subsketch, we identify the smallest counter(s) among $A_1[h_1(e)], A_2[h_2(e)], \dots, A_d[h_d(e)]$. We increment them by 1 only if their value is less than $B_{min}^e$. Formally, we increment by one every counter $A_l[h_l(e)]$, $l \in [1, d]$, that satisfies $A_l[h_l(e)] = \min_{1 \le i \le d} A_i[h_i(e)]$ and $A_l[h_l(e)] < B_{min}^e$. If $\min_{1 \le i \le d} A_i[h_i(e)] \ge B_{min}^e$, we do nothing.

Query: The query operation of the SF_4-sketch is the same as that of the SF_1-, SF_2-, and SF_3-sketches.

Deletion: To delete an item from the SF_4-sketch, we first delete it from the Fat-subsketch and then from the Slim-subsketch. To delete $e$ from the Fat-subsketch, we calculate the $d$ hash functions $h_1(e), h_2(e), \dots, h_d(e)$ and another $d$ hash functions $f_1(e), f_2(e), \dots, f_d(e)$. We then decrement the $d$ counters $B_1[h_1(e)][f_1(e)], B_2[h_2(e)][f_2(e)], \dots, B_d[h_d(e)][f_d(e)]$ by 1. To delete $e$ from the Slim-subsketch, we use the fact stated earlier: before $e$ is deleted from the Fat-subsketch, each Slim-subsketch counter is at most the sum of all counters in the corresponding Fat-subsketch bucket.
Thus, to delete e from the Slim-subsketch after deleting it from the Fat-subsketch, for each $i \in [1, d]$ we compare $A_i[h_i(e)]$ with $\sum_{k=1}^{z} B_i[h_i(e)][k]$ and decrement $A_i[h_i(e)]$ by 1 if $A_i[h_i(e)] > \sum_{k=1}^{z} B_i[h_i(e)][k]$. Because the z counters of a bucket are stored together, one deletion from the Fat-subsketch needs only $d \times \lceil z b / W \rceil$ memory accesses, where b is the number of bits in each counter, W is the size of the machine word, and $b < W$.

Advantages and Limitations: The principles behind the SF_4-sketch and the SF_3-sketch are essentially the same. The advantage of the SF_4-sketch over the SF_3-sketch is that all counters in the Fat-subsketch corresponding to a counter in the Slim-subsketch are now located in the same bucket. Thus, adding up the values of the z counters usually takes only a single memory access. Building on the SF_4-sketch, our final version, the SF_F-sketch, aims to minimize the over-estimation error.

3.5 SF_F: Reducing Over-estimation Error (The Final Version)

The structure of the SF_F-sketch is exactly the same as that of the SF_4-sketch. The key idea behind the SF_F-sketch is that, when updating the Slim-subsketch during insertion and deletion operations, we keep the value of each of its counters less than or equal to the value of the largest counter in the corresponding bucket of the Fat-subsketch. Next, we describe how insertion, deletion, and query operations work in the SF_F-sketch, followed by an analysis of its error and accuracy.
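To make the SF_4 update rules above concrete, together with the max-based deletion rule of the SF_F-sketch (stated in the key idea above and detailed in the deletion rule that follows), here is a minimal single-threaded C++ sketch. It is an illustration under stated assumptions, not the authors' released code: salted `std::hash` stands in for the independent uniform hash functions $h_i$ and $f_i$, counters are 32-bit, indices are 0-based, and a query is assumed to return the minimum of the d Slim-subsketch counters, as in the CM-sketch. `SFSketch`, `removeSF4`, and `removeSFF` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Illustrative SF_4 / SF_F sketch. Assumptions (not from the paper): salted
// std::hash replaces the independent uniform hash functions, 32-bit counters,
// 0-based indices, and query = minimum of the d Slim-subsketch counters.
class SFSketch {
public:
    SFSketch(std::size_t d, std::size_t w, std::size_t z)
        : d_(d), w_(w), z_(z),
          slim_(d, std::vector<uint32_t>(w, 0)),
          fat_(d, std::vector<uint32_t>(w * z, 0)) {
        assert(d > 0 && w > 0 && z > 0);
    }

    // Insertion (identical for SF_4 and SF_F).
    void insert(const std::string& e) {
        uint32_t bMin = UINT32_MAX;
        for (std::size_t i = 0; i < d_; ++i) {
            uint32_t& c = fat(i, h(i, e), f(i, e));  // B_i[h_i(e)][f_i(e)] += 1
            ++c;
            bMin = std::min(bMin, c);
        }
        const uint32_t aMin = query(e);
        if (aMin >= bMin) return;  // min_i A_i[h_i(e)] >= B_min: do nothing
        for (std::size_t i = 0; i < d_; ++i) {
            uint32_t& a = slim_[i][h(i, e)];
            if (a == aMin) ++a;  // increment only the smallest counters
        }
    }

    // Query answered from the Slim-subsketch only.
    uint32_t query(const std::string& e) const {
        uint32_t m = UINT32_MAX;
        for (std::size_t i = 0; i < d_; ++i) m = std::min(m, slim_[i][h(i, e)]);
        return m;
    }

    // SF_4 deletion: compare A_i[h_i(e)] with the SUM of its Fat bucket.
    // Array i's Slim counter depends only on array i's Fat bucket, so handling
    // both parts per array is equivalent to "Fat first, then Slim".
    void removeSF4(const std::string& e) {
        for (std::size_t i = 0; i < d_; ++i) {
            const std::size_t j = h(i, e);
            uint32_t& c = fat(i, j, f(i, e));
            if (c > 0) --c;  // defensive: e is assumed to have been inserted
            uint64_t sum = 0;
            for (std::size_t k = 0; k < z_; ++k) sum += fat(i, j, k);
            if (slim_[i][j] > sum) --slim_[i][j];
        }
    }

    // SF_F deletion: compare with the MAX of the bucket, and only when that
    // maximum changed because of this deletion.
    void removeSFF(const std::string& e) {
        for (std::size_t i = 0; i < d_; ++i) {
            const std::size_t j = h(i, e);
            const uint32_t before = bucketMax(i, j);
            uint32_t& c = fat(i, j, f(i, e));
            if (c > 0) --c;
            const uint32_t after = bucketMax(i, j);
            if (after != before && slim_[i][j] > after) slim_[i][j] = after;
        }
    }

private:
    std::size_t h(std::size_t i, const std::string& e) const {
        return std::hash<std::string>{}(e + '#' + std::to_string(i)) % w_;
    }
    std::size_t f(std::size_t i, const std::string& e) const {
        return std::hash<std::string>{}(e + '$' + std::to_string(i)) % z_;
    }
    uint32_t& fat(std::size_t i, std::size_t j, std::size_t k) {
        return fat_[i][j * z_ + k];  // bucket j's z counters are contiguous
    }
    uint32_t bucketMax(std::size_t i, std::size_t j) const {
        auto first = fat_[i].begin() + static_cast<std::ptrdiff_t>(j * z_);
        return *std::max_element(first, first + static_cast<std::ptrdiff_t>(z_));
    }

    std::size_t d_, w_, z_;
    std::vector<std::vector<uint32_t>> slim_;  // A_1..A_d: w counters each
    std::vector<std::vector<uint32_t>> fat_;   // B_1..B_d: w buckets x z counters
};
```

The only difference between the two deletion variants is the comparison target: the bucket sum (SF_4) versus the bucket maximum (SF_F). Storing each bucket's z counters contiguously mirrors the SF_4 layout that lets a bucket be read with few memory accesses.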

Insertion: The insertion operation of the SF_F-sketch is exactly the same as that of the SF_4-sketch.

Query: The query operation of the SF_F-sketch is exactly the same as in the previous versions of the SF-sketch.

Deletion: To delete an item from the SF_F-sketch, we first delete it from the Fat-subsketch and then from the Slim-subsketch. To delete an item e from the Slim-subsketch, we first check the d buckets $B_1[h_1(e)], B_2[h_2(e)], \ldots, B_d[h_d(e)]$. For each $i \in [1, d]$, if $\max_{k=1}^{z} B_i[h_i(e)][k]$ changes when e is deleted from the Fat-subsketch, we set $A_i[h_i(e)] = \max_{k=1}^{z} B_i[h_i(e)][k]$ whenever $A_i[h_i(e)] > \max_{k=1}^{z} B_i[h_i(e)][k]$. Otherwise, we leave $A_i[h_i(e)]$ unchanged. The key difference between the deletion operations of the SF_F-sketch and the SF_4-sketch is that the SF_F-sketch compares $A_i[h_i(e)]$ with $\max_{k=1}^{z} B_i[h_i(e)][k]$ instead of $\sum_{k=1}^{z} B_i[h_i(e)][k]$, which significantly reduces the values of the counters in the Slim-subsketch.

Advantages: The key advantage of the SF_F-sketch over the SF_4-sketch is that its deletion operation significantly reduces the counter values in the Slim-subsketch, because it compares each Slim-subsketch counter with the largest counter in the corresponding Fat-subsketch bucket rather than with the sum of all counters in that bucket. This significantly reduces the over-estimation error of the SF_F-sketch. Note that the SF_F-sketch does not suffer from under-estimation error.

Bound on Over-estimation Error: As a query is answered entirely from the Slim-subsketch, the over-estimation error of the SF_F-sketch is that of its Slim-subsketch. Therefore, we next calculate the over-estimation error of the Slim-subsketch of the SF_F-sketch. Let α denote the average number of counters in any given array of the Slim-subsketch that are incremented per insertion.
Note that for the standard CM-sketch, α equals 1, because exactly one counter is incremented in each array when an item is inserted. For the Slim-subsketch of the SF_F-sketch, $\alpha \le 1$, because the Fat-subsketch reduces the number of Slim-subsketch counters incremented per insertion. For any given item e, let $f(e)$ denote its actual frequency and $\hat{f}(e)$ the estimate of its frequency returned by the Slim-subsketch of the SF_F-sketch. Let N denote the total number of insertions of all items into the SF_F-sketch, and let $h_i(\cdot)$ denote the hash function associated with the i-th array of the Slim-subsketch, where $1 \le i \le d$. Let $X_{i,(e)}[j]$ be the random variable representing the difference between the value of the j-th counter in the i-th array and the actual frequency of e, i.e., $X_{i,(e)}[j] = A_i[j] - f(e)$, where $j = h_i(e)$. Due to hash collisions, multiple items are mapped by $h_i(\cdot)$ to counter j, which increases $A_i[j]$ beyond $f(e)$ and results in over-estimation error. As all hash functions have uniformly distributed outputs, $\Pr[h_i(e_1) = h_i(e_2)] = 1/w$. Therefore, the expected value of any counter $A_i[j]$, where $1 \le i \le d$ and $1 \le j \le w$, is $\alpha N / w$. Let ε and δ be two numbers related to d and w as follows: $d = \lceil \ln(1/\delta) \rceil$ and $w = \lceil \mathrm{e}/\varepsilon \rceil$, where $\mathrm{e}$ denotes Euler's number (not an item). The expected value of $X_{i,(e)}[j]$ is then bounded as follows.

$$E(X_{i,(e)}[j]) = E(A_i[j] - f(e)) \le E(A_i[j]) = \frac{\alpha N}{w} \le \frac{\varepsilon}{\mathrm{e}}\,\alpha N \qquad (2)$$

Finally, we derive the probabilistic bound on the over-estimation error of the Slim-subsketch of the SF_F-sketch.
$$\Pr[\hat{f}(e) \ge f(e) + \varepsilon\alpha N] = \Pr[\forall i,\ A_i[j] \ge f(e) + \varepsilon\alpha N] = \left(\Pr[A_i[j] - f(e) \ge \varepsilon\alpha N]\right)^d = \left(\Pr[X_{i,(e)}[j] \ge \varepsilon\alpha N]\right)^d$$

The first step holds because the query returns the minimum of the d counters, and the second uses the independence of the d hash functions. Substituting $\varepsilon\alpha N \ge \mathrm{e}\,E(X_{i,(e)}[j])$ from Equation (2) into the right-hand side, we get

$$\Pr[\hat{f}(e) \ge f(e) + \varepsilon\alpha N] \le \left(\Pr[X_{i,(e)}[j] \ge \mathrm{e}\,E(X_{i,(e)}[j])]\right)^d.$$

Since $X_{i,(e)}[j] \ge 0$ (the Slim-subsketch never under-estimates), Markov's inequality applies, and we get

$$\Pr[\hat{f}(e) \ge f(e) + \varepsilon\alpha N] \le \mathrm{e}^{-d} \le \delta.$$

Derivation of Correct Rate: The correct rate of a sketch is defined as the expected percentage of items in the given multi-set for which the query response of the sketch contains no error. In deriving the correct rate of the SF_F-sketch, we make two assumptions: 1) all hash functions are independent; 2) the Fat-subsketch is large enough to have negligible error. Before deriving the correct rate, we first prove the following theorem.

Theorem 1. In the Slim-subsketch, the value of any given counter is equal to the frequency of the most frequent item that maps to it.

Proof. We prove this theorem by mathematical induction on the number of insertions, denoted by k.

Base Case, k = 0: The theorem clearly holds, because with no insertions the frequency of the most frequent item is 0, which is also the value of every counter.

Induction Hypothesis, k = n: Suppose the statement of the theorem holds after n insertions.

Induction Step, k = n + 1: Let the (n + 1)-st insertion be of an item e that has previously been inserted a times. Let $\alpha_i(k)$ denote the value of the counter $A_i[h_i(e)]$ after k insertions, where $1 \le i \le d$. There are two cases to consider: 1) e was the most frequent item (among the items mapping to $A_i[h_i(e)]$) when k = n; 2) e was not the most frequent item when k = n.

Case 1: If e was the most frequent item when k = n, then according to our induction hypothesis, $\alpha_i(n) = a$. After e is inserted, it is still the most frequent item and its frequency increases to a + 1. The counter $A_i[h_i(e)]$ is incremented once, so $\alpha_i(n + 1) = a + 1$. Thus, in this case the theorem holds, because the value of $A_i[h_i(e)]$ after the insertion still equals the frequency of the most frequent item, which is e.

Case 2: If e was not the most frequent item when k = n, then according to our induction hypothesis, $\alpha_i(n) > a$. After e is inserted, it may or may not become the most frequent item. If it becomes the most frequent item, then $\alpha_i(n) = a + 1$, and under the SF_F-sketch insertion rule the counter $A_i[h_i(e)]$ stays unchanged: with a negligible-error Fat-subsketch, $B^e_{\min} = a + 1$, and the counter is not less than this value. Consequently, $\alpha_i(n + 1) = \alpha_i(n) = a + 1$, and the theorem again holds, because the value of $A_i[h_i(e)]$ after the insertion equals the frequency of the new most frequent item, e. If e does not become the most frequent item, then $\alpha_i(n) > a + 1$, and by the same rule the counter $A_i[h_i(e)]$ stays unchanged. Consequently, $\alpha_i(n + 1) = \alpha_i(n) > a + 1$, and the theorem again holds, because the value of $A_i[h_i(e)]$ after the insertion still equals the frequency of the item that was the most frequent after n insertions.

Next, we derive the correct rate of the SF_F-sketch. Let v be the number of distinct items inserted into the Slim-subsketch, denoted by $e_1, e_2, \ldots, e_v$. Without loss of generality, let item $e_{l+1}$ be more frequent than $e_l$, where $1 \le l \le v - 1$.
Let $X_i$ be the random variable representing the number of items that hash into the counter $A_i[h_i(e_l)]$, including $e_l$ itself, where $1 \le i \le d$ and $1 \le l \le v$. Each of the other $v - 1$ items hashes into this counter independently with probability $1/w$, so $X_i - 1 \sim \mathrm{Binomial}(v - 1, 1/w)$. From Theorem 1, we conclude that if $e_l$ has the highest frequency among all items that map to the counter $A_i[h_i(e_l)]$, then the query result for $e_l$ from this counter contains no error. Let E be the event that $e_l$ has the maximum frequency among the x items that map to $A_i[h_i(e_l)]$. The other $x - 1$ items are equally likely to be any $x - 1$ of the other $v - 1$ items, and E occurs exactly when all of them are among the $l - 1$ items less frequent than $e_l$, so

$$P\{E\} = \binom{l-1}{x-1} \Big/ \binom{v-1}{x-1} \qquad (x \le l).$$

Let $P_i$ denote the probability that the query result for $e_l$ from a given counter contains no error. It is given by

$$P_i = \sum_{x=1}^{l} P\{E\} \cdot P\{X_i = x\} = \sum_{x=1}^{l} \frac{\binom{l-1}{x-1}}{\binom{v-1}{x-1}} \binom{v-1}{x-1} \left(\frac{1}{w}\right)^{x-1} \left(1 - \frac{1}{w}\right)^{v-x} = \left(1 - \frac{1}{w}\right)^{v-l},$$

where the last step follows from the binomial theorem. As there are d counters and the query result is correct if at least one of them is error-free, the overall probability that the query result for $e_l$ is correct is

$$P_{CR}\{e_l\} = 1 - \left(1 - \left(1 - \frac{1}{w}\right)^{v-l}\right)^d.$$

This equality holds when all v items have different frequencies; if two or more items have equal frequencies, the correct rate increases slightly. Consequently, the expected correct rate Cr of the Slim-subsketch is bounded by

$$Cr \ge \frac{\sum_{l=1}^{v} P_{CR}\{e_l\}}{v} = \frac{\sum_{l=1}^{v} \left(1 - \left(1 - \left(1 - \frac{1}{w}\right)^{v-l}\right)^d\right)}{v} \qquad (3)$$

4 Implementation

In this section, we describe our implementation of the sketches on two different computing platforms, namely CPU and GPU. We extensively tested and evaluated the SF-sketch and compared its performance with prior sketches on these two platforms. We first describe our implementation on the CPU platform and then our implementation on the GPU platform.

4.1 CPU Implementation

Our CPU platform comprised a machine with dual 6-core CPUs (24 threads, Intel Xeon CPU @ … GHz) and 62 GB of total system memory. Each CPU has three levels of cache memory: L1, L2, and L3. The L1 cache comprises two 32KB caches, one acting as the data cache and the other as the instruction cache. The L2 cache is a single 256KB cache, and the L3 cache is a single 15MB cache.
To evaluate the schemes in different settings, our implementations on the CPU platform include both a single-thread and a multi-thread implementation. We used C++ as the programming language. In the single-thread implementation, for each sketch, we implemented the entire insertion, deletion, and query process within a single thread. In the multi-thread implementation, we run each query in a dedicated thread and process it completely inside that thread, and we observe near-linear growth in query speed as the number of threads increases. We will present the results on query speed in more detail in Section …
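The multi-thread query setup described above can be sketched as follows. This is a hypothetical helper (`parallelQuery` is not from the paper's code), and it reads "a dedicated thread" as a fixed set of worker threads that each process whole queries, which is an assumption. Keys are split across threads in a strided fashion, each query runs entirely inside one thread, and each thread writes only its own result slots. The sketch is assumed to be read-only during the query phase, and the `query` callable must be safe to call concurrently, which holds for a const lookup such as a minimum over counters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Answer a batch of queries with numThreads workers. Worker t handles keys
// t, t + T, t + 2T, ... and writes only results[q] for its own q, so no locks
// are needed while the underlying sketch is not being updated.
template <typename QueryFn>
std::vector<uint32_t> parallelQuery(const std::vector<std::string>& keys,
                                    unsigned numThreads, QueryFn query) {
    assert(numThreads > 0);
    std::vector<uint32_t> results(keys.size());
    std::vector<std::thread> workers;
    workers.reserve(numThreads);
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&keys, &results, &query, numThreads, t] {
            for (std::size_t q = t; q < keys.size(); q += numThreads)
                results[q] = query(keys[q]);
        });
    }
    for (std::thread& th : workers) th.join();
    return results;
}
```

For example, with a sketch object `s` exposing a const `query` method, one could call `parallelQuery(keys, 12, [&s](const std::string& k) { return s.query(k); })`.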

4.2 GPU Implementation

As GPUs have seen wide acceptance for high-speed data processing, we implemented our sketches on GPUs as well. For these implementations, we employ the basic architecture of GAMT [22]. More specifically, we evaluated the sketches on the GPU platform using the CUDA 5.0 architecture. We performed our experiments on an NVIDIA GPU (Tesla C2075, 1147 MHz, 5376 MB device memory, 448 CUDA cores). We implemented our sketches on the GPU using two prevalent techniques: batch processing and multi-stream pipelining. Next, we describe our implementations of these two techniques.

Batch Processing

Our system architecture is based on CUDA [24], the well-known parallel computing platform created by NVIDIA. In our implementation, a typical query cycle proceeds in the following three steps: (1) copy the incoming queries from the CPU to the GPU, (2) execute the query kernel, and (3) copy the results from the GPU back to the CPU. A kernel in CUDA is a function that is called on the CPU but executed on the GPU. A query kernel is configured with a series of thread blocks, each comprising a group of working threads. As GPU chips have hundreds or even thousands of cores, batch processing is needed to accelerate GPU-based implementations. Each batch is first filled with a group of independent queries and then transferred to and executed on the GPU; i.e., as soon as a query arrives, it is buffered until there are enough queries to fill the current batch, and only then is the batch transferred to the GPU for processing by the query kernels. Note that in practice, not all queries are processed simultaneously; rather, the GPU's scheduler decides when to process which query.
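The host-side buffering just described (queries wait until a batch is full, then the whole batch goes through steps (1) to (3)) can be sketched in plain C++. `QueryBatcher` is a hypothetical name, and the flush callback is a stand-in for the CUDA copy, kernel launch, and copy-back; no GPU code is shown, and the explicit partial flush is an added convenience, not something the text describes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Buffers incoming queries and hands each full batch to a flush callback.
// In a GPU build the callback would copy the batch to the device, launch the
// query kernel, and copy the results back; here it is just a std::function.
class QueryBatcher {
public:
    using FlushFn = std::function<void(const std::vector<std::string>&)>;

    QueryBatcher(std::size_t batchSize, FlushFn flush)
        : batchSize_(batchSize), flush_(std::move(flush)) {
        assert(batchSize_ > 0);
        buffer_.reserve(batchSize_);
    }

    // Buffer one query; the batch is flushed as soon as it is full.
    void submit(std::string key) {
        buffer_.push_back(std::move(key));
        if (buffer_.size() == batchSize_) flushNow();
    }

    // Flush a partially filled batch (e.g., at shutdown).
    void flushNow() {
        if (buffer_.empty()) return;
        flush_(buffer_);
        buffer_.clear();
    }

private:
    std::size_t batchSize_;
    FlushFn flush_;
    std::vector<std::string> buffer_;
};
```

The trade-off discussed in the next subsection is visible here: the first query submitted to a batch waits until `batchSize - 1` more queries arrive.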
As CPUs support less parallelism than GPUs, and additional memory accesses to the CPU may degrade the batch-processing performance of the GPU, our implementation stores all d arrays on the GPU so that the operations that access the arrays execute entirely within the GPU.

Multi-Stream Pipeline

As discussed earlier, batch processing is required to take maximum advantage of the massive parallelism that GPUs enable. However, waiting for enough queries to fill a batch before sending it to the GPU introduces unnecessary delays: while a large batch boosts the throughput of the GPU, it also increases the waiting time before the batch fills and is transferred to the GPU for processing. This means that a query that arrives at the start of the current batch experiences significant latency before it is processed. To resolve this throughput-latency dilemma, we utilize the multi-stream technique featured in the NVIDIA Fermi GPU architecture [22, 30]. A stream, in this context, is a sequence of operations that must be executed in a certain order. In the CUDA architecture, data transfers and kernel executions in different streams can run concurrently, as long as the device supports concurrent operations and the host memory used to exchange data between the CPU and the GPU is page-locked. In this way, while one stream is copying data between the CPU and the GPU, another stream can execute query kernels in parallel. As a result, the streams behave as a multi-stage pipeline and reduce the total processing time. Furthermore, a large batch can be divided into several smaller ones, reducing the average lookup latency while keeping the throughput high. Given a batch of requests and a sequence of active streams, the task mapping should be as balanced as possible to use the GPU's parallelism efficiently. Let b and n denote the batch size and the number of active streams, respectively. If b is a multiple of n, the whole batch can be divided evenly. Otherwise, the first b % n streams each need to perform one extra operation.
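The balanced task mapping described above (every stream gets $\lfloor b/n \rfloor$ queries and the first b % n streams get one extra) can be sketched as a small C++ helper that returns the offset and size of each stream's sub-batch. `SubBatch` and `mapToStreams` are hypothetical names, not taken from the authors' code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SubBatch {
    std::size_t offset;  // index of the first query handled by this stream
    std::size_t size;    // number of queries handled by this stream
};

// Split a batch of b queries across n streams as evenly as possible:
// each stream gets b / n queries, and the first b % n streams get one extra.
std::vector<SubBatch> mapToStreams(std::size_t b, std::size_t n) {
    assert(n > 0);
    std::vector<SubBatch> parts;
    parts.reserve(n);
    const std::size_t base = b / n, extra = b % n;
    std::size_t offset = 0;
    for (std::size_t s = 0; s < n; ++s) {
        const std::size_t size = base + (s < extra ? 1 : 0);
        parts.push_back({offset, size});
        offset += size;
    }
    return parts;
}
```

In a CUDA build, each (offset, size) pair would typically drive a host-to-device `cudaMemcpyAsync`, a kernel launch, and a device-to-host `cudaMemcpyAsync` on that stream's `cudaStream_t`, with the host buffers allocated as page-locked memory (for example via `cudaMallocHost`) so that copies in one stream can overlap with kernels in another.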
In our implementation, after dividing the whole batch into multiple smaller batches of approximately equal size, we use their offset and size information for task mapping. Each small batch is processed by its assigned stream, and all streams are launched one after another to work as a multi-stage pipeline.

5 Experimental Results

We conducted extensive experiments to evaluate the performance of our SF_F-sketch in terms of accuracy and speed. From here on, we refer to the SF_F-sketch simply as the SF-sketch. For comparison, we also implemented and evaluated four other sketches: three well-known ones, namely the Count-sketch (C-sketch) [5], the CM-sketch [11], and the CU-sketch [13], and one recently proposed one, the CML-sketch [25]. The CML-sketch and the CU-sketch do NOT support deletions. The CML-sketch and the Count-sketch suffer from both over-estimation and under-estimation errors.

5.1 Experimental Setup

Datasets: We use three types of datasets: real-world traffic, a uniform dataset, and a skewed dataset. The real-world network traffic trace was captured at the main gateway of our campus, while the uniform and skewed datasets were generated by the well-known YCSB [8]. We keep the skewness of our skewed dataset equal to the YCSB default value, which is … . We use Memcached [1] to record the real frequency of each item to establish the ground truth.
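For readers who want to approximate this setup without YCSB and Memcached, a simple stand-in is sketched below: it draws a Zipf-like stream of item IDs and counts exact frequencies in a `std::unordered_map` as ground truth. This is an assumption-laden substitute, not the paper's pipeline: the generator is a plain weighted sampler (`std::discrete_distribution` over weights $1/(r+1)^s$), not YCSB's generator, and the exponent `s` is left as a free parameter because the skewness value used in the paper is not given here. `zipfStream` and `groundTruth` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Draw n item IDs in [0, m); ID r is chosen with probability proportional to
// 1 / (r + 1)^s. A generic Zipf-like sampler, NOT the YCSB generator.
std::vector<uint32_t> zipfStream(uint32_t m, double s, std::size_t n, uint64_t seed) {
    assert(m > 0);
    std::vector<double> weights(m);
    for (uint32_t r = 0; r < m; ++r) weights[r] = 1.0 / std::pow(r + 1.0, s);
    std::discrete_distribution<uint32_t> pick(weights.begin(), weights.end());
    std::mt19937_64 rng(seed);
    std::vector<uint32_t> stream;
    stream.reserve(n);
    for (std::size_t i = 0; i < n; ++i) stream.push_back(pick(rng));
    return stream;
}

// Exact per-item counts: the role Memcached plays as ground truth in the paper.
std::unordered_map<uint32_t, uint64_t> groundTruth(const std::vector<uint32_t>& stream) {
    std::unordered_map<uint32_t, uint64_t> freq;
    for (uint32_t id : stream) ++freq[id];
    return freq;
}
```

Setting `s = 0` makes every weight 1, which yields a uniform stream, so the same helper can stand in for both synthetic workloads.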

[Figures 5-12 (plots not reproduced). Uniform workload: Figure 5: CDF of relative error; Figure 6: Average relative error vs. number of insertions; Figure 7: Average relative error vs. number of deletions; Figure 8: Increase in error due to deletions. Skewed workload: Figures 9-12 show the same four plots. Axes: empirical CDF vs. relative error; average relative error vs. number of operations (×100k).]

Experimental Comparison: As memory at the monitor is cheap and plentiful, we assign the same amount of memory to the Slim-subsketch and to each of the state-of-the-art sketches, since these are what is transmitted to the collector. For update experiments, we compare the sketches while varying item frequencies and operation size, i.e., the number of insertion and deletion operations.

5.2 Experiments on Accuracy

We use relative error (RE) to quantify the accuracy of sketches. Let $f_e$ denote the actual frequency of an item e and $\hat{f}_e$ the estimate of its frequency returned by the sketch; the relative error is defined as the ratio $|\hat{f}_e - f_e| / f_e$. To evaluate accuracy, we used 100K distinct items and a fixed parameter setting (d = 5, w = 40000, z = 3).
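To connect this parameter setting with the analysis of Section 3.5, the over-estimation bound and Equation (3) can be evaluated numerically. With d = 5 and w = 40000, the relations $d = \lceil \ln(1/\delta) \rceil$ and $w = \lceil \mathrm{e}/\varepsilon \rceil$ are satisfied by $\delta = \mathrm{e}^{-5} \approx 0.0067$ and $\varepsilon = \mathrm{e}/40000 \approx 6.8 \times 10^{-5}$. The C++ helpers below (hypothetical names `correctRateBound`, `epsilonFor`, `deltaFor`) compute these quantities and the lower bound of Equation (3). They inherit the derivation's assumptions of independent hash functions and a negligible-error Fat-subsketch, so the result is a model prediction, not a measured correct rate.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Lower bound on the expected correct rate Cr, Equation (3):
//   Cr >= (1/v) * sum_{l=1..v} [1 - (1 - (1 - 1/w)^(v-l))^d]
double correctRateBound(uint64_t v, uint64_t w, int d) {
    assert(v > 0 && w > 0 && d > 0);
    // Probability that one given other item does NOT hash into the counter.
    const double q = 1.0 - 1.0 / static_cast<double>(w);
    double sum = 0.0;
    for (uint64_t l = 1; l <= v; ++l) {
        // P_i = (1 - 1/w)^(v-l): e_l is the most frequent item in its counter.
        const double pExact = std::pow(q, static_cast<double>(v - l));
        // P_CR{e_l}: at least one of the d counters is error-free.
        sum += 1.0 - std::pow(1.0 - pExact, d);
    }
    return sum / static_cast<double>(v);
}

// (epsilon, delta) for which a given (w, d) satisfies w = ceil(e/eps)
// and d = ceil(ln(1/delta)).
double epsilonFor(uint64_t w) { return std::exp(1.0) / static_cast<double>(w); }
double deltaFor(int d) { return std::exp(-static_cast<double>(d)); }
```

For example, `correctRateBound(100000, 40000, 5)` evaluates Equation (3) for the 100K-item, d = 5, w = 40000 setting used here.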
We calculated relative errors for the different sketches in three settings: (1) incrementally increasing the number of insertion operations; (2) incrementally increasing the number of deletion operations; and (3) first increasing the number of insertion operations and then deleting the inserted items one by one in reverse order. We performed experiments in these three settings for both uniform and skewed workloads. We also conducted experiments to quantify the effect of system parameters on the performance of the sketches. In all our accuracy experiments, we use 100K distinct items in total.

Uniform Workload

Relative Error CDF: Our experimental results show that the percentage of items for which the relative error of our SF-sketch is less than 1% is 74.51%, which is 18.8, 4.3, 2.1, and 1.9 times higher than the corresponding percentages for the CML-, C-, CM-, and CU-sketches, respectively. Figure 5 reports the empirical cumulative distribution function (CDF) of relative error for the 100K distinct items after a total of 10M insertions. Specifically, we first inserted the 100K distinct items a total of 10M times such that the probability of occurrence of each item was uniformly distributed, and then calculated the relative errors of the frequency estimates of those 100K distinct items. In this way, we obtained 100K relative-error values for each of the five sketches (CML, C, CM, CU, and SF), and we plotted a CDF of these values for each sketch. We observe from Figure 5 that the CDF of the SF-sketch is not only higher than those of the other four sketches but also rises sharply near a relative error of 0. This indicates that, for most items, the relative error of the frequency estimate obtained from the SF-sketch is very close to 0.

Relative Error vs. # of Insertions: Our experimental results show that the average relative error of the SF-sketch is [0.6 to 6.2], [4.0 to 24.0], [4.4 to 33.1], and [1.8 to 3.8] times smaller than the average relative errors of the CML-, C-, CM-, and CU-sketches, respectively.
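The two accuracy metrics used in this section, the average relative error and the share of items whose relative error is below 1%, can be computed from per-item estimates and ground-truth counts as sketched below. The helper names (`relativeError`, `ErrorSummary`, `summarize`) are hypothetical; the code simply applies the RE definition $|\hat{f}_e - f_e| / f_e$ given above and assumes every actual frequency is positive.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// RE = |f_hat - f| / f for one item (f > 0 assumed).
double relativeError(uint64_t estimate, uint64_t actual) {
    assert(actual > 0);
    return std::fabs(static_cast<double>(estimate) - static_cast<double>(actual)) /
           static_cast<double>(actual);
}

struct ErrorSummary {
    double averageRE;      // mean RE over all items
    double fractionBelow;  // share of items whose RE is below the threshold
};

// estimates[i] and actuals[i] refer to the same item.
ErrorSummary summarize(const std::vector<uint64_t>& estimates,
                       const std::vector<uint64_t>& actuals, double threshold) {
    assert(!estimates.empty() && estimates.size() == actuals.size());
    ErrorSummary s{0.0, 0.0};
    std::size_t below = 0;
    for (std::size_t i = 0; i < estimates.size(); ++i) {
        const double re = relativeError(estimates[i], actuals[i]);
        s.averageRE += re;
        if (re < threshold) ++below;
    }
    const double n = static_cast<double>(estimates.size());
    s.averageRE /= n;
    s.fractionBelow = static_cast<double>(below) / n;
    return s;
}
```

With a threshold of 0.01, `fractionBelow` corresponds to the "relative error less than 1%" percentage, and `averageRE` to the quantity plotted against the number of operations.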
Figure 6 plots the average relative errors of the frequency estimates of the 100K distinct items obtained from the five sketches for different numbers of insertions. We observe from this figure that the average relative errors of the five sketches converge to different fixed values with increasing number


More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Avoiding congestion through dynamic load control

Avoiding congestion through dynamic load control Avodng congeston through dynamc load control Vasl Hnatyshn, Adarshpal S. Seth Department of Computer and Informaton Scences, Unversty of Delaware, Newark, DE 976 ABSTRACT The current best effort approach

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Summarizing Data using Bottom-k Sketches

Summarizing Data using Bottom-k Sketches Summarzng Data usng Bottom-k Sketches Edth Cohen AT&T Labs Research 8 Park Avenue Florham Park, NJ 7932, USA edth@research.att.com Ham Kaplan School of Computer Scence Tel Avv Unversty Tel Avv, Israel

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

3. CR parameters and Multi-Objective Fitness Function

3. CR parameters and Multi-Objective Fitness Function 3 CR parameters and Mult-objectve Ftness Functon 41 3. CR parameters and Mult-Objectve Ftness Functon 3.1. Introducton Cogntve rados dynamcally confgure the wreless communcaton system, whch takes beneft

More information

Advanced Computer Networks

Advanced Computer Networks Char of Network Archtectures and Servces Department of Informatcs Techncal Unversty of Munch Note: Durng the attendance check a stcker contanng a unque QR code wll be put on ths exam. Ths QR code contans

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

Efficient Broadcast Disks Program Construction in Asymmetric Communication Environments

Efficient Broadcast Disks Program Construction in Asymmetric Communication Environments Effcent Broadcast Dsks Program Constructon n Asymmetrc Communcaton Envronments Eleftheros Takas, Stefanos Ougaroglou, Petros copoltds Department of Informatcs, Arstotle Unversty of Thessalonk Box 888,

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Insertion Sort. Divide and Conquer Sorting. Divide and Conquer. Mergesort. Mergesort Example. Auxiliary Array

Insertion Sort. Divide and Conquer Sorting. Divide and Conquer. Mergesort. Mergesort Example. Auxiliary Array Inserton Sort Dvde and Conquer Sortng CSE 6 Data Structures Lecture 18 What f frst k elements of array are already sorted? 4, 7, 1, 5, 1, 16 We can shft the tal of the sorted elements lst down and then

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum

More information

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems:

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems: Speed/RAP/CODA Presented by Octav Chpara Real-tme Systems Many wreless sensor network applcatons requre real-tme support Survellance and trackng Border patrol Fre fghtng Real-tme systems: Hard real-tme:

More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

Self-tuning Histograms: Building Histograms Without Looking at Data

Self-tuning Histograms: Building Histograms Without Looking at Data Self-tunng Hstograms: Buldng Hstograms Wthout Lookng at Data Ashraf Aboulnaga Computer Scences Department Unversty of Wsconsn - Madson ashraf@cs.wsc.edu Surajt Chaudhur Mcrosoft Research surajtc@mcrosoft.com

More information

Time- and Space-Efficient Sliding Window Top-k Query Processing

Time- and Space-Efficient Sliding Window Top-k Query Processing Tme- and Space-Effcent Sldng Wndow Top-k Query Processng KREŠIMIR PRIPUŽIĆ and IVANA PODNAR ŽARKO, Unversty of Zagreb KARL ABERER, École Polytechnque FédéraledeLausanne 1 A sldng wndow top-k (top-k/w)

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

AADL : about scheduling analysis

AADL : about scheduling analysis AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

ARTICLE IN PRESS. Signal Processing: Image Communication

ARTICLE IN PRESS. Signal Processing: Image Communication Sgnal Processng: Image Communcaton 23 (2008) 754 768 Contents lsts avalable at ScenceDrect Sgnal Processng: Image Communcaton journal homepage: www.elsever.com/locate/mage Dstrbuted meda rate allocaton

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

SAO: A Stream Index for Answering Linear Optimization Queries

SAO: A Stream Index for Answering Linear Optimization Queries SAO: A Stream Index for Answerng near Optmzaton Queres Gang uo Kun-ung Wu Phlp S. Yu IBM T.J. Watson Research Center {luog, klwu, psyu}@us.bm.com Abstract near optmzaton queres retreve the top-k tuples

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution Dynamc Voltage Scalng of Supply and Body Bas Explotng Software Runtme Dstrbuton Sungpack Hong EE Department Stanford Unversty Sungjoo Yoo, Byeong Bn, Kyu-Myung Cho, Soo-Kwan Eo Samsung Electroncs Taehwan

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL)

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL) Crcut Analyss I (ENG 405) Chapter Method of Analyss Nodal(KCL) and Mesh(KVL) Nodal Analyss If nstead of focusng on the oltages of the crcut elements, one looks at the oltages at the nodes of the crcut,

More information

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp Lfe Tables (Tmes) Summary... 1 Data Input... 2 Analyss Summary... 3 Survval Functon... 5 Log Survval Functon... 6 Cumulatve Hazard Functon... 7 Percentles... 7 Group Comparsons... 8 Summary The Lfe Tables

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

The stream cipher MICKEY-128 (version 1) Algorithm specification issue 1.0

The stream cipher MICKEY-128 (version 1) Algorithm specification issue 1.0 The stream cpher MICKEY-128 (verson 1 Algorthm specfcaton ssue 1. Steve Babbage Vodafone Group R&D, Newbury, UK steve.babbage@vodafone.com Matthew Dodd Independent consultant matthew@mdodd.net www.mdodd.net

More information

Sorting: The Big Picture. The steps of QuickSort. QuickSort Example. QuickSort Example. QuickSort Example. Recursive Quicksort

Sorting: The Big Picture. The steps of QuickSort. QuickSort Example. QuickSort Example. QuickSort Example. Recursive Quicksort Sortng: The Bg Pcture Gven n comparable elements n an array, sort them n an ncreasng (or decreasng) order. Smple algorthms: O(n ) Inserton sort Selecton sort Bubble sort Shell sort Fancer algorthms: O(n

More information

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont)

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont) Loop Transformatons for Parallelsm & Localty Prevously Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Loop nterchange Loop transformatons and transformaton frameworks

More information

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation Loop Transformatons for Parallelsm & Localty Last week Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Scalar expanson for removng false dependences Loop nterchange Loop

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information