CPU Load Shedding for Binary Stream Joins


Under consideration for publication in Knowledge and Information Systems

CPU Load Shedding for Binary Stream Joins

Bugra Gedik 1,2, Kun-Lung Wu 1, Philip S. Yu 1 and Ling Liu 2
1 IBM T.J. Watson Research Center, Hawthorne, NY 10532, USA; 2 College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA

Abstract. We present an adaptive load shedding approach for windowed stream joins. In contrast to the conventional approach of dropping tuples from the input streams, we explore the concept of selective processing for load shedding. We allow stream tuples to be stored in the windows and shed excessive CPU load by performing the join operations, not on the entire set of tuples within the windows, but on a dynamically changing subset of tuples that are learned to be highly beneficial. We support such dynamic selective processing through three forms of runtime adaptations: adaptation to input stream rates, adaptation to time correlation between the streams, and adaptation to join directions. Our load shedding approach enables us to integrate utility-based load shedding with time correlation-based load shedding. Indexes are used to further speed up the execution of stream joins. Experiments are conducted to evaluate our adaptive load shedding in terms of output rate and utility. The results show that our selective processing approach to load shedding is very effective and significantly outperforms the approach that drops tuples from the input streams.

Keywords: Data streams; Stream joins; Load shedding

1. Introduction

With the ever increasing rate of digital information available from on-line sources and networked sensing devices (Madden et al, 2002), the management of bursty and unpredictable data streams has become a challenging problem. It requires solutions that will enable applications to effectively access and extract information from such data streams. A promising solution for this problem is to use declarative query processing engines specialized for handling data streams, such as data stream management systems (DSMS), exemplified by Aurora (Balakrishnan et al, 2004), STREAM (Arasu et al, 2003), and TelegraphCQ (Chandrasekaran et al, 2003).

Joins are key operations in any type of query processing engine and are becoming more important with the increasing need for fusing data from the various types of sensors available, such as environmental, traffic, and network sensors. Here, we list some real-life applications of stream joins. We will return to these examples when we discuss assumptions about the characteristics of the joined streams.

Finding similar news items from two different sources: Assuming that news items from CNN and Reuters are represented by weighted keywords (the join attribute) in their respective streams, we can perform a windowed inner product join to find similar news items.

Finding correlation between phone calls and stock trading: Assuming that phone call streams are represented as {..., (P_a, P_b, t_1), ...}, where (P_a, P_b, t_1) means P_a calls P_b at time t_1, and stock trading streams are represented as {..., (P_b, S_x, t_2), ...}, where (P_b, S_x, t_2) means P_b trades S_x at time t_2, we can perform a windowed equi-join on person to find hints, such as: P_a hints S_x to P_b in the phone call.

Finding correlated attacks from two different streams: Assuming that alerts from two different sources are represented by tuples in the form of (source, target, {attack descriptors}, time) in their respective streams, where {attack descriptors} is a set-based attribute that includes keywords describing the attack, we can perform a windowed overlap join on attack descriptors to find correlated attacks from different sources.

Recently, performing joins on unbounded data streams has been actively studied (Golab and Ozsu, 2003; Kang et al, 2003; Hammad and Aref, 2003). This is mainly due to the fact that traditional join algorithms need to perform a scan on one of the inputs to produce all the result tuples that match a given tuple from the other input. However, data streams are unbounded, and producing a complete answer for a stream join requires unbounded memory and processing resources. To address this problem, several approaches have been proposed. One natural way of handling joins on infinite streams is to use sliding windows. In a windowed stream join, a tuple from one stream is joined with only the tuples currently available in the window of the other stream. A sliding window can be defined as a time-based or count-based (tuple-based) window. An example of a time-based window is "the last 10 seconds' tuples" and an example of a count-based window is "the last 100 tuples". Windows can be either user-defined, in which case we have fixed windows, or system-defined and thus flexible, in which case the system uses the available memory to maximize the output size of the join. Another way of handling the problem of blocking joins is to use punctuated streams (Tucker et al, 2003), in which punctuations that give hints about the rest of the stream are used to prevent blocking. Two-way stream joins with user-defined time-based windows constitute one of the most common join types in the data stream management research to date (Babcock et al, 2002; Golab and Ozsu, 2003; Kang et al, 2003).

In order to keep up with the incoming rates of streams, CPU load shedding is usually needed in stream processing systems. Several factors may contribute to the demand for CPU load shedding, including (a) bursty and unpredictable rates of the incoming streams; (b) large window sizes; and (c) costly join conditions. Data streams can be unpredictable in nature (Kleinberg, 2002), and incoming stream rates tend to soar during peak times. A high stream rate requires more resources for performing a windowed join, due to both the increased number of tuples received per unit time and the increased number of tuples within a fixed-sized time window.

Similarly, large window sizes imply that more tuples are needed for processing a windowed join. Costly join conditions typically require more CPU time.

In this paper, we present an adaptive CPU load shedding approach for windowed stream joins, aiming at maximizing both the output rate and the output utility of stream joins. The proposed approach is applicable to all kinds of join conditions, ranging from simple conditions such as equi-joins defined over single-valued attributes (e.g., the phone calls and stock trading scenario) to complex conditions such as those defined over set-valued attributes (e.g., the correlated attacks scenario) or weighted set-valued attributes (e.g., the similar news items scenario).

Summary of Contributions. Our adaptive load shedding approach has several unique characteristics. First, instead of dropping tuples from the input streams as proposed in many existing approaches, our adaptive load shedding framework follows a selective processing methodology, keeping tuples within the windows but processing them against a subset of the tuples in the opposite window. Second, our approach achieves effective load shedding by properly adapting join operations to three dynamic stream properties: (i) incoming stream rates, (ii) time correlation between the streams, and (iii) join directions. The amount of selective processing is adjusted according to the incoming stream rates. Prioritized segments of the windows are used to adapt join operations to the time-based correlation between the input streams. Partial symmetric joins are dynamically employed to take advantage of the most beneficial join direction learned from the streams. Third, but not least, our selective processing approach enables a coherent integration of the three adaptations with utility-based load shedding. Maximizing the utility of the output tuples produced is especially important when certain tuples are more valuable than others. We employ indexes to speed up the selective processing of joins. Experiments were conducted to evaluate the effectiveness of our adaptive load shedding approach. Our experimental results show that the three adaptations can effectively shed the load in the presence of any of the following: bursty and unpredictable rates of the incoming streams, large window sizes, or costly join conditions.

2. Related Work

Based on the metric being optimized, related work on load shedding in windowed stream joins can be divided into two categories.

The work in the first category aims at maximizing the utility of the output produced. Different tuples may have different importance values based on the application. For instance, in the news join example, certain types of news, e.g., security news, may be of higher value; similarly, in the stock trading example, phone calls from insiders may be of higher interest than calls from regular customers. In this case, an output from the join operator that contains highly-valued tuples is preferable to a higher-rate output generated from lesser-valued tuples. The work presented in Tatbul et al (2003) uses user-specified utility specifications to drop tuples with low utility values from the input streams. We refer to this type of load shedding as utility-based load shedding, also referred to as semantic load shedding in the literature.

Fig. 1. Examples of match probability density functions (match probability vs. time in window w, with the tuple drop time marked), for two cases (I and II).

The work in the second category aims at maximizing the number of output tuples produced (Das et al, 2003; Kang et al, 2003; Srivastava and Widom, 2004). This can be achieved through rate reduction on the source streams, i.e., dropping tuples from the input streams, as suggested in Carney et al (2002) and Kang et al (2003). The work presented in Kang et al (2003) investigates algorithms for evaluating moving window joins over pairs of unbounded streams. Although the main focus of Kang et al (2003) is not on load shedding, scenarios where system resources are insufficient to keep up with the input streams are also considered. There are several other works related to load shedding in DSMSs in general, including memory allocation among query operators (Babcock et al, 2003) or inter-operator queues (Motwani et al, 2003), load shedding for aggregation queries (Babcock et al, 2004), and overload-sensitive management of archived streams (Chandrasekaran and Franklin, 2004).

In summary, the most commonly used techniques for shedding load are tuple dropping for CPU-limited scenarios and memory allocation among windows for memory-limited scenarios. However, dropping tuples from the input streams without paying attention to the selectivity of such tuples may result in a suboptimal solution. Based on this observation, heuristics that take into account the selectivity of the tuples are proposed in Das et al (2003).

A different approach, called age-based load shedding, was proposed recently in Srivastava and Widom (2004) for performing memory-limited stream joins. This work is based on the observation that there exists a time-based correlation between the streams. Concretely, the probability of a match between a tuple just received from one stream and a tuple residing in the window of the opposite stream may change based on the difference between the timestamps of the tuples (assuming timestamps are assigned based on the arrival times of the tuples at the query engine). Under this observation, memory is conserved by keeping a tuple in the window, from its reception, only until the average rate of output tuples generated using this tuple reaches its maximum value. For instance, in Figure 1, case I, the tuples can be kept in the window only until they reach the vertical line marked as the tuple drop time. This effectively cuts down the memory needed to store the tuples within the window and yet produces an output close to the actual output without window reduction. Obviously, knowing that the distribution of the incoming streams has its peak at the beginning of the window, the age-based window reduction can be effective for shedding memory load.

A natural question to ask is: can the age-based window reduction approach of Srivastava and Widom (2004) be used to shed CPU load? This is a valid question, because reducing the window size also decreases the number of comparisons that have to be made in order to evaluate the join. However, as illustrated in Figure 1, case II, this technique does not directly extend to the CPU-limited case where memory is not the constraint. When the distribution does not have its peak close to the beginning of the window, the window reduction approach has to keep tuples until they are close to the end of the window. As a result, tuples that are close to the beginning of the window, and thus are not contributing much to the output, will be processed until the peak is reached close to the end of the window. This observation points out two important facts. First, time-based correlation between the windowed streams can play an important role in load shedding. Second, the window reduction technique that is effective for utilizing time-based correlation to shed memory load is not suitable for CPU load shedding, especially when the distribution of the streams is unknown or unpredictable.

With the above analysis in mind, we propose an adaptive load shedding approach that is capable of performing selective processing of tuples in the stream windows by dynamically adapting to input stream rates, time-based correlations between the streams, and the profitability of different join directions. To the best of our knowledge, our load shedding approach is the first one that can handle arbitrary time correlations and at the same time support maximization of output utility.

Finally, it is worth mentioning that our windowed stream join operator semantics follow the input-triggered approach, in which the output of the join is a time-ordered series of tuples, where an output tuple is generated whenever an incoming tuple matches an existing tuple in the join windows. The same approach is used in most of the related work described in this section (Golab and Ozsu, 2003; Kang et al, 2003; Hammad and Aref, 2003; Das et al, 2003; Srivastava and Widom, 2004). A different semantics for windowed stream joins, based on the negative tuples approach, is used in the Nile stream engine (Ghanem et al, 2005). In Nile, at any time the answer of the join is taken as the answer of the snapshot join whose inputs are the tuples in the current window of each input stream. This is more similar to the problem of incremental materialized view maintenance. The extension of our selective processing approach to negative tuples is an interesting research direction; however, it is beyond the scope of this paper.

3. Overview

Unlike the conventional load shedding approach of dropping tuples from the input streams, our adaptive load shedding encourages stream tuples to be kept in the windows. It sheds CPU load by performing the stream joins on a dynamically changing subset of tuples that are learned to be highly beneficial, instead of on the entire set of tuples stored within the windows. This allows us to exploit the characteristics of stream applications that exhibit time-based correlation between the streams. Concretely, we assume that there exists a non-flat distribution of the probability of match between a newly-received tuple and the other tuples in the opposite window, depending on the difference between the timestamps of the tuples.

There are several reasons behind this assumption. First, variable delays can exist between the streams as a result of differences in the communication overhead of receiving tuples from different sources (Srivastava and Widom, 2004). Second, and more importantly, there may exist variable delays between related events from different sources. For instance, in the news join example, different news agencies are expected to have different reaction times due to differences in their news collection and publishing processes. In the stock trading example, there will be a time delay between the phone call containing the hint and the action of buying the hinted stock. In the correlated attacks example, different parts of the network may have been attacked at different times. Note that the effects of time correlation on data stream joins are to some extent analogous to the effects of the time of data creation in data warehouses, which are exploited by join algorithms such as Diag-Join (Helmer et al, 1998). (Diag-Join is a join algorithm designed for efficiently executing equi-joins over 1:N relationships in data warehouses; it exploits the clusters formed within the base relations due to time-of-data-creation effects. These clusters are formed by tuples with temporally close append times sharing the same value as their foreign key.)

It is important to note that we do not assume pre-knowledge of the match probability distributions corresponding to the time correlations. Instead, they are learned by our join algorithms. However, we do assume that these distributions are usually not uniform and thus can be exploited to maximize the output rate of the join. Although our load shedding is based on the assumption that the memory resource is sufficient, we want to point out two important observations. First, with increasing input stream rates and larger stream window sizes, it is quite common for the CPU to become limited before memory does. Second, even under limited memory, our adaptive load shedding approach can be used to effectively shed the excessive CPU load after window reduction has been performed to handle the memory constraints.

3.1. Technical Highlights

Our load shedding approach is best understood through its two core mechanisms, each answering a fundamental question about adaptive load shedding without tuple dropping. The first is called partial processing and it answers the question of how much we can process given a window of stream tuples. The factors to be considered in answering this question include the performance of the stream join operation under the current system load and the current incoming stream rates. In particular, partial processing dynamically adjusts the amount of load shedding to be performed through rate adaptation. The second is called selective processing and it answers the question of what we should process given the constraint on the amount of processing, defined in the partial processing phase. The factors that influence the answer to this question include the characteristics of stream window segments, the profitability of join directions, and the utility of different stream tuples. Selective processing extends partial processing to intelligently select the tuples to be used during join processing under heavy system load, with the goal of maximizing the output rate or the output utility of the stream join.

Table 1. Notations used throughout the paper:
  t : tuple
  T(t) : timestamp of the tuple t
  S_i : input stream i
  W_i : window over S_i
  w_i : window size of W_i, in seconds
  λ_i : rate of S_i, in tuples per second
  B_{i,j} : basic window j in W_i
  b : basic window size, in seconds
  n_i : number of basic windows in W_i
  r : fraction parameter
  δ_r : fraction boost factor
  r_i : fraction parameter for W_i
  T_r : rate adaptation period (seconds)
  T_c : time correlation adaptation period (seconds)
  f_i(.) : match p.d.f. for W_i
  p_{i,j} : probability of match for B_{i,j}
  o_{i,j} : expected output from comparing a tuple t with a tuple in B_{i,j}
  s_i^j : k, where o_{i,k} is the jth item in the sorted list {o_{i,l} | l ∈ [1..n_i]}
  u_{i,z} : expected utility from comparing a tuple t of type z with a tuple in W_i
  Z : tuple type domain
  Z(t) : type of a tuple
  V(z) : utility of a tuple of type z
  ω_{i,z} : frequency of type-z tuples in S_i
  r_{i,z} : fraction parameter for W_i for a tuple of type z
  γ : sampling probability

Before describing the details of partial processing and selective processing, we first briefly review the basic concepts involved in processing windowed stream joins and establish the notation that will be used throughout the paper.

3.2. Basic Concepts and Notations

A two-way windowed stream join operation takes two input streams, denoted as S_1 and S_2, performs the stream join, and generates the output. For notational convenience, we denote the opposite stream of stream i (i = 1, 2) as stream ī. The sliding window defined over stream S_i is denoted as W_i, and has size w_i in terms of seconds. We denote a tuple as t and its arrival timestamp as T(t). Other notation will be introduced in the rest of the paper as needed; Table 1 summarizes the notation used throughout the paper.

A windowed stream join is performed by fetching tuples from the input streams and processing them against the tuples in the opposite window. Figure 2 illustrates the process of windowed stream joins. For a newly fetched tuple t from stream S_i, the join is performed in the following steps. First, tuple t is inserted into the beginning of window W_i. Second, tuples at the end of window W_ī are checked in order and removed if they have expired. A tuple t_o expires from window W_ī iff T − T(t_o) > w_ī, where T represents the current time. The expiration check stops when an unexpired tuple is encountered. The tuples in a window are sorted in the order of their arrival timestamps by default, and the window is managed as a doubly linked list for efficiently performing the insertion and expiration operations.

Fig. 2. Stream join example: tuple t_1 from S_2 matches t_4 in W_1; tuple t_2 from S_1 matches t_3 and t_5 in W_2.

Fig. 3. Join processing:
JoinProcessing()
  for i = 1 to 2
    if no tuple in S_i, continue
    t ← fetch tuple from S_i
    insert t in front of W_i
    repeat
      t_o ← last tuple in W_ī
      if T − T(t_o) > w_ī, remove t_o from W_ī
    until T − T(t_o) ≤ w_ī
    foreach t_a ∈ W_ī
      evaluate the join condition on (t, t_a)

In the third and last step, tuple t is processed against the tuples in window W_ī, and matching tuples are generated as output. Figure 3 summarizes the join processing steps. Although not depicted in the pseudo-code, in practice buffers can be placed at the inputs of the join operator; this is common practice in DSMS query networks and is also useful for masking small-scale rate bursts in stand-alone joins.

To handle joins defined on set or weighted set-valued attributes, the following additional details are attached to the processing steps, assuming a tuple is a set of items (possibly with assigned weights). First, the items in tuple t are sorted as t is fetched from S_i. The tuples in W_ī are already sorted, since they went through the same step when they were fetched from S_ī. Then, for each tuple t_a in W_ī, t and t_a are compared by performing a simple merge of their sorted items. Equality, subset, superset, overlap and inner product joins can all be processed in a similar manner. For indexed joins, an inverted index is used to efficiently perform the join without going through all the tuples in W_ī. We discuss the details of indexed joins in Section 4.2.

4. Partial Processing - How Much Can We Process?

The first step in our approach to shedding CPU load without dropping tuples is to determine how much we can process given the windows of stream tuples that participate in the join. We call this step partial processing based load shedding. For instance, consider a scenario in which the limitation in processing power requires dropping half of the tuples, i.e., decreasing the input rate of the streams by half. A partial processing approach is to allow every tuple to enter the windows, but to decrease the cost of join processing by comparing a newly-fetched tuple with only a fraction of the window defined on the opposite stream. Partial processing, by itself, does not significantly increase the number of output tuples produced by the join operator when compared to tuple dropping or window reduction approaches. However, as we will describe later in the paper, it forms a basis for performing selective processing, which exploits the time-based correlation between the streams, and makes it possible to accommodate utility-based load shedding, in order to maximize the output rate or the utility of the output tuples produced.
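To make the join step of Figure 3 and the partial-processing idea concrete, the following is a minimal Java sketch (not the authors' ssjoin implementation; the Tuple and Window classes and the equi-join condition are illustrative assumptions). It processes a newly fetched tuple against only the most recent fraction r of the opposite window, which is the essence of partial processing.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only: a time-based window kept as a deque (newest first),
// probed against only the first fraction r of its tuples (partial processing).
class PartialJoinSketch {
    static class Tuple {
        final double ts;   // arrival timestamp T(t), in seconds
        final int key;     // single-valued join attribute (equi-join in this sketch)
        Tuple(double ts, int key) { this.ts = ts; this.key = key; }
    }

    static class Window {
        final double sizeSecs;                          // w_i
        final Deque<Tuple> tuples = new ArrayDeque<>(); // newest at front, oldest at back
        Window(double sizeSecs) { this.sizeSecs = sizeSecs; }

        void insert(Tuple t) { tuples.addFirst(t); }

        // Remove expired tuples from the back; stop at the first unexpired one.
        void expire(double now) {
            while (!tuples.isEmpty() && now - tuples.peekLast().ts > sizeSecs) {
                tuples.pollLast();
            }
        }
    }

    // Insert the new tuple into its own window, expire the opposite window, then
    // probe only the first ceil(r * |W|) (most recent) tuples of the opposite window.
    static List<Tuple[]> processTuple(Tuple t, Window own, Window opposite,
                                      double now, double r) {
        own.insert(t);
        opposite.expire(now);
        List<Tuple[]> output = new ArrayList<>();
        int budget = (int) Math.ceil(r * opposite.tuples.size());
        for (Tuple other : opposite.tuples) {   // iterates newest to oldest
            if (budget-- <= 0) break;
            if (other.key == t.key) {           // join condition
                output.add(new Tuple[] { t, other });
            }
        }
        return output;
    }
}

With r = 1 this degenerates to the full join of Figure 3; rate adaptation, described next, is what sets r.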

Two important factors are considered in determining the amount of partial processing: (1) the current incoming stream rates, and (2) the performance of the stream join operation under the current system load. Partial processing employs rate adaptation to dynamically adjust the amount of processing performed. The performance of the stream join under the current system load is a critical factor and is influenced by the concrete join algorithm and the optimizations used for performing the join operations. In the rest of this section, we first describe rate adaptation, then discuss the details of utilizing indexes for efficient join processing. Finally, we describe how to employ rate adaptation in conjunction with indexed join processing.

4.1. Rate Adaptation

Partial processing-based load shedding is performed by adapting to the rates of the input streams. This is done by observing the tuple consumption rate of the join operation and comparing it to the input rates of the streams to determine the fraction of the windows to be processed. This adaptation is performed periodically, every T_r seconds; T_r is called the adaptation period. We denote the fraction parameter as r, which defines the fraction of the windows to be processed. In other words, the setting of r answers the question of how much load we should shed.

Algorithm 1: Rate Adaptation
RateAdapt()
  (1) Initially: r ← 1
  (2) every T_r seconds:
  (3)   α_1 ← number of tuples fetched from S_1 since the last rate adaptation step
  (4)   α_2 ← number of tuples fetched from S_2 since the last rate adaptation step
  (5)   λ_1 ← average rate of S_1 since the last rate adaptation step
  (6)   λ_2 ← average rate of S_2 since the last rate adaptation step
  (7)   β ← (α_1 + α_2) / ((λ_1 + λ_2) · T_r)
  (8)   if β < 1 then r ← β · r
  (9)   else r ← min(1, δ_r · r)

Algorithm 1 gives a sketch of the rate adaptation process. Initially, the fraction parameter r is set to 1. Every T_r seconds, the average rates of the input streams S_1 and S_2 are determined as λ_1 and λ_2. Similarly, the numbers of tuples fetched from streams S_1 and S_2 since the last adaptation step are determined as α_1 and α_2. Tuples from the input streams may not be fetched at the rate they arrive, due to an inappropriate initial value of the parameter r or due to a change in the stream rates since the last adaptation step. As a result, β = (α_1 + α_2) / ((λ_1 + λ_2) · T_r) determines the fraction of the input tuples fetched by the join algorithm. Based on the value of β, the fraction parameter r is readjusted at the end of each adaptation step. If β is smaller than 1, r is multiplied by β, under the assumption that comparing a tuple with the other tuples in the opposite window is the dominating cost in join processing. Otherwise, the join is able to process all the incoming tuples with the current value of r. In this case, the r value is set to min(1, δ_r · r), where δ_r is called the fraction boost factor. This is aimed at increasing the fraction of the windows processed, optimistically assuming that additional processing power is available. If not, the parameter r will be decreased during the next adaptation step. Higher values of the fraction boost factor result in more aggressive increases of the parameter r.
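The rate adaptation step fits in a few lines of code. Below is a hedged Java sketch of Algorithm 1; the counters for fetched tuples and the observed arrival rates are assumed to be supplied by the surrounding join operator.

// Illustrative sketch of Algorithm 1 (rate adaptation). The fraction r is the
// portion of the opposite window a newly fetched tuple is processed against.
class RateAdapter {
    private double r = 1.0;            // fraction parameter, starts at full processing
    private final double boostFactor;  // delta_r, e.g. 1.2

    RateAdapter(double boostFactor) { this.boostFactor = boostFactor; }

    /**
     * Called every T_r seconds.
     * @param fetched1   alpha_1: tuples fetched from S_1 since the last step
     * @param fetched2   alpha_2: tuples fetched from S_2 since the last step
     * @param rate1      lambda_1: average arrival rate of S_1 (tuples/sec)
     * @param rate2      lambda_2: average arrival rate of S_2 (tuples/sec)
     * @param periodSecs T_r, the adaptation period in seconds
     */
    double adapt(long fetched1, long fetched2, double rate1, double rate2,
                 double periodSecs) {
        double arrived = (rate1 + rate2) * periodSecs;
        if (arrived <= 0) return r;                      // no arrivals: nothing to adapt
        double beta = (fetched1 + fetched2) / arrived;   // fraction we actually kept up with
        if (beta < 1.0) {
            r = beta * r;                                // shed more: shrink the fraction
        } else {
            r = Math.min(1.0, boostFactor * r);          // optimistically grow the fraction
        }
        return r;
    }

    double fraction() { return r; }
}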

The adaptation period T_r should be small enough to adapt to the bursty nature of the streams, but large enough not to cause overhead and undermine the join processing. Based on Nyquist's theorem (Brogan, 1990) and control theory, the adaptation period can be taken as half the period of the most frequent rate bursts in the streams. However, if this value increases the percentage of CPU time spent performing the adaptation step beyond a small threshold ε (0.5%), then T_r should be set to its lower bound. This lower bound is the smallest value which results in spending an ε percentage of the CPU time on adaptation.

4.2. Indexed Join and Partial Processing

Stream indexing (Golab et al, 2004; Wu et al, 2006) can be used to cope with the high processing cost of the join operation, reducing the amount of load shedding performed. However, two important points must be resolved before indexing can be employed together with partial processing, and thus with the other algorithms we introduce in the following sections. The first issue is that, in a streaming scenario, the index has to be maintained dynamically (through insertions and removals) as tuples enter and leave the window. This means that the assumption made in Section 4.1, that finding the matching tuples within a window (the index search cost) is the dominant cost in join processing, no longer holds. Second, the index does not naturally allow processing only a certain portion of the window. We resolve these issues in the context of inverted indexes, which are predominantly used for joins based on set or weighted set-valued attributes. The same ideas apply to hash indexes used for equi-joins on single-valued attributes; in fact, our inverted-index implementation reduces to a hash index in the presence of single-valued attributes. Here, we first give a brief overview of inverted indexes and then describe the modifications required to use them in conjunction with our load shedding algorithms.

4.2.1. Inverted Indexes

An inverted index consists of a collection of sorted identifier lists. In order to insert a set into the index, for each item in the set, the unique identifier of the set is inserted into the identifier list associated with that particular item. Similarly, removal of a set from the index requires finding the identifier lists associated with the items in the set; the removal is performed by removing the identifier of the set from these identifier lists. In our context, the inverted index is maintained as an in-memory data structure. The collection of identifier lists is managed in a hash table, which is used to efficiently find the identifier list associated with an item. The identifier lists are internally organized as balanced binary trees, sorted on the unique set identifiers, to facilitate both fast insertion and removal. The set identifiers are in fact pointers to the tuples they represent.

Query processing on an inverted index follows a multi-way merging process, which is usually accelerated through the use of a heap. The same type of processing is used for all of the query types we have mentioned so far. Specifically, given a query set, the identifier lists corresponding to the items in the query set are retrieved using the hash table.

Fig. 4. Illustration of an inverted index that uses (timestamp, identifier) pairs for sorting the identifier lists, which improves tuple insertion and expiration performance without negatively affecting the search performance, since tuples with equal timestamps are rare.

These sorted identifier lists are then merged. This is done by inserting the frontiers of the lists into a min-heap and iteratively removing the topmost set identifier from the heap and replacing it with the next set identifier (the new frontier) in its list. During this process, the identifier of an indexed set sharing k items with the query set will be picked from the heap k consecutive times, making it possible to process relatively complex overlap and inner product queries efficiently (Mamoulis et al, 2003). (For weighted sets, the weights should also be stored within the identifier lists, in order to answer inner product queries.)

4.2.2. Time Ordered Identifier Lists

Although the usage of inverted indexes speeds up the processing of joins based on set-valued attributes, it also introduces significant insertion and deletion costs. This problem can be alleviated by exploiting the timestamps of the tuples being indexed and the fact that these tuples are received in timestamp order from the input streams. In particular, instead of maintaining identifier lists as balanced trees sorted on identifiers, we can maintain them as linked lists sorted on the timestamps of the tuples (sets). This does not affect the merging phase of the indexed search, since a timestamp uniquely identifies a tuple in a stream unless different tuples with equal timestamps are allowed. To handle the latter, the identifier lists can be sorted based on (timestamp, identifier) pairs. This requires very little reordering, as the event of receiving different tuples with equal timestamps is expected to happen very infrequently, if at all. Figure 4 provides an illustration of timestamp-ordered identifier lists in inverted indexes. Using timestamp-ordered identifier lists has the following three advantages:

1. It allows inserting a set identifier into an identifier list in constant time, as opposed to logarithmic time with identifier-sorted lists.
2. It facilitates piggybacking of removal operations on insertion and search operations, by checking for expired tuples at the end of the identifier lists at insertion and search time. Thus, the removal operation is performed in amortized constant time, as opposed to logarithmic time with identifier-sorted lists.
3. Timestamp-sorted identifier lists make it possible to end the merging process, used for search operations, at a specified time within the window, thus enabling time-based partial processing.
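A minimal sketch of one timestamp-ordered identifier list is shown below (illustrative only; the Posting class and the field names are assumptions, not the paper's code). New identifiers are appended at the front in constant time, and expiration is piggybacked by trimming expired entries from the back whenever the list is touched.

import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: one identifier list of an inverted index, ordered by (timestamp, id).
// Insertions go to the front (newest tuples have the largest timestamps), and
// expired entries are trimmed from the back on insert/search (piggybacked removal).
class TimeOrderedIdList {
    static final class Posting {
        final double ts;   // timestamp T(t) of the indexed tuple
        final long id;     // set identifier (a pointer to the tuple in practice)
        Posting(double ts, long id) { this.ts = ts; this.id = id; }
    }

    private final Deque<Posting> postings = new ArrayDeque<>(); // front = newest

    // Constant-time insertion: tuples arrive in timestamp order.
    void insert(double ts, long id) {
        postings.addFirst(new Posting(ts, id));
    }

    // Piggybacked expiration: drop entries older than (now - windowSecs) from the back.
    void expire(double now, double windowSecs) {
        while (!postings.isEmpty() && now - postings.peekLast().ts > windowSecs) {
            postings.pollLast();
        }
    }

    // Iterate newest-to-oldest, stopping at a cutoff timestamp; this early stop is
    // what makes time-based partial processing possible during the multi-way merge.
    Iterable<Posting> newerThan(double cutoffTs) {
        Deque<Posting> result = new ArrayDeque<>();
        for (Posting p : postings) {
            if (p.ts < cutoffTs) break;  // postings are timestamp-ordered, so we can stop
            result.addLast(p);
        }
        return result;
    }
}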

5. Selective Processing - What Should We Process?

Selective processing extends partial processing to intelligently select the tuples to be used during join processing under heavy system load. Given the constraint on the amount of processing defined in the partial processing phase, selective processing aims at maximizing the output rate or the output utility of the stream joins. Three important factors are used to determine what we should select for join processing: (1) the characteristics of stream window segments, (2) the profitability of join directions, and (3) the utility of different stream tuples. We first describe time correlation adaptation and join direction adaptation, which form the core of our selective processing approach. Then we discuss utility-based load shedding. The main ideas behind time correlation adaptation and join direction adaptation are to prioritize segments (basic windows) of the windows in order to process the parts that will yield higher output (time correlation adaptation), and to start load shedding from one of the windows if one direction of the join is producing more output than the other (join direction adaptation).

Algorithm 2: Time Correlation Adaptation
TimeCorrelationAdapt()
  (1) every T_c seconds:
  (2)   for i = 1 to 2
  (3)     sort {ô_{i,j} | j ∈ [1..n_i]} in descending order into array O_i
  (4)     for j = 1 to n_i
  (5)       o_{i,j} ← ô_{i,j} / (γ · r · b · λ_1 · λ_2 · T_c)
  (6)       s_i^j ← k, where O_i[j] = ô_{i,k}
  (7)     for j = 1 to n_i
  (8)       ô_{i,j} ← 0

5.1. Time Correlation Adaptation

For the purpose of time correlation adaptation, we divide the windows of the join into basic windows. Concretely, window W_i is divided into n_i basic windows of size b seconds each, where n_i = 1 + ⌈w_i / b⌉. B_{i,j} denotes the jth basic window in W_i, j ∈ [1..n_i]. Tuples do not move from one basic window to another. As a result, tuples leave the join operator one basic window at a time, and the basic windows slide discretely, b seconds at a time. Newly fetched tuples are inserted into the first basic window. When the first basic window is full, meaning that the newly fetched tuple has a timestamp at least b seconds larger than that of the oldest tuple in the first basic window, the last basic window is emptied and all the basic windows are shifted, the last basic window becoming the first. The newly fetched tuples can now flow into the new first basic window, which is empty. The basic windows are managed in a circular buffer, so that the shift of windows is a constant-time operation. The basic windows themselves can be organized either as linked lists (if no indexing is used) or as inverted/hashed indexes (if indexing is used).

Time correlation adaptation is performed periodically, every T_c seconds; T_c is called the time correlation adaptation period. During the time between two consecutive adaptation steps, the join operation performs two types of processing. For a newly fetched tuple, it either performs selective processing or full processing. Selective processing is carried out by looking for matches with the tuples in the high-priority basic windows of the opposite window, where the number of basic windows used depends on the amount of load shedding to be performed. Full processing is done by comparing the newly fetched tuple against all the tuples in the opposite window.

The aim of full processing is to collect statistics about the usefulness of the basic windows for the join operation. This cannot be done with selective processing, since selective processing introduces bias: it processes only certain segments of the join windows and so cannot capture changing time correlations.

Algorithm 3: Tuple Processing and Time Correlation
ProcessTuple()
  (1) when processing tuple t against window W_i:
  (2) if rand() < r · γ
  (3)   process t against all tuples in B_{i,j}, j ∈ [1..n_i]
  (4)   foreach match in B_{i,j}, j ∈ [1..n_i]
  (5)     ô_{i,j} ← ô_{i,j} + 1
  (6) else
  (7)   a ← r · |W_i|
  (8)   for j = 1 to n_i
  (9)     a ← a − |B_{i,s_i^j}|
  (10)    if a > 0
  (11)      process t against all tuples in B_{i,s_i^j}
  (12)    else
  (13)      r_e ← 1 + a / |B_{i,s_i^j}|
  (14)      process t against an r_e fraction of the tuples in B_{i,s_i^j}
  (15)      break

The details of the adaptation step and of full processing are given in Algorithm 2 and in lines 1-5 of Algorithm 3. Full processing is only done for a sampled subset of the stream, based on a parameter called the sampling probability, denoted as γ. A newly fetched tuple goes through selective processing with probability 1 − r·γ; in other words, it goes through full processing with probability r·γ. The fraction parameter r is used to scale the sampling probability, so that full processing does not consume all the processing resources when the load on the system is high. The goal of full processing is to calculate, for each basic window B_{i,j}, the expected number of output tuples produced from comparing a newly fetched tuple t with a tuple in B_{i,j}, denoted as o_{i,j}. These values are used later, during the adaptation step, to prioritize basic windows. In particular, the o_{i,j} values are used to calculate the s_i^j values. Here, s_i^j is the index of the basic window which has the jth highest o value among the basic windows within W_i. Concretely:

s_i^j = k, where o_{i,k} is the jth item in the sorted (descending) list {o_{i,l} | l ∈ [1..n_i]}

This means that B_{i,s_i^1} is the highest priority basic window in W_i, B_{i,s_i^2} is the next, and so on.

Lines 7-14 of Algorithm 3 give a sketch of selective processing. During selective processing, the s_i^j values are used to guide the load shedding. Concretely, in order to process a newly fetched tuple t against window W_i, first the number of tuples from window W_i that are going to be considered for processing is determined by calculating r·|W_i|, where |W_i| denotes the number of tuples in the window. The fraction parameter r is determined by rate adaptation, as described in Section 4.1. Then, tuple t is processed against the basic windows, starting from the highest priority one, i.e. B_{i,s_i^1}, and going in decreasing order of priority. A basic window B_{i,s_i^j} is searched for matches completely if adding |B_{i,s_i^j}| tuples to the number of tuples used so far from window W_i to process tuple t does not exceed r·|W_i|. Otherwise, an appropriate fraction of the basic window is used and the processing is completed for tuple t.
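The budget logic of lines 7-14 of Algorithm 3 can be sketched as follows in Java; this is an illustrative reading of the pseudocode, with basicWindows assumed to be held in priority order B_{i,s_i^1}, B_{i,s_i^2}, ... and probe() standing in for the actual match evaluation.

import java.util.List;

// Sketch of selective processing: spend a budget of r * |W_i| tuple comparisons,
// visiting the basic windows in decreasing order of their learned benefit o_{i,j}.
class SelectiveProcessor {
    interface BasicWindow {
        int size();                                  // number of tuples currently stored
        void probe(Object tuple, double fraction);   // match the tuple against a fraction of it
    }

    /**
     * @param tuple              newly fetched tuple from the opposite stream
     * @param windowsByPriority  basic windows of W_i, ordered by priority s_i^1, s_i^2, ...
     * @param r                  fraction parameter from rate adaptation
     */
    static void processSelectively(Object tuple, List<BasicWindow> windowsByPriority, double r) {
        int total = windowsByPriority.stream().mapToInt(BasicWindow::size).sum();
        double budget = r * total;                   // a <- r * |W_i|
        for (BasicWindow bw : windowsByPriority) {
            if (bw.size() == 0) continue;            // nothing to probe in this segment
            budget -= bw.size();                     // a <- a - |B_{i,s^j}|
            if (budget > 0) {
                bw.probe(tuple, 1.0);                // whole basic window fits in the budget
            } else {
                double re = 1.0 + budget / bw.size(); // leftover fraction of this window
                bw.probe(tuple, re);
                break;                               // budget exhausted
            }
        }
    }
}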

5.1.1. Impact of Basic Window Size

The setting of the basic window size parameter b involves trade-offs. Smaller values are better for capturing the peak of the match probability distribution, but they also introduce processing overhead. For instance, recalling Section 4.2.1, in an indexed join operation the identifier lists have to be looked up for each basic window. Although the lists themselves are shorter and the total merging cost does not increase with smaller basic windows, the cost of looking up the identifier lists in the hash tables increases with an increasing number of basic windows, n_i.

Here we analyze how well the match probability distribution, which depends on the time correlation between the streams, is utilized for a given value of the basic window size parameter b, under a given load condition. We use r to denote the fraction of tuples in the join windows that can be used for processing; thus, r models the current load of the system. We assume that r can go over 1, in which case processing resources are abundant. We use f_i(.) to denote the match probability distribution function for window W_i, where $\int_{T_1}^{T_2} f_i(y)\,dy$ gives the probability that a newly fetched tuple will match a tuple t in W_i that has a timestamp $T(t) \in [T - T_2, T - T_1]$. Note that, due to the discrete movement of the basic windows, a basic window covers a time-varying area under the match probability distribution function. This area, denoted as p_{i,j} for basic window B_{i,j}, can be calculated by observing that basic window B_{i,j} covers the area over the interval $[\max(0,\, x \cdot b + (j-2) \cdot b),\, \min(w_i,\, x \cdot b + (j-1) \cdot b)]$ on the time axis ($[0, w_i]$), when only an $x \in [0,1]$ fraction of the first basic window is full. Then, we have:

$$p_{i,j} = \int_{x=0}^{1} \int_{y=\max(0,\, x \cdot b + (j-2) \cdot b)}^{\min(w_i,\, x \cdot b + (j-1) \cdot b)} f_i(y)\, dy\, dx$$

For the following discussion, we overload the notation s_i^j, such that s_i^j = k, where p_{i,k} is the jth item in the sorted list {p_{i,l} | l ∈ [1..n_i]}. The number of basic windows whose tuples are all considered for processing is denoted as c_e. The fraction of tuples considered for processing in the last basic window used is denoted as c_p; c_p is zero if the last used basic window is completely processed. We have:

$$c_e = \min(n_i, \lfloor r \cdot w_i / b \rfloor), \qquad c_p = \begin{cases} r \cdot w_i / b - c_e & \text{if } c_e < n_i \\ 0 & \text{otherwise} \end{cases}$$

Then the area under f_i that represents the portion of window W_i processed, denoted as p_i^u, can be calculated as:

$$p_i^u = c_p \cdot p_{i, s_i^{c_e+1}} + \sum_{j=1}^{c_e} p_{i, s_i^j}$$
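As a small worked example (with assumed numbers, not taken from the paper's experiments): suppose $w_i = 200$ s, $b = 20$ s (so $n_i = 11$) and the current fraction is $r = 0.25$. Then

$$c_e = \min(11, \lfloor 0.25 \cdot 200 / 20 \rfloor) = 2, \qquad c_p = 0.25 \cdot 200 / 20 - 2 = 0.5,$$

so the two highest-priority basic windows are probed fully, half of the third is probed, and $p_i^u = p_{i,s_i^1} + p_{i,s_i^2} + 0.5\, p_{i,s_i^3}$.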

Let us define g(f, a) as the maximum area under the function f with a total extent of a on the time axis. Then we can calculate the optimality of p_i^u, denoted as φ, as follows:

$$\varphi = \frac{p_i^u}{g(f_i,\, w_i \cdot \min(1, r))}$$

When φ = 1, the join processing is optimal with respect to output rate (ignoring the overhead of small basic windows). Otherwise, the expected output rate is φ times the optimal value, under the current load conditions (r) and the basic window size setting (b).

Figure 5 plots φ (on the z-axis) as a function of b/w (on the x-axis) and r (on the y-axis) for two different match probability distributions, the second one being more skewed. We make the following three observations from the figure:

- Decreasing availability of computational resources negatively influences the optimality of the join for a fixed basic window size.
- Increasing skewness in the match probability distribution decreases the optimality of the join for a fixed basic window size.
- Smaller basic window sizes provide better join optimality when the available computational resources are low or the match probability distribution is skewed.

The intuition behind these observations is as follows. When only a small fraction of the join windows can be processed, we have to pick a small number of basic windows that are around the peak of the match probability distribution. When the granularity of the basic windows is not sufficiently fine, the limited resources are spent processing less beneficial segments that appear within the basic windows. This effect is more pronounced when the match probability distribution is more skewed. As a result, small basic window sizes are favorable for skewed match probability distributions and heavy load conditions. We report our experimental study on the effect of the overhead of managing a large number of basic windows on the output rate of the join in Section 6.

5.2. Join Direction Adaptation

Due to the time-based correlation between the streams, a newly fetched tuple from stream S_1 may match a tuple from stream S_2 that has already made its way into the middle portions of window W_2. This means that, most of the time, a newly fetched tuple from stream S_2 has to stay within window W_2 for some time before it can be matched with a tuple from stream S_1. This implies that one direction of the join processing may be of lesser value, in terms of the number of output tuples produced, than the other direction. For instance, processing a newly fetched tuple t from stream S_2 against window W_1 will produce fewer output tuples than the other way around, as the tuples that will match t have not yet arrived at window W_1. In this case, the symmetry of the join operation can be broken during load shedding, in order to achieve a higher output rate. This can be achieved by decreasing the fraction of the tuples processed from window W_2 first, and from W_1 later (if needed). We call this join direction adaptation.

Join direction adaptation is performed immediately after rate adaptation. Specifically, two different fraction parameters are defined, denoted as r_i for window W_i, i ∈ {1,2}. During join processing, an r_i fraction of the tuples in window W_i is considered, making it possible to adjust the join direction by changing r_1 and r_2.

Fig. 5. Optimality of the join (φ) for different loads (r) and basic window sizes (b/w), under two different match probability distributions.

This requires replacing r with r_i in line 7 of Algorithm 3 and line 5 of Algorithm 2. The constraint on the setting of the r_i values is that the number of tuple comparisons performed per time unit should stay the same as in the case where there is a single r value, as computed by Algorithm 1. The number of tuple comparisons performed per time unit is given by $\sum_{i=1}^{2} (r \cdot \lambda_{\bar i} \cdot (\lambda_i \cdot w_i))$, since the number of tuples in window W_i is λ_i · w_i. Thus, we should have $\sum_{i=1}^{2} (r \cdot \lambda_{\bar i} \cdot (\lambda_i \cdot w_i)) = \sum_{i=1}^{2} (r_i \cdot \lambda_{\bar i} \cdot (\lambda_i \cdot w_i))$, i.e.:

$$r \cdot (w_1 + w_2) = r_1 \cdot w_1 + r_2 \cdot w_2$$

The more valuable direction of the join can be determined by comparing the expected number of output tuples produced from comparing a newly fetched tuple with a tuple in W_i, denoted as o_i, for i = 1 and 2. This can be computed as $o_i = \frac{1}{n_i} \sum_{j=1}^{n_i} o_{i,j}$. Assuming o_1 > o_2, without loss of generality, we can set r_1 = min(1, r·(w_1+w_2)/w_1). This maximizes r_1 while respecting the above constraint. The generic procedure for setting r_1 and r_2 is given in Algorithm 4.

Algorithm 4: Join Direction Adaptation
JoinDirectionAdapt()
  (1) Initially: r_1 ← 1, r_2 ← 1
  (2) upon completion of the RateAdapt() call:
  (3)   o_1 ← (1/n_1) · Σ_{j=1}^{n_1} o_{1,j}
  (4)   o_2 ← (1/n_2) · Σ_{j=1}^{n_2} o_{2,j}
  (5)   if o_1 ≥ o_2 then r_1 ← min(1, r·(w_1+w_2)/w_1)
  (6)   else r_1 ← max(0, (r·(w_1+w_2) − w_2)/w_1)
  (7)   r_2 ← (r·(w_1+w_2) − r_1·w_1)/w_2

Join direction adaptation, as described in this section, assumes that any portion of one of the windows is more valuable than all portions of the other window. This may not be the case for applications where both match probability distribution functions, f_1(t) and f_2(t), are non-flat. For instance, in a traffic application scenario, a two-way traffic flow between two points implies that both directions of the join are valuable. We introduce a more advanced join direction adaptation algorithm that can handle such cases in the next subsection, as part of utility-based load shedding.
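Before moving on, the basic direction-adaptation computation of Algorithm 4 can be sketched as follows (an illustrative reading, not the authors' code); it assumes o1 and o2 have already been computed from the per-basic-window statistics.

// Sketch of join direction adaptation: split the single fraction r into per-window
// fractions r1, r2 while keeping r1*w1 + r2*w2 = r*(w1 + w2).
class JoinDirectionAdapter {
    double r1 = 1.0, r2 = 1.0;

    /**
     * @param r   overall fraction from rate adaptation (Algorithm 1)
     * @param w1  size of window W_1 in seconds
     * @param w2  size of window W_2 in seconds
     * @param o1  expected output per comparison against W_1 (average of o_{1,j})
     * @param o2  expected output per comparison against W_2 (average of o_{2,j})
     */
    void adapt(double r, double w1, double w2, double o1, double o2) {
        double total = r * (w1 + w2);
        if (o1 >= o2) {
            r1 = Math.min(1.0, total / w1);        // favor the more productive window W_1
        } else {
            r1 = Math.max(0.0, (total - w2) / w1); // shed from W_1 first
        }
        r2 = (total - r1 * w1) / w2;               // whatever budget remains goes to W_2
    }
}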

5.3. Utility-based Load Shedding

So far, we have targeted our load shedding algorithms toward maximizing the number of tuples produced by the join operation, a commonly used metric in the literature (Das et al, 2003; Srivastava and Widom, 2004). Utility-based load shedding, also called semantic load shedding (Tatbul et al, 2003), is another metric employed for guiding load shedding. It has the benefit of being able to distinguish high-utility output from output that merely contains a large number of tuples. In the context of join operations, utility-based load shedding promotes output that results from matching tuples of higher importance/utility. In this section, we describe how utility-based load shedding is integrated into the mechanisms described so far.

We assume that each tuple has an associated importance level, defined by the type of the tuple and specified by the utility value attached to that type. This type and utility value assignment is application specific and is external to our join algorithm. For instance, in the news join example, certain types of news, e.g., security news, may be of higher value; similarly, in the stock trading example, phone calls from insiders may be of higher interest than calls from regular customers. It is the responsibility of the domain expert to specify which types of tuples are considered important. In some cases, data mining operators can also be used to mark tuples with their importance types. As long as each tuple has an assigned utility value, our utility-based load shedding algorithm can maximize output utility while working in tandem with the selective processing based load shedding algorithm that has rate, time correlation, and join direction adaptation.

We denote the tuple type domain as Z, the type of a tuple t as Z(t), and the utility of a tuple t, where Z(t) = z ∈ Z, as V(z). Type domains and their associated utility values can be set based on application needs. In the rest of the paper, an output tuple of the join operation obtained by matching tuples t_a and t_b is assumed to contribute a utility value of max(V(Z(t_a)), V(Z(t_b))) to the output. Our approach can also accommodate other functions, such as the average (0.5 · (V(Z(t_a)) + V(Z(t_b)))). We denote the frequency of appearance of tuples of type z in stream S_i as ω_{i,z}, where Σ_{z∈Z} ω_{i,z} = 1.

The main idea behind utility-based load shedding is to use a different fraction parameter for each different type of tuple fetched from each stream, denoted as r_{i,z}, where z ∈ Z and i ∈ {1,2}. The motivation behind this is to do less load shedding for tuples that provide higher output utility. The extra work done for such tuples is compensated for by doing more load shedding for tuples that provide lower output utility. The expected output utility obtained from comparing a tuple t of type z with a tuple in window W_i is denoted as u_{i,z}, and is used to determine the r_{i,z} values.

In order to formalize this problem, we extend some of the notation from Section 5.1.1. The number of basic windows from W_i whose tuples are all considered for processing against a tuple of type z is denoted as c_e(i,z). The fraction of tuples considered for processing in the last basic window used from W_i is denoted as c_p(i,z); c_p(i,z) is zero if the last used basic window is completely processed. Thus, we have:

$$c_e(i,z) = \lfloor n_i \cdot r_{i,z} \rfloor, \qquad c_p(i,z) = n_i \cdot r_{i,z} - c_e(i,z)$$

Then, the area under f_i that represents the portion of window W_i processed for a tuple of type z, denoted as p_u(i,z), can be calculated as follows:

$$p_u(i,z) = c_p(i,z) \cdot p_{i, s_i^{c_e(i,z)+1}} + \sum_{j=1}^{c_e(i,z)} p_{i, s_i^j}$$

With these definitions, the maximization of the output utility can be defined formally as

$$\max \sum_{i=1}^{2} \Big( \lambda_{\bar i} \cdot (\lambda_i \cdot w_i) \cdot \sum_{z \in Z} \big( \omega_{\bar i,z} \cdot u_{i,z} \cdot p_u(i,z) \big) \Big)$$

subject to the processing constraint:

$$r \cdot (w_2 + w_1) = \sum_{i=1}^{2} \Big( w_i \cdot \sum_{z \in Z} \big( \omega_{\bar i,z} \cdot r_{i,z} \big) \Big)$$

The r value used here is computed by Algorithm 1, as part of rate adaptation. Although the formulation is complex, this is in fact a fractional knapsack problem and has a greedy optimal solution. Assuming that some buffering is performed outside the join, the problem can be reformulated as follows. Consider I_{i,j,z} as an item that represents processing a tuple of type z against basic window B_{i,j}. Item I_{i,j,z} has a volume of λ_1·λ_2·ω_{ī,z}·b units (which is the number of comparisons made per time unit to process incoming tuples of type z against the tuples in B_{i,j}) and a value of λ_1·λ_2·ω_{ī,z}·b·u_{i,z}·p_{i,s_i^j}·n_i units (which is the utility gained per time unit from comparing incoming tuples of type z with the tuples in B_{i,j}). The aim is to pick the maximum number of items, where fractional items are acceptable, so that the total value is maximized and the total volume of the picked items is at most λ_1·λ_2·r·(w_2+w_1). r_{i,j,z} ∈ [0,1] is used to denote how much of item I_{i,j,z} is picked. Note that the number of unknown variables here (the r_{i,j,z} values) is (n_1+n_2)·|Z|, and the solution of the original problem can be calculated from these variables as $r_{i,z} = \frac{1}{n_i} \sum_{j=1}^{n_i} r_{i,j,z}$.
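The greedy solution of this fractional knapsack can be illustrated with a short sketch (assumed, simplified Java; it implements the simple sort-based greedy rather than the heap-based Algorithm 5 that follows): items are sorted by their value-over-volume ratio and picked, fractionally if needed, until the comparison budget is exhausted.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the fractional-knapsack view of utility-based shedding: each item is
// "process type-z tuples against basic window B_{i,j}"; we pick items greedily by
// value/volume until the per-time-unit comparison budget is used up.
class UtilityKnapsackSketch {
    static final class Item {
        final int window, basicWindow;  // i and j
        final String type;              // z
        final double volume;            // comparisons per time unit this item costs
        final double value;             // utility per time unit this item yields
        double pickedFraction = 0.0;    // r_{i,j,z}
        Item(int window, int basicWindow, String type, double volume, double value) {
            this.window = window; this.basicWindow = basicWindow; this.type = type;
            this.volume = volume; this.value = value;
        }
    }

    /** Greedily fills the budget; returns the items with their picked fractions set. */
    static List<Item> solve(List<Item> items, double budget) {
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingDouble((Item it) -> it.value / it.volume).reversed());
        double remaining = budget;   // lambda_1 * lambda_2 * r * (w_1 + w_2)
        for (Item it : sorted) {
            if (remaining <= 0) break;
            double take = Math.min(1.0, remaining / it.volume);
            it.pickedFraction = take;
            remaining -= take * it.volume;
        }
        return sorted;
    }
}

The per-type fractions would then be recovered as r_{i,z} = (1/n_i) · Σ_j r_{i,j,z}, as stated above.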

Algorithm 5: Join Direction Adaptation with Utility-based Shedding
VJoinDirectionAdapt()
  (1) upon completion of the RateAdapt() call:
  (2)   heap: H
  (3)   for i = 1 to 2
  (4)     foreach z ∈ Z
  (5)       r_{i,z} ← 0
  (6)       v_{i,s_i^1,z} ← u_{i,z} · n_i · o_{i,s_i^1} / Σ_{k=1}^{n_i} o_{i,k}
  (7)   initialize H with {v_{i,s_i^1,z} | i ∈ [1..2], z ∈ Z}
  (8)   a ← λ_1 · λ_2 · r · (w_1 + w_2)
  (9)   while H is not empty
  (10)    let i, j, z be such that v_{i,s_i^j,z} is the topmost item in H
  (11)    pop the first item from H
  (12)    a ← a − ω_{ī,z} · λ_1 · λ_2 · b
  (13)    if a > 0
  (14)      r_{i,z} ← r_{i,z} + 1/n_i
  (15)    else
  (16)      r_e ← 1 + a/(λ_1 · λ_2 · ω_{ī,z} · b)
  (17)      r_{i,z} ← r_{i,z} + r_e/n_i
  (18)      return
  (19)    if j < n_i
  (20)      v_{i,s_i^{j+1},z} ← u_{i,z} · n_i · o_{i,s_i^{j+1}} / Σ_{k=1}^{n_i} o_{i,k}
  (21)      insert v_{i,s_i^{j+1},z} into H

The values of the fraction variables are determined during join direction adaptation. A simple way to do this is to sort the items based on their value-over-volume ratios, v_{i,j,z} = u_{i,z} · p_{i,s_i^j} · n_i (note that o_{i,j} / Σ_{k=1}^{n_i} o_{i,k} can be used as an estimate of p_{i,j}), and to pick as much as possible of the item that is most valuable per unit volume. However, since the number of items is large, the sort step is costly, especially for a large number of basic windows and a large type domain. A more efficient solution, with worst-case complexity O(|Z| + (n_1+n_2)·log|Z|), is described in Algorithm 5, which replaces Algorithm 4. Algorithm 5 makes use of the s_i^j values, which define an order between the value-over-volume ratios of the items for a fixed type z and window W_i. The algorithm keeps the items with the highest value-over-volume ratios for the different streams and types (2·|Z| of them) in a heap. It iteratively picks an item from the heap and replaces it with the item having the next highest value-over-volume ratio for the same stream and type subscript index. This process continues until the capacity constraint is reached. During this process, the r_{i,z} values are calculated progressively. If the item picked represents window W_i and type z, then r_{i,z} is incremented by 1/n_i, unless the item is picked fractionally, in which case the increment to r_{i,z} is adjusted accordingly.

6. Experiments

The adaptive load shedding algorithms presented in this paper have been implemented and successfully demonstrated as part of a large-scale stream processing prototype at IBM Watson Research. We report four sets of experimental results to demonstrate the effectiveness of the algorithms introduced in this paper. The first set of experiments illustrates the need for shedding CPU load for both indexed (using inverted indexes) and non-indexed joins.

Fig. 6. Match probability distributions (match probability vs. time in window), for κ = 0.6 and κ = 0.8.

The second set demonstrates the performance of the partial processing-based load shedding step: keeping tuples within the windows and shedding excessive load by partially processing the join through rate adaptation. The third set shows the performance gain in terms of output rate for selective processing, which incorporates time correlation adaptation and join direction adaptation; the effect of the basic window size on performance is also investigated experimentally. The final set of experiments presents results on the utility-based load shedding mechanisms introduced and their ability to maximize the output utility under different workloads. Note that the overhead cost associated with dynamic adaptation has been fully taken into account and manifests itself in the output rate of the join operations; hence, we do not show the overhead cost separately.

6.1. Experimental Setup

The join operation is implemented as a Java package, named ssjoin.*, and is customizable with respect to the supported features, such as rate adaptation, time correlation adaptation, join direction adaptation, and utility-based load shedding, as well as the various parameters associated with these features. The streams used in the experiments reported in this section are timestamp-ordered tuples, where each tuple includes a single attribute that can be either a set, a weighted set, or a single value. The sets are composed of a variable number of items, where each item is an integer in the range [1..L]; L is taken as 100 in the experiments. The number of items contained in a set follows a normal distribution with mean µ and standard deviation σ; in the experiments, µ is taken as 5 and σ is taken as 1. The popularity of items, in terms of how frequently they occur in a set, follows a Zipf distribution with parameter κ. For equi-joins on single-valued attributes, L is taken as 500, with µ = 1 and σ = 0.

The time-based correlation between the streams is modeled using two parameters: the time shift parameter, denoted as τ, and the cycle period parameter, denoted as ς. The cycle period is used to change the popularity ranks of the items as a function of time. Initially, at time 0, the most popular item is 1, the next is 2, and so on. Later, at time T, the most popular item is a = 1 + ⌊L · (T mod ς)/ς⌋, the next is a+1, and so on. The time shift is used to introduce a delay between matching items from different streams. Applying a time shift of τ to one of the streams means that, for that stream, the most popular item at time T is a = 1 + ⌊L · ((T − τ) mod ς)/ς⌋. Figure 6 shows the resulting match probability distribution f_1 when a time delay of τ = (5/8)·ς is applied to stream S_2 and ς = 2·w, where w_1 = w_2 = w. The two histograms represent two different scenarios, in which κ is taken as 0.6 and 0.8, respectively.
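The popularity-shift model just described can be illustrated with a tiny sketch (illustrative only; the class and field names are assumptions, and the parameter values are those stated in the text):

// Sketch of the time-correlation model used to generate the synthetic streams:
// the most popular item drifts over the item domain [1..L] once every cycle
// period, and a per-stream time shift tau delays that drift.
class PopularityModel {
    final int numItems;        // L
    final double cyclePeriod;  // varsigma, in seconds
    final double timeShift;    // tau, in seconds (0 for the unshifted stream)

    PopularityModel(int numItems, double cyclePeriod, double timeShift) {
        this.numItems = numItems; this.cyclePeriod = cyclePeriod; this.timeShift = timeShift;
    }

    // Index of the most popular item at time t:
    // a = 1 + floor(L * ((t - tau) mod varsigma) / varsigma).
    int mostPopularItem(double t) {
        double phase = ((t - timeShift) % cyclePeriod + cyclePeriod) % cyclePeriod; // non-negative mod
        return 1 + (int) Math.floor(numItems * phase / cyclePeriod);
    }

    // kth most popular item at time t (wrapping around the domain), k = 0, 1, 2, ...
    int popularItem(double t, int k) {
        return 1 + (mostPopularItem(t) - 1 + k) % numItems;
    }
}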

Fig. 7. Tuple drop rates for non-indexed and indexed join operations, as a function of window size, with varying stream rates.

These settings of the τ and ς parameters are also used in the rest of the experiments, unless otherwise stated. We change the value of the parameter κ to model varying amounts of skewness in the match probability distributions. Experiments are performed using time-varying stream rates and various window sizes. The default settings of some of the system parameters are as follows: T_r = 5 seconds, T_c = 5 seconds, δ_r = 1.2, γ = 0.1. We place input buffers of size 10 seconds in front of the inputs of the join operation. We report results from overlap and equality joins; other types of joins show similar results. The experiments are performed on an IBM PC with 512MB of main memory and a 2.4GHz Intel Pentium 4 processor, using the Sun JDK. For comparisons, we also implemented a random drop scheme. It performs load shedding by randomly dropping tuples from the input buffers and performing the join fully with the tuples available in the join windows. It is implemented separately from our selective join framework and does not include any overhead due to adaptations.

6.2. Processing Power Limitation

We first validate that processing power is indeed the limiting resource, for both indexed and non-indexed join operations. The graphs in Figure 7 plot the input tuple drop rates (the sum of the drop rates of the two streams) for non-indexed (left) and indexed (right) joins, as a function of window size, for different stream rates. The join operation performed is an overlap join with a threshold value of 3. It is observed from the figure that, for a non-indexed join, even a low stream rate of 50 tuples per second results in dropping approximately half of the tuples when the window size is set to 125 seconds. This corresponds to a rather small window size of around 150 KBytes, when compared to the total memory available. Although the indexed join improves performance by decreasing tuple drop rates, it is only effective for moderate window sizes and low stream rates. A stream rate of 100 tuples per second results in dropping approximately one quarter of the tuples for a 625-second window (approximately 10 minutes). This corresponds to a window size of around 1.5 MBytes. As a result, CPU load shedding is a must for costly stream joins, even when indexes are employed.

6.3. Rate Adaptation

We study the impact of rate adaptation on the output rate of the join operation. For the purpose of the experiments in this subsection, the time shift parameter is set to zero, i.e. τ = 0, so that there is no time shift between the streams and the match probability decreases going from the beginning of the windows to the end. A non-indexed overlap join, with a threshold value of 3 and a 2 seconds window on one of the streams, is used. Figure 8 shows the stream rates used (on the left y-axis) as a function of time. The rate of the streams stays at 1 tuples per second for around 6 seconds, then jumps to 5 tuples per second for around 15 seconds and drops to 3 tuples per second for around 3 seconds before going back to its initial value. Figure 8 also shows (on the right y-axis) how the fraction parameter r adapts to the changing stream rates.

The graphs in Figure 9 show the resulting stream output rates as a function of time with and without rate adaptation, respectively. The no rate adaptation case represents random tuple dropping. It is observed that rate adaptation improves the output rate when the stream rates increase, which is the time when tuple dropping starts for the non-adaptive case. The improvement is around 1% when stream rates are 5 tuples per second and around 5% when they are 3 tuples per second. The ability of rate adaptation to keep the output rate high is mainly due to the time-aligned nature of the streams. In this scenario, only the tuples that are closer to the beginning of the window are useful for generating matches, and the partial processing uses the beginning part of the window, as dictated by the fraction parameter r.

The graphs in Figure 10 plot the average output rates of the join over the period shown in Figure 9 as a function of the skewness parameter κ, for different window sizes. They show that the improvement in output rate provided by rate adaptation increases not only with increasing skewness of the match probability distribution, but also with increasing window sizes. This is because larger windows imply that more load shedding has to be performed.
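To illustrate the role of the fraction parameter, the sketch below recomputes r periodically from the observed arrival rates and the measured per-pair processing cost, so that the expected join work fits within the available CPU. The update rule shown is a deliberate simplification and not the paper's exact adaptation algorithm (which also involves the Tr, δr and γ parameters listed in Section 6.1); the class and parameter names are assumptions.

// Illustrative sketch of rate adaptation: every adaptation period the fraction
// parameter r is recomputed from the observed stream rates and the measured
// per-tuple-pair join cost, so that only the leading r-fraction of each window
// is processed. Simplified for illustration.
public class RateAdapter {
    private double r = 1.0;               // fraction of the window to process

    // lambda1, lambda2: observed arrival rates (tuples/sec) of the two streams
    // costPerPair: measured time (sec) to process one tuple against one window tuple
    // w1, w2: window sizes in seconds
    public double adapt(double lambda1, double lambda2,
                        double costPerPair, double w1, double w2) {
        // Cost of fully processing one second's worth of arrivals against the
        // opposite windows (which hold roughly lambda * w tuples each).
        double fullCost = costPerPair * (lambda1 * (lambda2 * w2) + lambda2 * (lambda1 * w1));
        // Budget: one second of CPU per second of input; shrink r when overloaded.
        r = Math.min(1.0, fullCost > 0 ? 1.0 / fullCost : 1.0);
        return r;
    }

    public double fraction() { return r; }
}

With time-aligned streams, processing only the leading r-fraction of each window then preserves most of the matches, which is what Figures 9 and 10 measure.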

Fig. 8. Stream rates and fraction parameter r
Fig. 9. Improvement in output rate with rate adaptation
Fig. 10. Improvement in average output rate with rate adaptation
Fig. 11. Stream rates and fraction parameters r1 and r2

6.4. Selective Processing

Here, we study the impact of time correlation adaptation and join direction adaptation on the output rate of the join operation. For the purpose of the experiments in this subsection, the time shift parameter is taken as τ = (5/8) · ς. A non-indexed overlap join, with a threshold value of 3 and 2 seconds windows on both of the streams, is used. Basic window sizes on both windows are set to 1 second for time correlation adaptation. Figure 11 shows the stream rates used (on the left y-axis) as a function of time. Figure 11 also shows (on the right y-axis) how the fraction parameters r1 and r2 adapt to the changing stream rates with join direction adaptation. Note that the reduction in fraction parameter values starts with the one (r2 in this case) corresponding to the window that is less useful in terms of generating output tuples when processed against a newly fetched tuple from the other stream.

The graphs in Figure 12 show the resulting stream output rates as a function of time with three different join settings. It is observed that, when the stream rates increase, time correlation adaptation combined with rate adaptation provides an improvement in output rate (around 5%) compared to the rate adaptation only case. Moreover, applying join direction adaptation on top of time correlation adaptation provides an additional improvement in output rate (around 4%).

The graphs in Figure 13 plot the average output rates of the join as a function of the skewness parameter κ, for different join settings. This time, the overlap threshold is set to 4, which results in a lower number of matching tuples. It is observed that the improvement in output rates provided by time correlation and join direction adaptation increases with increasing skewness in the match probability distribution. The increasing skewness does not improve the performance of the rate adaptive-only case, due to its lack of time correlation adaptation, which in turn makes it unable to locate the productive portion of the window for processing, especially when the time lag τ is large and the fraction parameter r is small.
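Time correlation adaptation can be pictured as a selection problem over basic windows: per-basic-window match counts are maintained, and under a fraction budget r the most productive basic windows are chosen for processing; join direction adaptation then shrinks the budget of the less productive direction (r2 above) before the other. The sketch below illustrates the selection step only; the bookkeeping and names are assumptions, not the actual ssjoin.* data structures.

import java.util.Arrays;
import java.util.Comparator;

// Sketch of time correlation adaptation: the window is divided into basic windows,
// per-basic-window output counts are maintained, and under a fraction budget r the
// most productive basic windows are selected for processing.
public class BasicWindowSelector {
    private final double[] yield;   // matches observed per basic window (recent period)

    public BasicWindowSelector(int numBasicWindows) {
        this.yield = new double[numBasicWindows];
    }

    public void recordMatch(int basicWindowIndex) { yield[basicWindowIndex]++; }

    // Return the indices of the ceil(r * n) basic windows with the highest yield.
    public int[] select(double r) {
        int n = yield.length;
        int k = (int) Math.ceil(r * n);
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> yield[i]).reversed());
        int[] chosen = new int[k];
        for (int i = 0; i < k; i++) chosen[i] = order[i];
        return chosen;
    }
}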

Fig. 12. Improvement in output rate with time correlation and join direction adaptation
Fig. 13. Improvement in average output rate with time correlation and join direction adaptation
Fig. 14. Impact of basic window size for indexed and non-indexed join, for various basic window sizes

6.5. Indexed Joins

The idea of selective processing applies equally well to indexed joins as it applies to non-indexed joins, as long as the cost of processing a tuple against the window defined on the opposite stream monotonically increases with the increasing fraction of the window processed. This assumption is valid for most types of joins, such as the inverted-index supported set joins discussed before. One exception to this is the hash-table supported equi-join, since probing the opposite window takes constant time for equi-joins with hash tables. As a result, no gain is to be expected from using selective processing for equi-joins.

The graphs in Figure 15 plot the improvement provided by selective processing, with all adaptations applied, over random dropping in terms of the output rate of the join as a function of stream rates, for equi-joins and overlap joins with and without indexing support. For the non-indexed scenarios windows of size 2 seconds are used, whereas for the indexed scenarios larger windows of size 2 seconds are used to increase the cost of the join so as to necessitate load shedding (recall Figure 7). We make three observations from Figure 15. First, for non-indexed joins (equi-join or overlap join), selective processing provides up to 5% higher output rate, the improvement being higher for overlap joins, as they are more costly to evaluate. Second, for overlap joins with inverted indexes the improvement provided by selective processing is not observed until the stream rates get high enough, 3 tuples/sec in this example. This is intuitive, since with indexes the join is evaluated much faster and, as a result, load shedding starts when the stream rates are high enough to saturate the processing resources. From the figure, we observe up to around 1% improvement over random dropping for the case of the indexed overlap join. The relative improvement in output rate is likely to increase for rates higher than 5 tuples/sec, as the trend of the line for the indexed overlap join in Figure 15 suggests. Third, we see no improvement in output rate when an indexed equi-join is used. As the zoomed-in portion of the figure shows, there is no deterioration in the output rate either. As a result, selective processing is performance-wise indifferent in the case of equi-joins, compared to random dropping.
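The monotone-cost condition can be seen from the shape of an inverted-index probe. If each item's posting list is kept in timestamp order, then probing only the processed (most recent) portion of the window scans only a suffix of each list, so the probe cost grows with the fraction processed; a hash-table probe for an equi-join, by contrast, costs the same regardless of the fraction. The sketch below is a simplified stand-in for such a time-sorted inverted index, not the implementation used in ssjoin.*.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a time-sorted inverted index probe for overlap joins: each item maps to
// postings (timestamp, tupleId) kept in arrival order, so a probe restricted to the
// processed fraction of the window only scans the newest suffix of each list.
public class TimeSortedInvertedIndex {
    private final Map<Integer, List<long[]>> postings = new HashMap<>(); // item -> [timestamp, tupleId]

    // Postings are assumed to be inserted in non-decreasing timestamp order.
    public void insert(int item, long timestamp, long tupleId) {
        postings.computeIfAbsent(item, k -> new ArrayList<>()).add(new long[]{timestamp, tupleId});
    }

    // Count, per candidate tuple, how many of the probe items it shares, considering
    // only window tuples with timestamp >= oldestAllowed (i.e., the processed fraction).
    public Map<Long, Integer> probe(Iterable<Integer> probeItems, long oldestAllowed) {
        Map<Long, Integer> overlap = new HashMap<>();
        for (int item : probeItems) {
            List<long[]> list = postings.getOrDefault(item, Collections.emptyList());
            // Scan from the newest end; stop once postings fall outside the processed fraction.
            for (int i = list.size() - 1; i >= 0; i--) {
                long[] p = list.get(i);
                if (p[0] < oldestAllowed) break;
                overlap.merge(p[1], 1, Integer::sum);
            }
        }
        return overlap; // caller keeps candidates whose overlap meets the join threshold
    }
}

With an identifier-sorted posting list this early termination by time is not possible, which is the point made about identifier-sorted indexes in the next subsection.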

Fig. 15. Improvement in output rate compared to random dropping, for indexed and non-indexed joins, for overlap and equality join conditions

6.6. Basic Window Size

We study the impact of the basic window size on the output rate of the join operation. The graphs in Figure 14 plot the average join output rate as a function of basic window size, for different κ values. The graph on the left represents a non-indexed overlap join, with a threshold value of 3 and 2 seconds windows on both of the streams. The graph on the right represents an indexed overlap join, with a threshold value of 3 and 2 seconds windows on both of the streams. For the indexed case, both identifier-sorted and time-sorted inverted indexes are used. The "none" value on the x-axis of the graphs represents the case where basic windows are not used (note that this is not the same as using a basic window equal in size to the join window). For both experiments, a stream rate of 5 tuples/sec is used.

As expected, small basic windows provide higher join output rates. However, there are two interesting observations for the indexed join case. First, for very small basic window sizes, we observe a drop in the output rate. This is due to the overhead of processing a large number of basic windows with an indexed join. In particular, the cost of looking up the identifier lists for each basic window that is used for join processing creates an overhead. Further decreasing the basic window size does not help in better capturing the peak of the match probability distribution. Second, identifier-sorted inverted indexes show significantly lower output rate, especially when the basic window sizes are large. This is because identifier-sorted inverted indexes do not allow partial processing based on time.
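Structurally, a basic window is just a sub-window of b seconds, as in the sketch below: tuples are grouped by arrival time, expiration removes a whole basic window at once, and any per-basic-window state (indexes, match counters) hangs off these groups. Very small b therefore means more sub-structures to maintain and probe, which is the overhead observed above for the indexed join. The class and its bookkeeping are illustrative assumptions only.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of a join window organized as basic windows of b seconds each.
public class BasicWindowedJoinWindow<T> {
    private final Deque<List<T>> basicWindows = new ArrayDeque<>();
    private final double basicWindowSeconds;   // b
    private final double windowSeconds;        // w
    private double currentBucketStart = 0;

    public BasicWindowedJoinWindow(double windowSeconds, double basicWindowSeconds) {
        this.windowSeconds = windowSeconds;
        this.basicWindowSeconds = basicWindowSeconds;
        basicWindows.addLast(new ArrayList<>());
    }

    public void insert(T tuple, double timestamp) {
        // Open new basic windows until the timestamp falls into the newest one.
        while (timestamp >= currentBucketStart + basicWindowSeconds) {
            currentBucketStart += basicWindowSeconds;
            basicWindows.addLast(new ArrayList<>());
            // Expire whole basic windows that fell out of the w-second window
            // (approximate: expiration is checked only when a new bucket opens).
            while (basicWindows.size() * basicWindowSeconds > windowSeconds) {
                basicWindows.removeFirst();
            }
        }
        basicWindows.getLast().add(tuple);
    }

    // Basic windows in newest-first order, the order in which a probe under a
    // fraction budget would visit them for time-aligned streams.
    public Iterable<List<T>> basicWindowsNewestFirst() {
        return () -> basicWindows.descendingIterator();
    }
}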

Fig. 16. Improvement in output utility for different type frequency models
Fig. 17. Improvement in output utility for different type domain sizes

6.7. Utility-based Load Shedding

We study the effectiveness of utility-based load shedding in improving the output utility of the join operation. We consider three different scenarios in terms of setting type frequencies: (i) uniform, (ii) inversely proportional to utility values, and (iii) directly proportional to utility values. For the experiments in this subsection, we use a non-indexed overlap join, with a threshold value of 3 and 2 seconds windows on both of the streams. A stream rate of 5 tuples per second is used.

The graphs in Figure 16 plot the improvements in the output utility of the join, compared to the case where no utility-based load shedding is used, as a function of the skewness in utility values. Both joins are rate, time correlation, and join direction adaptive. In this experiment, there are three different tuple types, i.e. Z = 3. For a skewness value of k, the utility values of the types are {1, 1/2^k, 1/3^k}. It is observed from the figure that the improvement in the output utility increases with increasing skewness for uniform and inversely proportional type frequencies, whereas it stays almost stable for directly proportional type frequencies. Moreover, the best improvement is provided when item frequencies are inversely proportional to utility values. Note that this is the most natural case, as in most applications rare items are of higher interest.

The graph in Figure 17 studies the effect of domain size on the improvement in the output utility. It plots the improvement in the output utility as a function of the type domain size, for different value difference factors. A value difference factor of x means the highest utility value is x times the lowest utility value, and the utility values follow a Zipf distribution (with parameter log_Z x). The type frequencies are selected as inversely proportional to utility values. It is observed from the figure that there is an initial decrease in the output utility improvement with increasing type domain size, but the improvement values stabilize quickly as the type domain size gets larger. The same observation holds for different amounts of skewness in utility values.
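The utility assignment above can be made explicit with a short calculation. Assuming the utilities are left unnormalized with u_1 = 1, the Zipf exponent log_Z x makes the value difference factor exactly x:

\[
(u_1, u_2, u_3) = (1,\; 2^{-k},\; 3^{-k}) \quad\text{for } Z = 3, \qquad k = 2 \;\Rightarrow\; (1,\; 0.25,\; 0.11),
\]
\[
u_i = i^{-\log_Z x} \;\Rightarrow\; \frac{u_1}{u_Z} = Z^{\log_Z x} = x .
\]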

7. Conclusion

We have presented an adaptive CPU load shedding approach for stream join operations. In particular, we showed how rate adaptation, combined with time-based correlation adaptation and join direction adaptation, can increase the number of output tuples produced by a join operation. Our load shedding algorithms employed a selective processing approach, as opposed to the commonly used tuple dropping. This enabled our algorithms to nicely integrate utility-based load shedding with time correlation-based load shedding in order to improve the output utility of the join for applications where some tuples are evidently more valued than others. Our experimental results showed that (a) our adaptive load shedding algorithms are very effective under varying input stream rates, varying CPU load conditions, and varying time correlations between the streams; and (b) our approach significantly outperforms the approach that randomly drops tuples from the input streams.

References

Arasu A, Babcock B, Babu S, Datar M, Ito K, Motwani R, Nishizawa I, Srivastava U, Thomas D, Varma R, Widom J (2003) STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, vol. 26, March 2003.
Babcock B, Babu S, Datar M, Motwani R, Widom J (2002) Models and issues in data stream systems. In Proceedings of the ACM Symposium on Principles of Database Systems, 2002.
Babcock B, Babu S, Motwani R, Datar M (2003) Chain: operator scheduling for memory minimization in data stream systems. In Proceedings of the ACM International Conference on Management of Data, 2003.
Babcock B, Datar M, Motwani R (2004) Load shedding for aggregation queries over data streams. In Proceedings of the IEEE International Conference on Data Engineering, 2004.
Balakrishnan H, Balazinska M, Carney D, Cetintemel U, Cherniack M, Convey C, Galvez E, Salz J, Stonebraker M, Tatbul N, Tibbetts R, Zdonik S (2004) Retrospective on Aurora. VLDB Journal Special Issue on Data Stream Processing, 2004.
Brogan WL (1990) Modern Control Theory (3rd Edition). Prentice Hall, October 1, 1990.
Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Seidman G, Stonebraker M, Tatbul N, Zdonik S (2002) Monitoring streams: A new class of data management applications. In Proceedings of the International Conference on Very Large Data Bases, 2002.
Chandrasekaran S, Cooper O, Deshpande A, Franklin MJ, Hellerstein JM, Hong W, Krishnamurthy S, Madden SR, Raman V, Reiss F, Shah MA (2003) TelegraphCQ: Continuous dataflow processing for an uncertain world. In Proceedings of the Conference on Innovative Database Research, 2003.
Chandrasekaran S, Franklin MJ (2004) Remembrance of streams past: Overload-sensitive management of archived streams. In Proceedings of the International Conference on Very Large Databases, 2004.
Das A, Gehrke J, Riedewald M (2003) Approximate join processing over data streams. In Proceedings of the ACM International Conference on Management of Data, 2003.
Ghanem TM, Hammad MA, Mokbel MF, Aref WG, Elmagarmid AK (2005) Query processing using negative tuples in stream query engines. Purdue University Technical Report CSD TR# 4-4, 2005.
Golab L, Garg S, Ozsu MT (2004) On indexing sliding windows over online data streams. In Proceedings of the International Conference on Extending Database Technology, 2004.
Golab L, Ozsu MT (2003) Processing sliding window multi-joins in continuous queries over data streams. In Proceedings of the International Conference on Very Large Data Bases, 2003.
Hammad MA, Aref WG (2003) Stream window join: Tracking moving objects in sensor-network databases. In Proceedings of the International Conference on Scientific and Statistical Database Management, 2003.
Helmer S, Westmann T, Moerkotte G (1998) Diag-Join: An opportunistic join algorithm for 1:N relationships. In Proceedings of the International Conference on Very Large Data Bases, 1998.
Mamoulis N (2003) Efficient processing of joins on set-valued attributes. In Proceedings of the ACM International Conference on Management of Data, 2003.
Motwani R, Widom J, Arasu A, Babcock B, Babu S, Datar M, Manku G, Olston C, Rosenstein J, Varma R (2003) Query processing, resource management, and approximation in a data stream management system. In Proceedings of the Conference on Innovative Database Research, 2003.
Kang J, Naughton J, Viglas S (2003) Evaluating window joins over unbounded streams. In Proceedings of the IEEE International Conference on Data Engineering, 2003.
Kleinberg J (2002) Bursty and hierarchical structure in streams. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, 2002.
Madden SR, Franklin MJ, Hellerstein JM, Hong W (2002) TAG: a Tiny AGgregation service for ad-hoc sensor networks. In Proceedings of the Symposium on Operating Systems Design and Implementation, 2002.

Srivastava U, Widom J (2004) Flexible time management in data stream systems. In Proceedings of the ACM Symposium on Principles of Database Systems, 2004.
Srivastava U, Widom J (2004) Memory-limited execution of windowed stream joins. In Proceedings of the International Conference on Very Large Databases, 2004.
Tatbul N, Cetintemel U, Zdonik S, Cherniack M, Stonebraker M (2003) Load shedding in a data stream manager. In Proceedings of the International Conference on Very Large Data Bases, 2003.
Tucker PA, Maier D, Sheard T, Fegaras L (2003) Exploiting punctuation semantics in continuous data streams. IEEE Transactions on Knowledge and Data Engineering, vol. 15, 2003.
Wu K-L, Chen S-K, Yu PS (2006) Query indexing with containment-encoded intervals for efficient stream processing. Knowledge and Information Systems, 9(1):629, January 2006.

Author Biographies

Bugra Gedik received the B.S. degree in C.S. from Bilkent University, Ankara, Turkey, and the Ph.D. degree in C.S. from the College of Computing at the Georgia Institute of Technology, Atlanta, GA, USA. He is with the IBM Thomas J. Watson Research Center, currently a member of the Software Tools and Techniques Group. Dr. Gedik's research interests lie in data-intensive distributed computing systems, spanning data-centric peer-to-peer overlay networks, mobile and sensor-based distributed data management systems, and distributed data stream processing systems. His research focus is on developing system-level architectures and techniques to address scalability problems in distributed continual query systems and applications. He is the recipient of the ICDCS 2003 best paper award. He has served on the program committees of several international conferences, such as ICDE, MDM, and CollaborateCom.

Kun-Lung Wu received the B.S. degree in E.E. from the National Taiwan University, Taipei, Taiwan, and the M.S. and Ph.D. degrees in C.S., both from the University of Illinois at Urbana-Champaign. He is with the IBM Thomas J. Watson Research Center, currently a member of the Software Tools and Techniques Group. His recent research interests include data streams, continual queries, mobile computing, Internet technologies and applications, database systems and distributed computing. He has published extensively and holds many patents in these areas. Dr. Wu is a Senior Member of the IEEE Computer Society and a member of the ACM. He is the Program Co-Chair for the IEEE Joint Conference on e-Commerce Technology (CEC 2007) and Enterprise Computing, e-Commerce and e-Services (EEE 2007). He was an Associate Editor for the IEEE Transactions on Knowledge and Data Engineering. He was the general chair for the 3rd International Workshop on E-Commerce and Web-Based Information Systems (WECWIS 2001). He has served as an organizing and program committee member for various conferences. He has received various IBM awards, including an IBM Corporate Environmental Affairs Excellence Award, a Research Division Award, and several Invention Achievement Awards. He received a best paper award from IEEE EEE 2004. He is an IBM Master Inventor.

Philip S. Yu received the B.S. degree in E.E. from National Taiwan University, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University. He is with the IBM Thomas J. Watson Research Center and is currently manager of the Software Tools and Techniques group. His research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing, and performance modeling. Dr. Yu has published more than 43 papers in refereed journals and conferences. He holds or has applied for more than 25 US patents. Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He is an associate editor of ACM Transactions on Internet Technology and ACM Transactions on Knowledge Discovery from Data. He is a member of the IEEE Data Engineering steering committee and is also on the steering committee of the IEEE Conference on Data Mining. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001-2004), an editor, advisory board member and also a guest co-editor of the special issue on mining of databases. He has also served as an associate editor of Knowledge and Information Systems. In addition to serving as a program committee member for various conferences, he will be serving as the general chair of the 2006 ACM Conference on Information and Knowledge Management and the program chair of the 2006 joint conferences of the 8th IEEE Conference on E-Commerce Technology (CEC 06) and the 3rd IEEE Conference on Enterprise Computing, E-Commerce and E-Services (EEE 06). He was the program chair or co-chair of the 11th IEEE Intl. Conference on Data Engineering, the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, the 2nd IEEE Intl. Workshop on Research Issues on Data Engineering: Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases, and the 2nd IEEE Intl. Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chair of the 14th IEEE Intl. Conference on Data Engineering and the general co-chair of the 2nd IEEE Intl. Conference on Data Mining. He has received several IBM honors, including 2 IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, 2 Research Division Awards and the 84th plateau of Invention Achievement Awards. He received an Outstanding Contributions Award from the IEEE Intl. Conference on Data Mining in 2003 and also an IEEE Region 1 Award for promoting and perpetuating numerous new electrical engineering concepts. Dr. Yu is an IBM Master Inventor.

Ling Liu is an associate professor at the College of Computing at Georgia Tech. There, she directs the research programs in the Distributed Data Intensive Systems Lab (DiSL), examining research issues and technical challenges in building large-scale distributed computing systems that can grow without limits. Dr. Liu and the DiSL research group have been working on various aspects of distributed data-intensive systems, ranging from decentralized overlay networks, exemplified by peer-to-peer computing and data grid computing, to mobile computing systems and location-based services, sensor network computing, and enterprise computing systems. She has published over 15 international journal and conference articles. Her research group has produced a number of software systems that are either open source or directly accessible online, among which the most popular ones are WebCQ and XWRAPElite. Dr. Liu is currently on the editorial board of several international journals, including IEEE Transactions on Knowledge and Data Engineering, the International Journal of Very Large Database Systems (VLDBJ), and the International Journal of Web Services Research, and has chaired a number of conferences as a PC chair, a vice PC chair, or a general chair, including the IEEE International Conference on Data Engineering (ICDE 2004, ICDE 2006, ICDE 2007), the IEEE International Conference on Distributed Computing Systems (ICDCS 2006), and the IEEE International Conference on Web Services (ICWS 2004). She is a recipient of the IBM Faculty Award (2003, 2006). Dr. Liu's current research is partly sponsored by grants from NSF CISE CSR, ITR, CyberTrust, a grant from AFOSR, an IBM SUR grant, and an IBM Faculty Award.
