Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data


Geoffrey I. Webb
Faculty of Information Technology, Monash University, Victoria, Australia

Abstract

Discretization of streaming data has received surprisingly little attention. This might be because streaming data require incremental discretization with cutpoints that may vary over time and this is perceived as undesirable. We argue, to the contrary, that it can be desirable for a discretization to evolve in synchronization with an evolving data stream, even when the learner assumes that the meanings of attribute values remain invariant over time. We examine the issues associated with discretization in the context of distribution drift and develop computationally efficient incremental discretization algorithms. We show that discretization can reduce the error of a classical incremental learner and that allowing a discretization to drift in synchronization with distribution drift can further reduce error.

I. INTRODUCTION

It is surprising that discretization of numeric data streams has received little attention. One reason may be that the cut points are likely to have to change as the stream progresses, because the distribution of values may vary. This may bias potential users against using discretization because it may seem unintuitive to use discretized values whose meaning changes over time. We argue, to the contrary, that changing the cut points associated with each discretized value over time might sometimes be necessary if the interval is to retain the relevant meaning for a given task.

This paper investigates discretization of numeric stream data. We present two efficient and effective incremental discretization algorithms. The first approximates equal frequency discretization over the entire stream up to the current time step. The second uses a window of recent values and performs equal frequency discretization on these, allowing the cut points to exactly track a non-stationary distribution.
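As a point of reference for both algorithms, batch equal frequency discretization can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `equal_frequency_cuts` and `bin_index` and the income figures are hypothetical, and the choice of each cut point as the largest value in its bin is one reasonable convention among several.

```python
from bisect import bisect_left


def equal_frequency_cuts(values, m):
    """Return m - 1 cut points that split `values` into m roughly equal bins."""
    ordered = sorted(values)
    n = len(ordered)
    if n < m:
        raise ValueError("need at least m values to form m bins")
    # The k-th cut is the largest value assigned to the k-th bin, which is
    # approximately the k/m quantile of the data.
    return [ordered[(k * n) // m - 1] for k in range(1, m)]


def bin_index(cuts, v):
    """Map v to a 0-based bin index; bin k covers (cuts[k-1], cuts[k]]."""
    return bisect_left(cuts, v)


# Hypothetical income values early and late in a drifting stream.
early = [20, 25, 30, 35, 40, 45]
late = [40, 50, 60, 70, 80, 90]

cuts_early = equal_frequency_cuts(early, 3)
cuts_late = equal_frequency_cuts(late, 3)

# The same raw value falls in different thirds as the distribution drifts.
bin_early = bin_index(cuts_early, 40)
bin_late = bin_index(cuts_late, 40)
```

In this toy data an income of 40 lies in the upper third early on and in the lower third later, which is the sense in which a bin can keep its meaning while its cut points move.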
Our experiments demonstrate that discretization can reduce error for the state-of-the-art streaming learner Logistic Regression (LR) with Stochastic Gradient Descent. We further demonstrate that for some streaming data it is indeed useful to have discretizations whose cut points change over time, tracking the evolution of the underlying concepts.

II. DISCRETIZATION FOR STREAMING DATA

We wish to incrementally update a model $\Theta$ to predict the posterior probability distribution $P(y_i \mid x_i)$ of the classes $y_i \in \{c_1, \ldots, c_k\}$ for objects $x_i = \langle x_i^1, \ldots, x_i^a \rangle$ while viewing a large or infinite stream $S = \{x_1, \ldots, x_n\}$ of objects. We use $\Theta_i$ to denote the model at time step $i$ and $P_{\Theta_i}(y \mid x)$ to denote the class distribution predicted by model $\Theta_i$ for object $x$. We assume that the true class $y_i$ for each $x_i$ becomes available after $x_i$ is classified and can be used for subsequent training of the classifier. The attribute values $x_i^j$ of the objects may be either categorical or numeric.

A discretization $\delta$ of a numeric attribute $X$ is a set of $m$ intervals called bins. These bins can be defined by cut points $\{\kappa_1, \ldots, \kappa_{m-1}\}$. These cut points divide the domain of $X$ into bins $b_1, \ldots, b_m$ using a scheme such as $b_1 = [-\infty, \kappa_1]$, $b_m = (\kappa_{m-1}, \infty]$ and, for $1 < i < m$, $b_i = (\kappa_{i-1}, \kappa_i]$. A discretization of attribute $X$ defines a mapping between values $v$ of $X$ and bin indexes, $\delta(v) = z$ such that $v \in b_z$.

Discretization is closely related to both histograms and quantiles. A histogram of a numeric attribute $X$ with respect to a dataset $S$ can be viewed as a discretization of $X$ augmented with a vector of counts $\eta_1, \ldots, \eta_m$ such that $\eta_k$ represents $|\{j : x_j \in b_k\}|$, the number of records whose value for the attribute falls within the bin (here $x_j$ denotes the value of $X$ for the $j$th record). A $p$th quantile $Q_p$ of an attribute $X$ with respect to $S$ is a value such that $|\{j : x_j < Q_p\}|/n \le p \;\wedge\; |\{j : x_j > Q_p\}|/n \le 1 - p$. That is, it is the value of $x_{pn}$ if the data were sorted on the attribute. If $pn$ is not an integer then the $p$th quantile may be any value in $[x_{\lfloor pn \rfloor}, x_{\lceil pn \rceil}]$ and is often set to $x_{\lfloor pn \rfloor} + (x_{\lceil pn \rceil} - x_{\lfloor pn \rfloor})/2$.

III. ISSUES IN DISCRETIZATION FOR STREAMING DATA

The cut points for discretization of streaming data may need to change over time. This is because the process that generates the stream $S$ may be non-stationary, in which case it is not going to be possible to anticipate what the future distribution of values for an attribute will be and hence impossible to predetermine what intervals will be relevant in the future. If the intervals are predetermined and remain static then they are likely to eventually lose relevance.

However, such changes to the intervals over time may appear undesirable, as they seem to imply that the meanings of the bins must change. We suspect that this has been a key reason why there has been little previous research into discretization for streaming data. However, this concern may be misguided. If a distribution is non-stationary then it actually may be desirable for the discretization to drift in synchronization with the changes in the distribution. For example, consider a stream of data that includes an income attribute. The values of this attribute can be expected to grow over time. For at least some applications it seems credible that we should want the discretization to reflect this evolution. For example, it may be necessary for the cut point on a bin representing high income to increase over time if that interval is to retain its relevant meaning.

A further issue is that some algorithms do not require continuity over time in the bins that are used. For example, Naive Bayes [1] requires at classification time estimates of the prior probability of each class, $P(y)$, and of the likelihood of each attribute value given the class, $P(x^j \mid y)$. These can be derived from counts of the relative frequency of each class and of each pair of class and attribute value. It is not relevant what the intervals were for previous classifications, only that these necessary statistics be available for the current discretization. Hence, Naive Bayes can be well served by a technique that maintains a suitable augmented histogram over time and it is irrelevant whether the number of bins or their cut points change. Rather, the key issue is whether the counts are sufficiently accurate for effective classification [2].

On the other hand, most discriminative learning algorithms do not operate in this manner and do require that the number of bins and their meaning be constant over time. For such algorithms, quantile-based discretization, such as equal frequency discretization, may be effective. This unsupervised discretization strategy requires that the number of bins, $m$, be pre-specified, together with a set of quantiles that specify the cutpoints. For equal frequency discretization the range of attribute $X$ is divided into $m$ bins, each containing the same number of training examples, that is, into bins $b_1, \ldots, b_m$ such that $\forall k, l \in \{1, \ldots, m\}: |\{j : x_j \in b_k\}| = |\{j : x_j \in b_l\}|$. This is directly related to the problem of finding quantiles, as the $k$th bin has an interval $(Q_{(k-1)/m}, Q_{k/m}]$.

Quantile-based discretization allows at least one type of meaning of an interval to remain invariant even while the cut points change. Consider again the case of an attribute for income. Suppose it is discretized into three bins, the lower, middle and upper thirds of income.
If a streaming discretization algorithm is able to maintain such a discretization over time, varying the cut points as needed, at least one potentially important meaning of the intervals will remain constant.

Supervised discretization often results in more useful bins than unsupervised approaches [3]. However, supervised discretization does not appear feasible for discriminative learners in a streaming context, as the cuts selected by a supervised approach may vary dramatically over time and classical discriminative learners cannot track and adjust for this. In contrast, quantile-based discretization can maintain a constant set of bins, each with a meaning that remains invariant even while the cut values that define the bins drift. If meaningful quantiles can be identified for a learning problem then these should be used. However, we show that even when such information is not known, simple equal frequency discretization can be effective.

It may appear counter-intuitive that discretization should improve the performance of a learning algorithm that can handle numeric values directly, because it is clear that discretization loses information. However, even though a discretized variable contains less information than the undiscretized original, the models that a learner forms may be able to employ that information more effectively. Consider for example a simple linear model such as created by Logistic Regression. Such a model requires that the predictiveness of a numeric value be proportional to its value. It cannot directly model the case where only unusually high values are indicative of one class, and average or low values are all equally indicative of the other, or where average values are indicative of one class and either high or low values indicative of the other. By discretizing the attribute and then treating each discrete value as a binary variable a linear classifier can model the predictiveness of individual segments of the number line irrespective of their relative absolute values.

IV. INCREMENTAL DISCRETIZATION ALGORITHM IDA

The Incremental Discretization Algorithm (IDA) approximates quantile-based discretization on the entire data stream encountered to date by maintaining a random sample of the data which is used to calculate the cut points. A random sample is used because: 1) it is not feasible for high-throughput streams to maintain a complete record of all values observed to date; 2) it is computationally efficient; and 3) it is possible to place tight bounds on the expected variance of the cut points [4].

We use the reservoir sampling algorithm [5] to maintain the random sample of $s$ values $V_i$ for each attribute $X^i$. The first $s$ values of each $X^i$ are added to the corresponding $V_i$. Thereafter, when the $n$th object $\langle x_n, y_n \rangle$ is encountered, with probability $s/n$, each of its values $x_n^i$ replaces a randomly selected value of the corresponding $V_i$. See Table I.

TABLE I. UPDATE SAMPLES

globals
    s: the sample size
    n: the number of instances seen in the stream to date
    V: a vector of sets of samples, indexed by attribute

procedure UPDATESAMPLES($x = \langle x^1, \ldots, x^a \rangle$)
    if rand() $\le s/n$ then
        for $i = 1$ to $a$ do
            if $x^i$ is not missing then
                if $|V_i| = s$ then
                    remove a random element from $V_i$
                end if
                add $x^i$ to $V_i$
            end if
        end for
    else
        for $i = 1$ to $a$ do
            if $|V_i| < s$ and $x^i$ is not missing then
                add $x^i$ to $V_i$
            end if
        end for
    end if
end procedure

We store the values of each attribute in a vector of interval heaps [6], where $V_i^j$ stores the values for the $j$th bin of $X^i$. This provides efficient access to the minimum and maximum values in a bin, and direct access to a random value within a bin when replacing a value selected at random. This data structure ensures that insertion and deletion are of order $O(\log s)$ and retrieving a cut point is constant time.

The algorithm for inserting a value $v$ into $V_i$ is presented in Table II. Recall that $m$ is the number of bins. $\overline{V_i^j}$ and $\underline{V_i^j}$ denote, respectively, the maximum and minimum value in $V_i^j$. Line 2 finds the target bin, the next bin that should increase in size. Line 3 uses binary search to find the bin in which the value belongs. The value is inserted into the appropriate bin. If it is not the target bin, the excess value is shuffled up or down to the target. Deletion is a minor variation on insertion. The cut points are accessed in constant time by returning the maximum value of the appropriate bin.

TABLE II. INSERT VALUE

globals
    m: the number of bins

 1: procedure INSERTVALUE($v$, $V_i$)
 2:     $t = |V_i| \bmod m$
 3:     $j = \arg\min_j \overline{V_i^j} \ge v$
 4:     insert $v$ into $V_i^j$
 5:     if $j < t$ then
 6:         for $k = j$ to $t - 1$ do
 7:             add $\overline{V_i^k}$ to $V_i^{k+1}$
 8:             remove $\overline{V_i^k}$ from $V_i^k$
 9:         end for
10:     else
11:         for $k = t$ to $j - 1$ do
12:             add $\underline{V_i^{k+1}}$ to $V_i^k$
13:             remove $\underline{V_i^{k+1}}$ from $V_i^{k+1}$
14:         end for
15:     end if
16: end procedure

IDA maintains a random sample of the stream from its beginning to the current point of time. As suggested in the introduction, in some contexts it might be valuable to have the intervals drift, so that the actual cut point associated with the lowest range of income, for example, drifts upwards as inflation increases incomes. IDA's intervals will drift over time to reflect overall changes in the total distribution to date. However, it does not directly track the current distribution.

A variant that more precisely tracks the evolution of a data stream is to maintain the sample as a window of the $s$ most recent objects. In this case the discretization will change as the distribution changes, but will be more subject to frequent random minor fluctuations than a more gradual update approach. We call the latter approach the Incremental Discretization Algorithm with a Window (IDAW). This requires the additional overhead of maintaining for each attribute the window of values in time order so that the oldest value can at each step be identified and replaced by the newest value.

A. Computational Complexity

The computational complexity of IDA is dominated by the costs of maintaining the samples and determining the quantiles from those samples.
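The two operations just named, maintaining the sample and reading quantiles from it, can be illustrated with a minimal Python sketch for a single attribute. It follows the sampling rule of Table I but, as a simplifying assumption, keeps the sample in one sorted list instead of the per-bin interval heaps of Table II, so updates here cost $O(s)$ rather than $O(m \log(s/m))$. The class name `IDASketch` and its methods are hypothetical and are not taken from the released software.

```python
import random
from bisect import bisect_left, insort


class IDASketch:
    """Reservoir-sample approximation of equal frequency cut points."""

    def __init__(self, m, s, seed=0):
        self.m = m                       # number of bins
        self.s = s                       # sample size
        self.n = 0                       # instances seen so far
        self.sample = []                 # reservoir, kept sorted by value
        self.rng = random.Random(seed)   # seeded only for repeatability

    def update(self, v):
        """Offer one stream value (None stands for a missing value)."""
        self.n += 1
        if v is None:
            return
        if len(self.sample) < self.s:
            insort(self.sample, v)       # reservoir is still filling
        elif self.rng.random() < self.s / self.n:
            # With probability s/n, replace a uniformly chosen element.
            del self.sample[self.rng.randrange(len(self.sample))]
            insort(self.sample, v)

    def cut_points(self):
        """Largest sampled value in each of the first m - 1 bins."""
        k = len(self.sample)
        if k < self.m:
            raise ValueError("need at least m sampled values")
        return [self.sample[(j * k) // self.m - 1] for j in range(1, self.m)]

    def discretize(self, v):
        """0-based index of the bin (cut[k-1], cut[k]] that contains v."""
        return bisect_left(self.cut_points(), v)


ida = IDASketch(m=5, s=10)
for value in range(1, 11):
    ida.update(value)
first_cuts = ida.cut_points()            # reservoir holds exactly 1..10 here

for value in range(11, 1001):
    ida.update(value)                    # replacements become steadily rarer
```

Because a replacement happens with probability $s/n$, most calls to `update` late in the stream do nothing beyond incrementing the counter, which is the behaviour the complexity analysis relies on.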
The required operations are to insert a new value (only required while the sample is not yet at full size), to replace a random value with a new value, and to return the required quantiles. As each bin is maintained as an interval heap [6], finding the quantiles takes constant time and inserting or removing a value from a bin $V_i^j$ takes $O(\log |V_i^j|) = O(\log(s/m))$ time. As replacement requires up to $m$ insertions and deletions, replacement requires order $O(m \log(s/m))$ time. However, these relatively expensive updates are only required on average once every $t/s$ updates, where $t$ is the current time step or size of the stream to date. Thus the amortized cost is

$O\left(\left[\sum_{i=1}^{s} m \log(i/m) + \sum_{i=s+1}^{t} \frac{s}{i}\, m \log(s/m)\right] / t\right)$,

where the first term represents the initial $s$ time steps during which the sample is built up to its operating size and the second term represents updates to the sample once it reaches operating size. It is readily apparent that these updates rapidly become very rare and that as the size of the stream becomes very large the amortized cost becomes negligibly small.

The situation is more complex for IDAW, which maintains a window of the $s$ most recent values for each attribute. This requires that the values be maintained in both time and value order. Maintaining an order by time can be achieved very efficiently with a circular buffer, which supports all updates and accesses in constant time. As the elements to be replaced in a replacement operation are no longer selected at random, it is not efficient to maintain the bins as interval heaps, as above. Rather we need to use slightly more expensive balanced binary trees, for which the time to identify the location of the value to be removed is $O(\log(s/m))$, so this does not increase the overall complexity of the update operation relative to that for IDA. The major computational penalty, however, is that these updates must be performed for every object encountered in the stream, which makes the maintenance of the discretization a non-trivial ongoing overhead.

V. RELATED RESEARCH

As we have noted above, maintaining the $i/m$-quantiles for each $1 \le i < m$ is the key requirement in order to discretize a data stream into $m$ equal frequency bins. These quantiles provide the required cut points. Algorithms exist for finding approximate quantiles in data streams with strict bounds on the error [7], [8]. However, they rely on the records in the stream appearing in random order, a requirement that is likely to be strongly violated in many learning applications. This renders these algorithms inappropriate for our purposes.

A discretization technique should be matched to the properties of the learning algorithm. A number of papers have investigated discretization of streaming data in the context of naive Bayes (NB) [9]-[11]. NB is an unusual algorithm in that the model it learns for categorical data can be represented in the form of an augmented histogram, requiring counts of both the frequency of each attribute value and the joint frequency of each combination of an attribute and a class value. As a consequence it does not matter if there is a change in either the number of values of an attribute or the meaning of an attribute value, so long as the appropriate counts are maintained. In contrast, many other incremental learning algorithms, such as linear classifiers with weights learned by stochastic gradient descent, require that the number of attribute values remains constant and that their meanings do not change. In the current work we target algorithms that require the number of bins and their meanings to be invariant.

Partition Incremental Discretization (PID) [10] allows the number of intervals to remain constant. It operates by forming two layers of discretization. The top layer is the discretization used by the learning algorithm. The bottom layer contains many more bins than the top layer. In their example case for equal frequency discretization the bottom layer aims to maintain bins that contain 1/20 the number of instances required by each bin at the top level. Top level bins are formed by aggregation of consecutive lower-level bins until approximately the correct size bin is obtained. The lower-level bins are initially formed by setting cut points at equal distances along the number line between an indicative lower and upper value on the attribute. Then as the stream is consumed, the counts for the lower-level bins are incremented as appropriate. When a lower-level bin exceeds a threshold size it is split on a value mid-way between its minimum and maximum values, and each count is set to one half the count for the original bin. This may result in some inaccuracy in the counts, but such inaccuracy only matters when the two parts of a split lower-level interval end up in different top-level bins, as otherwise both of the bins that have been formed will fall within the one top-level bin and the top-level bin's total count will remain accurate. The paper does not specify the threshold for splitting. In our study we use twice the target size for a lower-level bin. In other words, a lower-level bin is split in two when it exceeds 1/10th the target size for an upper-level bin.

PID has three potential limitations. First, as lower-level bins move from one higher-level bin to another, there might be abrupt changes in the cut points from one update to the next. Second, if the spread of values on the number line is not uniform, the number of bins created may become very large. This is because a small number of initial bins may need to be repeatedly split to accommodate the majority of the data. Third, the splitting process might result in major inaccuracies in the estimated counts when there are very large numbers of repetitions of a single value $v$. In this case the lower-level bin into which $v$ falls will rapidly grow to exceed the size threshold and be split.
However, the division of the counts across the two resulting bins will be inaccurate, as all the repetitions of $v$ belong in the same bin but will be attributed equally to each of the new bins. This may occur repeatedly, causing a diminishingly small proportion of the true count for $v$ to be allocated to the correct bin.

VI. EVALUATION

We seek to evaluate three primary contributions: 1) IDA, a new algorithm for efficient and effective discretization of streaming data that approximates the maintenance of equal frequency discretization over all of the data observed up to the current time; 2) IDAW, a variant of IDA that seeks to maintain an equal frequency discretization over the data distribution at the current time; and 3) the hypothesis that discretization based on quantiles can allow the cut points to drift over time without changing the relevant meaning of the intervals. It is also important to understand exactly how much power is lost by performing incremental rather than batch discretization.

To assess these contributions and issues we evaluate each component of our new algorithms in turn. We first compare discretization using the full data (Pre-Disc) against discretization using all the data encountered up to the time of classification (All-So-Far). Note that both Pre-Disc and All-So-Far set hypothetical benchmarks. Neither is feasible in a real-world streaming data context because the first requires seeing all data that will ever come through the stream in advance and the second requires retaining and analyzing all data in the stream.

The next relevant test is to assess the loss in accuracy that results from using a random sample rather than discretizing on all the data encountered to date. To this end we compare IDA against All-So-Far.

It is also important to compare against the current state-of-the-art in incremental discretization, PID. This is the only prior incremental discretization technique capable of supporting equal frequency discretization. PID requires that the user provides an initial estimate of the likely minimum and maximum value for each attribute.
To ensure that the evaluation is as favorable to PID as possible we use the true minimum and maximum in place of these estimates, values that are often not known in practice for real streaming data.

In order to understand what advantage, if any, discretization can confer, we compare LR with IDA to LR performed on normalized numeric data (No-Disc). To ensure that this comparison is as favorable as possible to the non-discretization option we normalize using the minimum and maximum values of each attribute in the data, replacing each value $x$ with

$2(x - \underline{X})/(\overline{X} - \underline{X}) - 1.0$,    (1)

where $\underline{X}$ and $\overline{X}$ denote respectively the minimum and maximum values for the attribute $X$. This normalizes values to the interval $[-1.0, 1.0]$. Such normalization would clearly often not be possible in practice with streaming data because it is often not possible to know in advance the minimum and maximum values for an attribute.

We also wish to investigate the idea of allowing the discretization to drift over time, closely tracking the current distribution of values. To this end we compare IDAW to IDA using sample sizes of [value missing].

We perform all experiments using LR with single-pass Stochastic Gradient Descent (LRSGD) using regularization rate µ = [value missing] and learning rate or step size λ = [value missing]. These are rates that we have found to be effective in previous experimental work on the current datasets when not using discretization. The regularization rate is not reduced over time as we are seeking to learn in the presence of distribution and concept drift and hence the target is non-stationary and so we cannot assume that we are ever approaching a fixed optimum. LRSGD has been selected as an exemplar of the type of incremental learning algorithm normally associated with numeric data that we believe may benefit from discretization.

All experiments use the procedure outlined in Table III. We use 5 bin discretization because 10 bins obtained the same overall results and 5 bins provided the best results for Pre-Disc.

A. Comparisons without distribution or concept drift

We are presenting a new approach to discretization.
While it is designed for use with streaming data it is important to establish how much accuracy is lost relative to the non-streaming baseline in a situation where there is clearly no distribution or concept drift. To this end we perform experiments where the data are shuffled to ensure there is no systematic drift over time and compare our streaming algorithms against pre-discretization using all the data and a streaming discretization that uses all the data up to the current point of time. 20 experiments were performed for each data stream, each time shuffling the data in advance.

TABLE III. STREAM LEARNING PROCEDURE

procedure STREAMTEST(data stream: $S$, discretizer, learner: $\lambda$)
    initialize the discretization $\delta$ as required by the discretizer
    for $i = 1$ to $|S|$ do
        update $\delta$ by applying the discretizer to $(\delta, x_i)$
        apply the learner, $\hat{y} = \lambda(\delta(x_i))$
        record the error $I(\hat{y} \ne y_i)$
    end for
end procedure

Fig. 1. 0-1 loss on data without distribution drift for Pre-Disc, All-So-Far, IDA, PID, IDAW and No-Disc on the airlines, electricity, gas-sensor, power-supply and sensor streams.

We use the only public real-world stream classification datasets of which we are aware: airlines and electricity, obtained from the MOA website [12]; gas-sensor, obtained from the UCI Repository [13]; and power-supply and sensor, obtained from the Stream Data Mining Repository [14].

We present the resulting mean 0-1 loss for each algorithm on each dataset in Fig. 1, with error bars representing 1 standard deviation marked, but too close to be readily discerned. We use two-tailed matched-pair t-tests for significance, employing an adjusted critical value of 0.05/75 ≈ 0.00067 after a Bonferroni correction for the 75 comparisons performed (15 pairs of algorithms times 5 datasets). In no case is there a significant difference between the error of the discretization techniques on airlines, electricity, gas-sensor or power-supply (p = [value missing] to [value missing]). On sensor, IDAW has significantly higher error than the other discretization techniques (p = [value missing] to [value missing]). This may be due to the instability of the quantiles as they are continually updated. On all streams the use of LR without discretization results in higher error than its use with any discretization technique (p = [value missing] to [value missing]).

These results demonstrate that our computationally efficient use of small samples provides performance that is close to optimal in the absence of concept drift.

B. Comparisons on real world data

To establish the value of our algorithms in the context of distribution drift, it is useful to assess performance on real-world stream data. Our final study compares the algorithms on the real-world data used in the previous experiment.

Fig. 2. Errors for each approach (Pre-Disc, All-So-Far, IDA, PID, IDAW, No-Disc) on each data stream (airlines, electricity, gas-sensor, power-supply, sensor).

We process our real-world datasets in their original order. Because they are in a fixed order it is not possible to have repeated trials and hence not possible to perform statistical tests. In consequence, one should be cautious in interpreting the apparent differences as meaningful unless they are quite substantial.

The 0-1 loss is presented in Figure 2. All the discretization techniques appear to enjoy a substantial advantage relative to no discretization on all data streams other than airlines, for which the advantage is small. Maintaining an exact discretization over all the data to the current point offers similar accuracy to pre-discretization on all datasets except sensor, for which it appears to substantially increase error. It is not apparent why All-So-Far should be penalized on this particular data stream. The two approaches that seek to approximate All-So-Far, IDA and PID, both achieve error very close to its error.

The IDAW approach of tracking the current distribution delivers very substantial reductions in error for the electricity and sensor data streams, but results in substantial increases in error for gas-sensor and power-supply. The benefit of this approach on the electricity and sensor data streams supports our hypothesis that maintaining discretizations based on quantiles as they vary over time can maintain meaning while the cut-points vary. However, the results for the other data streams show that some types of distribution drift do not take this form.

C. Running times

Due to the large number of repetitions of processing large datasets we conducted all experiments on a heterogeneous grid system. As a result, compute times are only indicative at best.
Nonetheless we present in Figure 3 the compute times for the experiments on real-world data in order to give a feel for the computational profiles of the techniques that we have developed. The software is implemented in C++ but little attempt has been made to optimize the discretization process. The key observation is that IDA and its variants in most cases incur only modest computational overheads relative to no discretization. The relatively poor performance of PID should be treated with caution as we have made no attempt to optimize our reimplementation of the technique.
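To make concrete why IDAW incurs work on every object while IDA does not, the windowed variant can be sketched for a single attribute in Python. This is an illustrative sketch only and not the C++ implementation used in the experiments: a `collections.deque` stands in for the circular buffer that keeps time order, and a plain sorted list stands in for the balanced binary trees that keep value order, so each update costs $O(s)$ here. The class name `IDAWSketch` is hypothetical, and values are assumed to be ordinary comparable numbers (no NaN).

```python
from bisect import bisect_left, insort
from collections import deque


class IDAWSketch:
    """Equal frequency cut points over the s most recent values."""

    def __init__(self, m, s):
        self.m = m                # number of bins
        self.s = s                # window size
        self.window = deque()     # time order, oldest value on the left
        self.ordered = []         # the same values in value order

    def update(self, v):
        """Add the newest value, evicting the oldest once the window is full."""
        if len(self.window) == self.s:
            oldest = self.window.popleft()
            # bisect_left returns the position of one copy of `oldest`.
            del self.ordered[bisect_left(self.ordered, oldest)]
        self.window.append(v)
        insort(self.ordered, v)

    def cut_points(self):
        """Largest windowed value in each of the first m - 1 bins."""
        k = len(self.ordered)
        if k < self.m:
            raise ValueError("need at least m values in the window")
        return [self.ordered[(j * k) // self.m - 1] for j in range(1, self.m)]

    def discretize(self, v):
        """0-based index of the bin (cut[k-1], cut[k]] that contains v."""
        return bisect_left(self.cut_points(), v)


idaw = IDAWSketch(m=2, s=4)
for value in [1, 2, 3, 4]:
    idaw.update(value)
cuts_before = idaw.cut_points()

for value in [10, 20, 30, 40]:    # the distribution shifts upwards
    idaw.update(value)
cuts_after = idaw.cut_points()
```

Every call to `update` touches both structures, in contrast to reservoir sampling where most late updates are skipped, which is consistent with the higher relative running times reported for IDAW.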

Fig. 3. Running times for each incremental discretization technique (IDA, IDAW, PID) on each data stream (airlines, electricity, gas-sensor, power-supply, sensor), presented in multiples of time taken without discretization.

VII. CONCLUSIONS

We have explored the key issues that surround discretization of streaming data and presented two new techniques based on sampling. Most discriminative algorithms require that the number and meaning of the bins remain invariant. We argue that binning on fixed quantiles of the distribution, rather than fixed absolute values, can maintain an appropriate meaning over streaming data with distribution drift. Hence one bin can represent the top fraction $p$ of the values for the data and so on, even as the absolute values in that range vary.

Our new stream discretization techniques use a sample of values for an attribute to maintain an equal frequency discretization. They differ only in the composition of the sample.

IDA uses the reservoir sampling algorithm to maintain a sample drawn uniformly at random from the entire stream up until the current time. This approximates the maintenance of an equal frequency discretization over the entire stream up to the current point. Its desirable features include involving negligible computation once the stream becomes large, as updates to the sample become very rare. Our results show that it is very effective in the absence of concept drift and can substantially reduce the error of LRSGD. Even with a very small sample it only increases the error very modestly compared with equal frequency discretization over all data in the stream to date.

IDAW is a variant of IDA that is useful when it is desirable to more closely track the current distribution of the data. IDAW maintains a window of the most recent values for an attribute and discretizes these. This approach incurs greater computation than IDA, as the sample must be updated at every time step. Further, the values must be maintained in two orders, value order to support discretization and time order to allow maintenance of the window.
Nonetheless we show that this additional computational burden can deliver substantial benefit in the context of incremental concept drift. It remains an open topic for future research whether it is possible to identify when drifting discretization such as that provided by IDAW is appropriate and when non-drifting discretization such as provided by IDA will be more effective. The computational burden of IDAW could be greatly reduced in contexts where the rate of expected drift relative to the rate at which objects arrive is low, by only updating with occasional randomly selected objects.

We conducted our experiments using LRSGD. We have shown that with this learner discretization can deliver substantial reductions in error relative to learning from undiscretized data. This is not to claim that more sophisticated treatment of undiscretized data could not achieve even better results. Our objective is to show that discretization is a practical addition to the streaming data toolbox which is worthy of consideration, rather than to argue that it provides universal benefit.

While our research has only considered classification learning from stream data, discretization is likely to also prove valuable for other data mining activities on data streams including itemset mining [15] and clustering [16]. We leave it to future research to explore the potential benefits of discretization in these contexts and the relative merits of alternative stream discretization strategies.

It is a surprising gap in the data mining literature that relatively little has been done on discretization for streaming data. Perhaps the greatest contribution of this paper is to have shown that it can be done in a computationally efficient manner and that it can deliver substantial value.

The executable binaries, scripts, datasets and instructions required to replicate the experiments can be downloaded from webb/software/incremental-discretization.tgz.

REFERENCES

[1] G. I. Webb, "Naive Bayes," in Encyclopedia of Machine Learning, C. Sammut and G. I. Webb, Eds. Springer, 2011.
[2] Y. Yang and G. I. Webb, "Discretization for naive-Bayes learning: Managing discretization bias and variance," Machine Learning, vol. 74, no. 1.
[3] S. Garcia, J. Luengo, J. Saez, V. Lopez, and F. Herrera, "A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, April.
[4] A. Stuart and J. K. Ord, Kendall's Advanced Theory of Statistics, 6th ed. Edward Arnold.
[5] J. S. Vitter, "Random sampling with a reservoir," ACM Trans. Mathematical Software, vol. 11, no. 1.
[6] J. van Leeuwen and D. Wood, "Interval heaps," The Computer Journal, vol. 36, no. 3.
[7] S. Guha and A. McGregor, "Stream order and order statistics: Quantile estimation in random-order streams," SIAM Journal on Computing, vol. 38, no. 5.
[8] A. Gupta and F. X. Zane, "Counting inversions in lists," in Proc. Fourteenth Annual ACM-SIAM Symp. Discrete Algorithms, ser. SODA '03, 2003.
[9] T. Elomaa and P. Lehtinen, "Maintaining optimal multi-way splits for numerical attributes in data streams," in PAKDD08, 2008.
[10] J. Gama and C. Pinto, "Discretization from data streams: applications to histograms and data mining," in Proc. ACM Symp. Applied Computing. ACM, 2006.
[11] J. Lu, Y. Yang, and G. I. Webb, "Incremental discretization for naive-Bayes classifier," in Proc. 2nd Int. Conf. Advanced Data Mining and Applications (ADMA 2006). Springer, 2006.
[12] MOA. [Online].
[13] K. Bache and M. Lichman, "UCI machine learning repository." [Online].
[14] X. Xu, "Stream data mining repository." [Online]. Available: xqzhu/stream.html
[15] N. Jiang and L. Gruenwald, "Cfi-stream: mining closed frequent itemsets in data streams," in ACM SIGKDD-06. ACM, 2006.
[16] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, "A framework for clustering evolving data streams," in Proceedings of the 29th International Conference on Very Large Data Bases-Volume 29, 2003.
