Language Understanding in the Wild: Combining Crowdsourcing and Machine Learning

Edwin Simpson, University of Oxford, UK (edwin@robots.ox.ac.uk); Pushmeet Kohli, Microsoft Research, Cambridge, UK (pkohli@microsoft.com); Matteo Venanzi, University of Southampton, UK (mv1g10@ecs.soton.ac.uk); John Guiver, Microsoft Research, Cambridge, UK (joguiver@microsoft.com); Nicholas R. Jennings, University of Southampton, UK (nrj@ecs.soton.ac.uk); Steven Reece, University of Oxford, UK (reece@robots.ox.ac.uk); Stephen J. Roberts, University of Oxford, UK (sjrob@robots.ox.ac.uk)

ABSTRACT
Social media has led to the democratisation of opinion sharing. A wealth of information about public opinions, current events, and authors' insights into specific topics can be gained by understanding the text written by users. However, there is wide variation in the language used by different authors in different contexts on the web. This diversity in language makes interpretation an extremely challenging task. Crowdsourcing presents an opportunity to interpret the sentiment, or topic, of free text. However, the subjectivity and bias of human interpreters raise challenges in inferring the semantics expressed by the text. To overcome this problem, we present a novel Bayesian approach to language understanding that relies on aggregated crowdsourced judgements. Our model encodes the relationships between labels and text features in documents, such as tweets, web articles, and blog posts, accounting for the varying reliability of human labellers. It allows inference of annotations that scales to arbitrarily large pools of documents. Our evaluation using two challenging crowdsourcing datasets shows that by efficiently exploiting language models learnt from aggregated crowdsourced labels, we can provide up to 25% improved classifications when only a small portion, less than 4%, of documents has been labelled. Compared to the six state-of-the-art methods, we reduce by up to 67% the number of crowd responses required to achieve comparable accuracy. Our method was a joint winner of the CrowdFlower - CrowdScale 2013 Shared Task challenge at the Conference on Human Computation and Crowdsourcing (HCOMP 2013).

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. WWW 2015, May 18-22, 2015, Florence, Italy.

General Terms
Crowdsourcing, machine learning, variational Bayes, classifier combination, text classification, sentiment analysis, human computation

1. INTRODUCTION
Social media provides an increasingly rich source of information about public opinion and current events, which can be valuable to professionals across a wide range of industries. For example, Twitter can reflect the public's sentiment about the weather, such as in the data collected during the CrowdScale 2013 Shared Task challenge, opinion of major health emergencies such as the H1N1 flu pandemic [6], or knowledge of disaster events such as Typhoon Haiyan [5]. Mining this large body of unstructured data requires an understanding of the language used in each specific context. For example, the sentiment of a document, which reflects the author's attitudes or opinion of a subject, is captured in the language they use. However, the relationship between sentiment and language typically depends on factors such as the viewpoint and gender of the authors and the context of their writing. For example, distinctive terms such as "love" and "dude" are more frequently used by female and male Twitter users, respectively, to refer to the same concept of a friend or a family member [15]. Similarly, reports posted by members of the public to Ushahidi after the 2010 Haiti earthquake used a type of language that is significantly different to that seen in other locations and other types of emergency [22].
This diversity in social media text inhibits the performance of any generic method for automated document classification in the wild. However, this problem can be alleviated by human interpreters who can use their background knowledge and natural language understanding skills to recognise the sentiment of documents and adapt to the diverse language used in different contexts. Interpreting the sentiment or relevance of a piece of text is highly subjective and, along with variations in annotators' skill levels, it can result in disagreement. To overcome this problem, existing methods for crowdsourced document classification require labels from multiple annotators for every document in the corpus [28, 26], which can be prohibitively costly or time consuming [22]. Fortunately, the occurrence of certain terms in each document also provides weak indications of the sentiment of a document, which can be used to reduce the cost of employing human interpreters to annotate the entire corpus. Therefore, we propose a hybrid approach to large-scale document classification that integrates human intelligence with automated analysis of text.

In this paper we present Bayesian Classifier Combination with Words (BCCWords), a framework for combining annotations from a crowd of workers with text features to classify a corpus of documents. This approach is an example of an emerging research area known as human-agent collectives [12]. We introduce a scalable Bayesian inference mechanism for BCCWords, which learns posterior distributions over the workers' reliability and the document classifications, given the documents' text features and a set of crowdsourced annotations. Our method not only allows us to handle the varying error rates and bias of individual members of the crowd, but also allows us to annotate an entire set of documents when only a subset have been labelled by the crowd, by leveraging the inferred language model to automatically annotate the remaining documents. In more detail, we make the following contributions to the state-of-the-art:

1. We present a novel generative model, BCCWords, that combines human and computer interpretations of free-text documents and infers their sentiment.
2. We present a novel scalable variational Bayes inference algorithm, BCCWords-VB, for training the BCCWords model. This algorithm was first demonstrated at the CrowdScale 2013 Shared Task Challenge and was a joint winner.
3. We derive an efficient inference decomposition method that allows our algorithm to perform batch inference over hundreds of thousands of documents, and demonstrate inference with 569,786 crowdsourced sentiment judgements for 98,979 documents in approximately 20 minutes on a standard laptop.
4. We present an exhaustive evaluation of our algorithm on two real datasets of text annotations and compare it against six state-of-the-art methods for crowd-based text classification and data aggregation. Specifically, our evaluation shows that our algorithm is up to 25% more accurate when only a small portion, less than 4%, of the documents have been labelled, and that our algorithm reduces by up to 67% the amount of crowd labels required to achieve comparable accuracy with standard methods.

The paper is structured as follows. We review the literature on language modelling and aggregation models for crowdsourced judgements in Section 2. Section 3 presents our model in detail, then Section 4 provides mathematical details for our variational inference algorithm. Section 5 demonstrates the efficacy of our approach by comparing it against state-of-the-art benchmarks on two real-world crowdsourcing datasets. Finally, we conclude and discuss future work in Section 6.

2. AGGREGATING JUDGEMENTS
Many applications in the literature have employed crowdsourcing, whereby multiple people process each document or data point [13, 1, 33]. A key challenge in such crowdsourcing applications is to mitigate the bias of subjective labellers. Previous work has addressed this problem by combining crowd responses to obtain reliable aggregate classifications. However, as yet, these methods have not exploited the language used in the text to further assist in interpreting the text. We propose to use the variations in language associated with sentiment to reduce the errors and bias that arise when employing members of the public to perform labelling tasks.
A further challenge with real-world applications of document crowdsourcing is the cost of employing a sufficient number of annotators to rapidly label a large dataset. For example, the Ushahidi dataset comprises at least 40,000 text messages which had to be interpreted in the first month after the earthquake in Haiti, which proved to be infeasible [22]. However, a suitable language model would enable automated analysis at much greater scale and allows the annotators to focus their efforts on the most difficult documents. We therefore propose a learning method for harnessing the skills of human labellers to learn a bespoke language model from much larger sets of documents.

A number of methods have been used in the literature to address the challenge of aggregating annotations from the crowd, including the simple technique of majority voting [18]. However, simple majority voting treats all annotators as equally reliable and does not provide any meaningful measure of confidence in the combined decision to account for conflicts in judgement or low annotator skill levels. To overcome this problem, probabilistic methods have been developed which learn the skill levels or bias of each annotator and aggregate their decisions accordingly [26, 7, 32, 25, 31]. These methods are prone to error when only small amounts of gold labels are available, as they do not consider uncertainty in skill levels and other model parameters. For example, when only one label is obtained from a worker, these methods may infer that the worker is either perfectly reliable or totally incompetent when, in reality, the worker is neither. This is a common problem with approaches to inference that use maximum likelihood or maximum a-posteriori solutions [4]. In order to overcome this limitation, algorithms for aggregating crowdsourced data, including SFilter [8] and Bayesian Classifier Combination (BCC) [14, 30], capture the uncertainty in the workers' skill levels or bias, as well as the uncertainty in the aggregated labels. Unfortunately, these methods do not exploit the text features of documents, and consequently require each document to be labelled by the crowd, often multiple times, to obtain confident classifications.

Previous work has introduced methods for automatic text classification based solely on word content, such as the bag-of-words classifier [11]. Although such methods have been applied to automated sentiment analysis, they need a language model for each application context [21]. This often requires large amounts of training data and substantial effort by the system designer to cope with the diversity in language [17]. In contrast, our approach uses a crowd of human annotators to learn a language model rapidly and cheaply. In the following sections we develop the BCCWords model and then demonstrate its efficacy against benchmark methods.

3. THE BCCWORDS MODEL
In this section we describe our novel BCCWords model. This model is an extension of the independent Bayesian classifier combination (IBCC) model presented in [28], which classified data points using only crowdsourced labels. BCCWords models the crowd as multiple heterogeneous classifiers, and uses both the crowd's responses and the word structure of documents to classify them. An advantage of BCCWords is that it can be inferred in a semi-supervised or unsupervised manner. It does not require separate training and test phases but uses a single, combined learning phase over all available data. The semi-supervised approach simultaneously learns from labelled training data and the latent structure in the entire dataset, making it particularly suitable when gold-standard data is limited.

We start by introducing our notation. There is a crowd of $K$ annotators expressing their judgement about the correct classification of $N$ documents over a range of $C$ possible classes. The classes may represent sentiment classes, topic labels, or other types of annotation. Each document, $i$, has an unknown true class $t_i \in C$. The judgement of annotator $k$ for document $i$ is denoted as $l_i^{(k)}$, where $l_i^{(k)} \in C$. Also, we assume that the $n$th word, $w_{i,n}$, of document $i$ takes a value $d$ from a dictionary of size $D$ words. For notational simplicity, we assume a dense set of judgements in which each annotator rates all $N$ documents. However, as will become clear in Section 4.1, our model naturally supports sparsity in the dataset, which is the case for the CrowdFlower dataset used in Section 5.

Figure 1: The factor graph of BCCWords. The circular, shaded nodes represent observed variables and the square, shaded nodes represent the hyperparameters. The plates describe (i) the set of $K$ annotators, (ii) the $N$ documents, (iii) the $C$ possible true values and (iv) the $D$ words contained in the dictionary of terms used in the documents.

The factor graph of our Bayesian combination model, BCCWords, is shown in Figure 1, and the model is described as follows. We assume that each annotator draws judgements for documents of class $t_i = c$ from a categorical distribution with parameters $\pi_c^{(k)}$:
$$l_i^{(k)} \mid \pi^{(k)}, t_i = c \sim \mathrm{Cat}\big(\pi_c^{(k)}\big),$$
where $\pi_c^{(k)}$ is the accuracy vector of annotator $k$ for documents of class $c$. That is, each element of $\pi_c^{(k)}$ specifies the probability that annotator $k$ will give judgement $j$ when presented with a document whose true class is $c$:
$$\pi_{c,j}^{(k)} = p\big(l_i^{(k)} = j \mid t_i = c\big).$$
The set of accuracy vectors for all $c$ is called the confusion matrix $\pi^{(k)}$, representing $k$'s reliability. In Figure 1, the annotator confusion matrices are shown in the left-hand plate for all $K$ annotators, depicting how the response of an annotator depends on the true class $t_i$ of the document they are judging. The use of confusion matrices allows our model to combine annotators of very different skill levels, and to handle those who make random guesses or whose responses are the opposite of what we expect. Furthermore, a confusion matrix accounts for the personal bias of an annotator, since a tendency to select a more positive or negative judgement, $j$, than other members of the crowd, when presented with documents of true class $c$, will result in an increased likelihood $\pi_{c,j}^{(k)}$. A personal bias toward selecting judgement $j$ for all documents will result in high likelihoods $\pi_{c,j}^{(k)}$ for all true classes $c$, thus the model will learn that the label $j$ from annotator $k$ is less strongly discriminative.

Our language model is defined as follows. Given a document $i$ of class $c$, we assume that the probability that the $n$th word is $d$ (i.e. $w_{i,n} = d$) follows a categorical distribution with parameters $\omega_c = \{\omega_{c,d} \;\forall d\}$:
$$w_{i,n} \mid \omega_c, t_i = c \sim \mathrm{Cat}(\omega_c),$$
where $\omega_{c,d} = p(w_{i,n} = d \mid t_i = c)$ is the probability that a randomly-drawn word from a document of class $c$ is the word $d$. This probabilistic representation of text in documents corresponds to a mixture of bag-of-words models [11], where each mixture component is a bag-of-words model associated with one particular object class. The word distributions are represented in Figure 1 in the right-hand plate, showing the variables corresponding to the $D$ words in the dictionary.

We assume that the true class for each document, $t_i$, is drawn from a categorical distribution with parameters $\rho$: $t_i \mid \rho \sim \mathrm{Cat}(\rho)$. The parameters $\rho$ can be regarded as the proportions of documents in each class, so that $\rho_c = p(t_i = c \mid \rho)$. These parameters are shown at the top of Figure 1.

To model the uncertainty in the latent variables in our model, we assign conjugate Dirichlet prior distributions to $\pi_c^{(k)}$, $\omega_c$ and $\rho$, for each class $c \in C$ and annotator $k \in K$:
$$\pi_c^{(k)} \mid \alpha_{0,c}^{(k)} \sim \mathrm{Dir}\big(\alpha_{0,c}^{(k)}\big), \qquad \rho \mid \beta_0 \sim \mathrm{Dir}(\beta_0), \qquad \omega_c \mid \gamma_{0,c} \sim \mathrm{Dir}(\gamma_{0,c}),$$
where $\alpha_{0,c}^{(k)}$ is the per-annotator confusion matrix hyperparameter, and $\gamma_{0,c}$ is the hyperparameter for the bag-of-words distribution for each class. These hyperparameters have intuitive interpretations as prior pseudo-counts, meaning that their values are equivalent to a number of prior observations, which represent the strength of prior beliefs. When implementing BCCWords, the diagonal values of the hyperparameters $\alpha_{0,c}^{(k)}$ of the confusion matrices can be set to higher values than the off-diagonals, encoding the prior belief that annotators are expected to be better than random.
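To make the generative process above concrete, the following is a minimal NumPy sketch that samples synthetic data from it. The sizes, random seed and variable names are our own illustrative choices, not values from the paper.

```python
# Minimal sketch of the BCCWords generative process (illustrative sizes only).
import numpy as np

rng = np.random.default_rng(0)
N, K, C, D, W = 50, 5, 3, 100, 10   # documents, annotators, classes, dictionary size, words per document

beta0 = np.ones(C)                                   # uniform prior over class proportions rho
gamma0 = np.ones((C, D))                             # uniform priors over per-class word distributions omega_c
alpha0 = np.ones((K, C, C)) + 1.5 * np.eye(C)        # diagonal pseudo-counts favour correct labels

rho = rng.dirichlet(beta0)                                                          # class proportions
omega = np.array([rng.dirichlet(gamma0[c]) for c in range(C)])                      # word distributions
pi = np.array([[rng.dirichlet(alpha0[k, c]) for c in range(C)] for k in range(K)])  # confusion matrices

t = rng.choice(C, size=N, p=rho)                                                    # true classes t_i
words = np.array([rng.choice(D, size=W, p=omega[t[i]]) for i in range(N)])          # words w_{i,n}
labels = np.array([[rng.choice(C, p=pi[k, t[i]]) for k in range(K)] for i in range(N)])  # crowd labels l_i^(k)
```

In real crowdsourcing data the label matrix would be sparse, since each annotator only rates a subset of documents; the dense form here simply mirrors the notational simplification made above.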

The hyperparameters for the word distributions, $\gamma_{0,c}$, and the class proportions, $\beta_0$, can both be set so that the priors are uniform. This reflects an initial lack of information about the word structure of the documents and the class distribution of the documents.

To enable us to perform Bayesian inference over our model, we first specify the complete joint distribution:
$$p\big(l, t, \pi_1^{(1)}, \ldots, \pi_C^{(K)}, \omega_1, \ldots, \omega_C, \rho \mid \alpha_{0,1}^{(1)}, \ldots, \alpha_{0,C}^{(K)}, \beta_0, \gamma_{0,1}, \ldots, \gamma_{0,C}\big) = \prod_{i=1}^{N} \Big\{ \rho_{t_i} \prod_{k=1}^{K} \pi_{t_i, l_i^{(k)}}^{(k)} \prod_{n=1}^{W_i} \omega_{t_i, w_{i,n}} \Big\} \, p(\rho \mid \beta_0) \prod_{c=1}^{C} \Big\{ p(\omega_c \mid \gamma_{0,c}) \prod_{k=1}^{K} p\big(\pi_c^{(k)} \mid \alpha_{0,c}^{(k)}\big) \Big\}, \qquad (1)$$
where $l$ is the set of labels from all annotators for all documents, and $W_i$ is the number of words in document $i$. The next section describes a method that uses this joint distribution to estimate the posterior distribution over the unknown variables in our model.

4. EFFICIENT VARIATIONAL INFERENCE
The BCCWords model presented in the previous section is tuned, or inferred, by learning the parameters of the posterior distribution over its unknown variables, so that the model fits the data. In this section we describe an efficient method for inference using variational Bayes (VB) [4]. The next subsection presents details of the VB algorithm for BCCWords. Then, Section 4.2 describes how this BCCWords-VB algorithm can be extended to batch processing and subsequently scaled to large datasets when computer memory capacity is limited.

Variational Bayesian inference is an approximate method for obtaining a strict lower bound on the true (log) joint posterior. VB explicitly takes uncertainty into account at all levels of inference, allowing us to marginalise (albeit under the VB approximations) over unknown variables, rather than selecting the single most likely value. The approximation offers huge speed-ups over Monte Carlo sampling based Bayesian methods [28], and the performance degradation appears small in BCC models. Hence, VB is our preferred algorithm for working with potentially large sets of documents. In our experiments we implement our VB algorithm using Infer.NET [20], which is a framework that enables rapid development and running of Bayesian inference in graphical models. In particular, the Infer.NET inference engine enables us to switch between alternative inference algorithms for BCCWords, including Gibbs sampling [9] and expectation propagation [19], that are potentially more accurate but much slower than VB and less suitable for performing inference over large-scale datasets.

The variational Bayesian inference algorithm uses an approximation to the joint probability distribution, $q(t, \theta) = q(t)q(\theta)$, that factorises between the true classes of the documents, $t = \{t_i\}$, and the set of model parameters, $\theta = \{\pi^{(k)} \;\forall k, \omega, \rho\}$. The algorithm iterates between updating the approximate posterior distribution over the true classes of the documents, $q(t)$, and the model parameters, $q(\theta)$, until it converges. The theory behind variational inference guarantees that each iteration reduces the Kullback-Leibler divergence [16] between the approximate solution and the true posterior, so that the approximation becomes closer to the exact solution with each iteration. The updates can be viewed as passing messages between the true class labels $t$ and the model parameters $\theta$. As we will see in the detailed explanation below, the specific forms of the factors $q(t)$ and $q(\theta)$ arise naturally from the BCCWords model and its choice of distributions. The conditional independence relations in our model allow us to further factorise the distribution over the model parameters, $q(\theta)$, without additional approximation:
$$q(\theta) = q(\rho) \prod_{c \in C} \Big\{ q(\omega_c) \prod_{k \in K} q\big(\pi_c^{(k)}\big) \Big\}.$$
This means we can update each subset of model parameters separately, and each of these factors will exchange different messages with $q(t)$.
This type of algorithm is also known as variational message passing (VMP) and has an efficient, scalable implementation, which is described in Section 4.2.

4.1 The BCCWords VB Algorithm
We now present the implementation of the variational inference approach to BCCWords described in the previous section. Specifically, we describe the details of the iterative updates within a step-by-step description of the inference algorithm, based on the VB equations that we derive for BCCWords.

Inputs: the algorithm takes as input a data set of annotators' responses, $l$, and, where available, a set of known target labels, which are gold-standard training labels. We note that gold labels are not necessary and the algorithm can operate in unsupervised mode. To run the algorithm, we must also select values for $\alpha_{0,c}^{(k)} \;\forall k$, $\beta_0$ and $\gamma_{0,c}$, as described above. A number of techniques can be used to initialise these hyperparameters when the choice of values is unclear [3].

Step 1. Initialisation: initialise the approximate posterior distributions over the model parameters, $\theta$. The choice of initial distributions affects the number of iterations required for convergence. In our implementation we initialise the posterior distributions over the model parameters by setting them to their prior distributions.

Step 2. Update true class predictions: update the approximate posterior $q(t_i)$ over the class of each document $i$. For any document which has a gold label, the value of $t_i$ is known, so we do not need to update $q(t_i)$. Instead we set $q(t_i = c) = 1$, where $c$ is the observed value of $t_i$, and $q(t_i) = 0$ for all other class values. For all other documents, we obtain the current estimate $q^*(t_i)$ of the probability that the true class of $i$ is $c$:
$$q^*(t_i = c) = \frac{r_{i,c}}{\sum_{c' \in C} r_{i,c'}}, \qquad (2)$$
where $r_{i,c}$ is the exponentiated expectation of the log likelihood, given by
$$\ln r_{i,c} = \mathbb{E}_{\rho,\pi,\omega}\big[\ln p(t_i = c, \rho, \pi, l_i, \omega)\big] = \mathbb{E}_{\rho}[\ln \rho_c] + \sum_{k=1}^{K} \mathbb{E}_{\pi^{(k)}}\big[\ln \pi_{c, l_i^{(k)}}^{(k)}\big] + \sum_{n=1}^{W_i} \mathbb{E}_{\omega}\big[\ln \omega_{c, w_{i,n}}\big]. \qquad (3)$$
The expectations in this equation are found using the current estimates of the distributions over the model parameters, and are defined explicitly in the subsequent steps of the algorithm.

These terms can be seen as messages from the model parameters to the true class labels, $t$. Equation (2) can then be used to determine the messages to pass to the model parameters $\theta$, which are expectations over the sufficient statistics of the set of true class labels for all documents. The message for $\rho$ contains the expected counts of each true class:
$$N_c = \sum_{i=1}^{N} q(t_i = c).$$
The message for the confusion matrices contains the counts of each judgement $j \in C$ given the true class $c \in C$:
$$N_{c,j}^{(k)} = \sum_{i=1}^{N} \delta_{l_i^{(k)}, j}\, q(t_i = c), \qquad (4)$$
where $\delta_{l_i^{(k)}, j}$ is the Kronecker delta, which is unity if $l_i^{(k)} = j$ and zero otherwise. Similarly, the message for the word distributions contains the counts of word occurrences in each class:
$$N_{c,d} = \sum_{i=1}^{N} \sum_{n=1}^{W_i} \delta_{w_{i,n}, d}\, q(t_i = c). \qquad (5)$$

Step 3. Update confusion matrices: update the approximate posterior $q(\pi_c^{(k)})$ for each class $c \in C$ and each annotator $k \in K$. The prior distributions over the confusion matrices are Dirichlet distributions, which are conjugate to the categorical distributions. This means that the posterior distributions over the confusion matrices are also Dirichlets, with updated parameters:
$$q^*\big(\pi_c^{(k)}\big) = \mathrm{Dir}\big(\pi_c^{(k)} \mid \alpha_{c,1}^{(k)}, \ldots, \alpha_{c,L}^{(k)}\big), \qquad (6)$$
where $L$ is the cardinality of $C$ and $\alpha_c^{(k)}$ is calculated by adding the counts from the true class message to the prior pseudo-counts $\alpha_{0,c}^{(k)}$:
$$\alpha_{c,j}^{(k)} = \alpha_{0,c,j}^{(k)} + N_{c,j}^{(k)}. \qquad (7)$$
A more detailed derivation of these iterative update equations can be found in [28]. We can now calculate the message to send back to the true labels, which is the expectation term required for Equation (3):
$$\mathbb{E}\big[\ln \pi_{c,j}^{(k)}\big] = \Psi\big(\alpha_{c,j}^{(k)}\big) - \Psi\Big(\sum_{b=1}^{L} \alpha_{c,b}^{(k)}\Big), \qquad (8)$$
where $\Psi(\cdot)$ is the standard digamma function.

Step 4. Update word distributions: update the approximate posterior $q(\omega_c)$ for each row $c \in C$ to the current estimate $q^*(\omega_c)$. Again, we have a posterior Dirichlet distribution due to the use of conjugate exponential-family distributions in our model:
$$q^*(\omega_c) = \mathrm{Dir}\big(\omega_c \mid \gamma_{c,1}, \ldots, \gamma_{c,D}\big), \qquad (9)$$
where the parameters are updated by $\gamma_{c,d} = \gamma_{0,c,d} + N_{c,d}$. The message to the true class labels, which is required for Equation (3), contains the terms:
$$\mathbb{E}[\ln \omega_{c,d}] = \Psi(\gamma_{c,d}) - \Psi\Big(\sum_{d'=1}^{D} \gamma_{c,d'}\Big). \qquad (10)$$

Step 5. Update class proportions: update the approximate posterior $q(\rho)$ using the Dirichlet parameter update:
$$q^*(\rho) = \mathrm{Dir}(\rho \mid \beta_1, \ldots, \beta_C), \qquad (11)$$
where the parameters are updated by $\beta_c = \beta_{0,c} + N_c$. The message from this parameter is:
$$\mathbb{E}[\ln \rho_c] = \Psi(\beta_c) - \Psi\Big(\sum_{b=1}^{C} \beta_b\Big). \qquad (12)$$
So, for one iteration of the algorithm, we calculate the updated parameters, distributions and expectation terms defined in Steps 2 to 5.

Step 6. Check convergence: if the target distributions $q^*(t_i = c)$ have not converged to a stable solution within a given tolerance, repeat the algorithm from Step 2.

Outputs: predictions of the document class labels, given by the current estimates of $q(t_i = c)$, their posterior expectations. The algorithm also outputs the approximate posterior distributions over the model parameters, $q(\pi_c^{(k)})$, $q(\omega_c)$ and $q(\rho)$, for each row $c \in C$ and each annotator $k \in K$.
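The following is a minimal NumPy/SciPy sketch of the update cycle in Steps 1-6, using dense arrays. It is our own reading of Equations (2)-(12), not the authors' Infer.NET implementation; the function signature, array shapes and convergence test are our own assumptions.

```python
# Sketch of the BCCWords VB updates (Eqs. 2-12) over dense label and word-count matrices.
import numpy as np
from scipy.special import digamma

def bccwords_vb(labels, word_counts, alpha0, beta0, gamma0, gold=None, max_iter=100, tol=1e-4):
    """labels: (N, K) int array of crowd labels, -1 where a worker gave no response.
    word_counts: (N, D) array of word counts per document.
    alpha0: (K, C, C), beta0: (C,), gamma0: (C, D) prior pseudo-counts.
    gold: optional (N,) int array of gold classes, -1 where unknown."""
    N, K = labels.shape
    C = beta0.shape[0]

    # Step 1: initialise the parameter posteriors at their priors and q(t) uniformly.
    qt = np.full((N, C), 1.0 / C)
    alpha, beta, gamma = alpha0.astype(float), beta0.astype(float), gamma0.astype(float)

    for _ in range(max_iter):
        # Messages from the parameters to q(t): expected log parameters (Eqs. 8, 10, 12).
        Elog_pi = digamma(alpha) - digamma(alpha.sum(axis=2, keepdims=True))      # (K, C, C)
        Elog_omega = digamma(gamma) - digamma(gamma.sum(axis=1, keepdims=True))   # (C, D)
        Elog_rho = digamma(beta) - digamma(beta.sum())                            # (C,)

        # Step 2: update q(t_i) (Eqs. 2-3).
        log_r = Elog_rho[None, :] + word_counts @ Elog_omega.T                    # (N, C)
        for k in range(K):
            obs = labels[:, k] >= 0
            log_r[obs] += Elog_pi[k][:, labels[obs, k]].T                         # E[ln pi^(k)_{c, l_i^(k)}]
        qt_new = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        qt_new /= qt_new.sum(axis=1, keepdims=True)
        if gold is not None:                                                      # clamp gold-labelled documents
            known = gold >= 0
            qt_new[known] = np.eye(C)[gold[known]]

        # Count messages from q(t) (Eqs. 4-5) and conjugate Dirichlet updates (Eqs. 7, 9, 11).
        beta = beta0 + qt_new.sum(axis=0)                                         # class proportions
        gamma = gamma0 + qt_new.T @ word_counts                                   # word distributions
        for k in range(K):
            obs = labels[:, k] >= 0
            one_hot = np.eye(C)[labels[obs, k]]                                   # indicator of the given label j
            alpha[k] = alpha0[k] + qt_new[obs].T @ one_hot                        # [c, j] = sum_i q(t_i=c) 1[l_i=j]

        # Step 6: check convergence of q(t).
        if np.max(np.abs(qt_new - qt)) < tol:
            qt = qt_new
            break
        qt = qt_new

    return qt, alpha, beta, gamma
```

Documents with gold labels stay clamped while all others are re-estimated at every iteration, which mirrors the single, combined semi-supervised learning phase described in Section 3.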
4.2 Scalability Through Inference Decomposition
Performing a task such as sentiment analysis or disaster report analysis can require us to work with extremely large datasets with vast memory requirements. A major source of memory usage is the large set of annotator confusion matrices that the inference algorithm must iteratively update. For example, the Ushahidi dataset, gathered after the Haiti earthquake, was interpreted by approximately 700 workers [22]. To resolve the memory exhaustion difficulties of VB inference at scale, this section proposes a scalable version of the BCCWords-VB algorithm, Scalable BCCWords (ScalBCCWords), which can be run on a single computer. ScalBCCWords is identical to BCCWords except that we decompose the entire data set into a set of batches by distributing the annotators across $P$ partitions.

During each iteration of the VB inference algorithm, ScalBCCWords switches each batch in and then out of memory in turn. Batches produce messages that summarise each portion of the data and occupy considerably less computer memory than the entire data set. We chose to distribute the workers between the batches so that each batch contains all the responses from the workers in that batch. This partitioning criterion is sensible as each batch only has to represent a subset of annotators, and thus only represents a small set of confusion matrices. The splits can be chosen to meet memory constraints. The corresponding factor graph is shown in Figure 2, showing how all partitions share the same class distribution of documents and the word distributions conditioned on document class.

Figure 2: Factor graph for Scalable BCCWords. The four plates included in the graph describe (i) the set of workers in batch $p$, (ii) the set of workers in batch $p'$, (iii) the $C$ possible true values and (iv) the $D$ words contained in the dictionary of terms used in the tweets. The plate for the $N$ documents is omitted for simplification.

When batch $p$ is processed, the pseudo-counts $N_{c,j}^{(k)}$ are calculated using Equation (4), for all $k \in p$. The log confusion matrix messages, $M_{p,i,c}$, for batch $p$ for each class and document are calculated as follows:
$$M_{p,i,c} = \sum_{k \in p} \mathbb{E}\big[\ln \pi_{c, l_i^{(k)}}^{(k)}\big],$$
using Equation (8) to calculate the expected log confusion matrix. The log confusion matrix for batch $p$ is then deleted from memory. Once a message is obtained for each batch, the log true class prediction probability is calculated using
$$\ln r_{i,c} = \mathbb{E}_{\rho}[\ln \rho_c] + \sum_{p \in P} M_{p,i,c} + \sum_{n=1}^{W_i} \mathbb{E}_{\omega}\big[\ln \omega_{c, w_{i,n}}\big],$$
as per Equation (3). Hence, ScalBCCWords is mathematically equivalent to BCCWords and both methods converge to the same solution. The remaining steps of the ScalBCCWords algorithm are identical to those of BCCWords. We note that ScalBCCWords may process the batches in any order, and not all the batches need be updated during each iteration of the VB algorithm. In our experiments, we provide an empirical evaluation of both our algorithms showing the advantage of ScalBCCWords-VB in memory occupancy.
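A sketch of the batched accumulation of the confusion-matrix messages $M_{p,i,c}$ is given below. The round-robin partitioning and the in-memory dictionary of per-worker Dirichlet counts are our own simplifications of the disk-swapping scheme described above.

```python
# Sketch of the ScalBCCWords message decomposition: workers are split into P partitions and
# their contribution to Equation (3) is accumulated one batch at a time.
import numpy as np
from scipy.special import digamma

def batched_confusion_messages(alpha_by_worker, labels, C, P=4):
    """alpha_by_worker: dict worker_id -> (C, C) Dirichlet counts alpha^(k).
    labels: (N, K) int array of crowd labels, -1 where missing.
    Returns M of shape (N, C), the summed messages sum_p M_{p,i,c}."""
    N, K = labels.shape
    workers = list(alpha_by_worker)
    partitions = [workers[p::P] for p in range(P)]       # simple round-robin split of the workers

    M = np.zeros((N, C))
    for batch in partitions:                             # process one partition at a time
        M_p = np.zeros((N, C))                           # message summarising this batch
        for k in batch:
            alpha_k = alpha_by_worker[k]                 # in ScalBCCWords this would be swapped in from storage
            Elog_pi = digamma(alpha_k) - digamma(alpha_k.sum(axis=1, keepdims=True))  # Eq. (8)
            obs = labels[:, k] >= 0
            M_p[obs] += Elog_pi[:, labels[obs, k]].T     # E[ln pi^(k)_{c, l_i^(k)}]
        M += M_p                                         # only M_p needs to remain in memory afterwards
    return M
```

Only one partition's expected log confusion matrices are materialised at a time, which is the source of the memory saving reported in Section 5.5.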

5. EMPIRICAL EVALUATION
We evaluate the efficacy of our approach, ScalBCCWords, using two real-world datasets, comparing performance against the following six rival benchmark approaches. We note that BCCWords-VB and ScalBCCWords produce the same classifications as they are mathematically equivalent, and therefore only ScalBCCWords results are shown.

Majority Voting (MV) is a popular and simple algorithm for obtaining a single decision from multiple opinions provided by a crowd [18, 29]. MV greedily assigns a class to each document by choosing the label with the most votes from the crowd. All votes are considered with uniform weight, thus treating all annotators as equally reliable. Typically, no measure of uncertainty in the final decision is provided.

Vote Distribution treats the fraction of votes in support of each class as the probability of that class. It therefore represents a simple technique for estimating the empirical probability that a document has a particular true label, assuming that all annotators are equally reliable. A brief sketch of both vote-based baselines is given after this list of methods.

Bag-of-words Classifier + MV trains a bag-of-words classifier by treating the majority vote as gold-labelled training data [11]. Therefore, this approach learns a language model that can be used to classify documents that have not yet been labelled by the crowd, but does not account for the varying reliability of the crowd labellers when training the model.

Dawid & Skene is a model for combining labels from multiple classifiers, using confusion matrices to model the reliability of individual labellers [7]. The learning algorithm for Dawid & Skene does not account for uncertainty in the confusion matrices or other model parameters, which can lead to errors when gold-labelled data is limited.

Independent Bayesian Classifier Combination (IBCC) learns the confusion matrices using variational Bayes (VB). Therefore, in contrast to Dawid & Skene, it handles model uncertainty and can operate in unsupervised settings when gold-labelled examples are unavailable. However, this model does not consider text features and relies solely on the labels provided by the crowd.

Community-Based Bayesian Classifier Combination (CBCC) is an extension of IBCC that models communities of workers with similar confusion matrices. It learns both the confusion matrices of each community and each worker but, like IBCC, it does not account for text features in the documents [30]. We run CBCC with three communities, as suggested in the original paper, for both CF and SP.
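As a concrete illustration of the two vote-based baselines described above, here is a minimal NumPy sketch (our own implementation, not code from the paper or the shared task); the tie-breaking rule and the uniform fallback for documents with no votes are our own assumptions.

```python
# Minimal sketches of the Majority Voting and Vote Distribution baselines.
import numpy as np

def _vote_counts(labels, C):
    """labels: (N, K) int array of crowd labels, -1 where missing. Returns (N, C) vote counts."""
    return np.stack([(labels == j).sum(axis=1) for j in range(C)], axis=1).astype(float)

def majority_vote(labels, C):
    """Class with the most votes per document; ties are broken towards the lowest class index."""
    return _vote_counts(labels, C).argmax(axis=1)

def vote_distribution(labels, C):
    """Fraction of votes per class, used as a class probability estimate; uniform if no votes."""
    counts = _vote_counts(labels, C)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.full_like(counts, 1.0 / C), where=totals > 0)
```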
In our experiments, we set the priors for IBCC, CBCC and ScalBCCWords as follows. For the class proportions, $\rho$, we used unbiased priors by setting the values of $\beta_{0,c}$ to be equal for all classes. For the workers' confusion matrices, we used informative priors, setting the diagonal counts $\alpha_{0,c,c}^{(k)}$ to $C + 1.5$, with the off-diagonals set to 1. This means that workers are initially assumed to be reasonably accurate. For ScalBCCWords, the word distributions were given uninformative priors, by setting uniform values for $\gamma_{0,c,d}$ for all words $d \in D$.

5.1 Datasets
We evaluate our approach using two crowdsourcing datasets, which provide real sentiment judgements obtained from human workers. The two datasets demonstrate our approach on two very different kinds of document, with distinct sentiment analysis problems.

The CrowdFlower dataset (CF) was provided by CrowdFlower as part of the 2013 Crowdsourcing at Scale shared task challenge. The dataset contains 569,375 judgements for 98,980 tweets. This dataset includes 300 tweets with gold-standard sentiment labels, which correspond to 1,720 judgements from 461 workers. The judgements reflect the sentiment of tweets discussing the weather, and can take values from five sentiment categories: negative (0), neutral (1), positive (2), tweet not related to weather (4) and cannot tell (5). This dataset therefore concerns a multi-class labelling problem.

The Sentiment Polarity dataset (SP) contains annotations for a set of 5,000 sentences from movie reviews, extracted by [23] from the website RottenTomatoes. This dataset has gold-standard sentiment labels for all the movie reviews assigned by the website, which marked them as either fresh (positive) or rotten (negative). A set of 27,747 sentiment judgements were collected from 203 workers using Amazon Mechanical Turk (AMT) by [27]. The SP dataset therefore presents a binary sentiment analysis problem, with workers forced to select either positive (1) or negative (0), with no option to express their uncertainty.

The vocabulary of a real-world text corpus is often extremely large, so most practical deployments of language modelling methods employ a set of heuristic pre-processing steps to remove noisy data that would otherwise add unnecessary computation and memory costs. In our experiments, the dictionary for both datasets was obtained using the standard approach of stemming the text with Porter's stemming algorithm [24], then removing common stop words, before extracting the 300 words with the highest term frequency × inverse document frequency (TF-IDF) score [2]. TF-IDF is a heuristic for selecting words that are important in distinguishing documents within a corpus, where term frequency is the number of occurrences of word $d$ within the corpus, and inverse document frequency $= \log(N/N_d)$, where $N_d$ is the number of documents containing $d$. While ScalBCCWords is agnostic to the type of features supplied, these standard text pre-processing steps are used to allow the experiments to focus on comparing crowdsourced sentiment classification methods.
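The dictionary-building heuristic just described can be sketched as follows. This is a minimal illustration under our own assumptions (NLTK's PorterStemmer as the stemmer, a tiny illustrative stop-word list, and a simple regular-expression tokeniser), not the exact pipeline used in the paper.

```python
# Sketch of the pre-processing heuristic: stem, drop stop words, keep the 300 terms
# with the highest corpus-level TF-IDF score.
import math, re
from collections import Counter
from nltk.stem import PorterStemmer

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "i", "rt"}  # illustrative only

def build_dictionary(documents, vocab_size=300):
    stemmer = PorterStemmer()
    tokenised = []
    for text in documents:
        tokens = [stemmer.stem(w) for w in re.findall(r"[a-z']+", text.lower())
                  if w not in STOP_WORDS]
        tokenised.append(tokens)

    term_freq = Counter(t for doc in tokenised for t in doc)        # occurrences of each term in the corpus
    doc_freq = Counter(t for doc in tokenised for t in set(doc))    # number of documents containing the term
    n_docs = len(documents)
    tfidf = {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}

    vocab = sorted(tfidf, key=tfidf.get, reverse=True)[:vocab_size]
    return {term: idx for idx, term in enumerate(vocab)}, tokenised
```

The returned mapping from term to column index can then be used to build the (N, D) word-count matrix consumed by the model.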

Figure 3: Accuracy of seven methods measured with increasing proportions of labels for both datasets: (a) CrowdFlower (CF), (b) Sentiment Polarity (SP).

5.2 Performance Comparisons
We investigated how the effectiveness of the language model learnt by ScalBCCWords varies with the number of labels supplied by the crowd. To do this we compared the performance of the alternative methods on our two datasets and evaluated the efficacy of each using four standard metrics:

Accuracy is the proportion of documents that were correctly labelled. For methods such as ScalBCCWords that output probabilities, we assign the label with the highest probability.

Average recall is the mean across all classes of the recall rate, defined as the fraction of positive instances of a given class that were correctly labelled.

Negative log probability density (NLPD) is an error measure (the lower the better) based on how much weight a classifier gave to the correct class of each document, as defined in [30].

AUC is the area under the curve of the receiver operating characteristic (ROC), which is the probability that a randomly-chosen positive example is assigned a higher probability than a randomly-chosen negative example [28]. This is a measure of an algorithm's ability to differentiate between classes, regardless of whether the classes are imbalanced. For the CF dataset, we provide the mean AUC over pairs of classes, using the method of [10].
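A sketch of how these four metrics might be computed is shown below. This is our own implementation: the multi-class AUC call uses scikit-learn's one-vs-one macro averaging, which follows the generalisation of [10], and it assumes the columns of the probability matrix are ordered by class index.

```python
# Sketch of the four evaluation metrics used in the comparison.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(probs, y_true):
    """probs: (N, C) predicted class probabilities; y_true: (N,) gold classes as integers 0..C-1."""
    y_true = np.asarray(y_true)
    y_pred = probs.argmax(axis=1)
    accuracy = np.mean(y_pred == y_true)

    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]       # per-class recall
    avg_recall = np.mean(recalls)

    eps = 1e-12                                                          # avoid log(0) for overconfident mistakes
    nlpd = -np.mean(np.log(probs[np.arange(len(y_true)), y_true] + eps))

    if len(classes) == 2:
        auc = roc_auc_score(y_true, probs[:, 1])
    else:
        auc = roc_auc_score(y_true, probs, multi_class="ovo", average="macro")   # mean AUC over class pairs
    return {"accuracy": accuracy, "avg_recall": avg_recall, "nlpd": nlpd, "auc": auc}
```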
The experiment is run iteratively, starting by running each method with 2% randomly-chosen judgements from the crowd, then evaluating the classification efficacy. We then increase the number of judgements by adding a further 2% of randomly-selected labels from the crowd and re-running all the methods. This process is repeated until all crowdsourced labels have been used by the prediction methods. Figure 3 shows the accuracy of the seven methods on both datasets, which improves for all methods as they get more data. In particular, ScalBCCWords has the highest accuracy for a small number of labels, demonstrating the added value of the language model. ScalBCCWords maintains the highest accuracy throughout, although IBCC, CBCC and Dawid & Skene catch up for large numbers of crowdsourced labels.

Table 1: Accuracy, average recall and negative log probability density score (NLPD) for the CF and the SP datasets for the seven tested methods (one for each row) when using 20% of the crowdsourced labels. Using this subset of labels, 70% of the documents in both datasets have at least one crowdsourced label.

Table 2: Accuracy, average recall and negative log probability density score (NLPD) for the CF and the SP datasets for the seven tested methods (one for each row) when using all available crowdsourced labels.

The accuracy of ScalBCCWords is 25% higher than that of CBCC after 20,000 labels in CF and 8% higher than CBCC after 110 labels in SP. Importantly, in order to achieve the same accuracy, ScalBCCWords requires up to 56,935 fewer labels in CF and up to 440 fewer labels in SP compared to the benchmarks. Furthermore, Dawid & Skene initially infers a poor model of worker accuracy due to scarce labels, which leads to poor classification performance. Such a cold-start phase is mitigated in the BCC methods by accounting for uncertainty in the workers' accuracy. Both MV and Vote Distribution are more accurate than Dawid & Skene in the initial phase, but they are less accurate than all the other methods once a larger amount of crowd labels has been used.

Table 1 shows prediction metrics for both datasets when only 20% of the complete set of crowdsourced labels were used. Using this subset reduces the number of labels available for each document, so that only 70% of documents have one or more crowdsourced labels. To classify the 30% of documents with no crowdsourced labels, ScalBCCWords and the bag-of-words classifier apply their language models, while the other methods have no information about these documents and assign a default category to all unlabelled documents. ScalBCCWords has the highest accuracy and average recall on both datasets, significantly outperforming the bag-of-words model, which also uses a language model but does not model annotator bias and error rates.

Table 2 shows prediction metrics for both datasets when all crowdsourced labels were used. ScalBCCWords has the highest accuracy and average recall on both datasets, and competitive AUC and NLPD on the SP dataset. This demonstrates that ScalBCCWords performs well even when labels are plentiful (in this case, on average 6 labels per document).

5.3 Language Model
The language model inferred by ScalBCCWords represents the probabilities of each word in the dictionary conditioned on the sentiment classes. In Figure 4, the top row shows the Wordles (word clouds) with the most probable 27 words in the five sentiment classes of the CF dataset. ScalBCCWords is able to identify words that discriminate the sentiment classes, such as "love" and "perfect", which are more likely to occur in the positive sentiment class, whereas words such as "cold" and "hate" are more likely to appear in the negative class. We note that common words such as "day" are highly likely in both positive and negative classes and are therefore not good discriminators in this dataset. However, ScalBCCWords naturally uses the most discriminative words to infer the sentiment class through Equation (2). In the second row of Figure 4, the wordles show the words that most strongly indicate the class, i.e. the words $d$ with the highest probability $p(t_i = c \mid w_{i,n} = d)$ for class $c$. Here we can see that "day" is not indicative of sentiment class and there is little overlap between the word clouds for each class.
Figure 5 shows the Wordles for the SP dataset, with the word "good" being equally likely in both sentiment classes, suggesting that words that seem intuitively positive may be poor discriminators, possibly because their meaning is highly context-dependent.

To validate the quality of the language model inferred by ScalBCCWords using the crowd labels, we compare it to a language model learned by training ScalBCCWords on the gold-standard labels. For both models, we rank words by their probabilities $\omega_{c,d}$ in each class, to examine which terms the model has inferred are important to each class. Using the non-parametric Kendall's $\tau$ rank correlation test, we find a significant positive correlation between the rankings obtained from the model trained on the crowdsourced data and the model trained on gold labels, as shown in Table 3 for both datasets. This indicates that ScalBCCWords can effectively train a language model using crowd labels when gold-standard data is unavailable. Kendall's $\tau$ is much higher for SP, which reflects the higher accuracy of ScalBCCWords on the SP dataset shown in Table 2. This suggests that the crowd's decisions may more accurately reflect the gold-labelled data when classifying the SP dataset, which has only two diametrically opposed classes, rather than the five less easily distinguished classes of CF.
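A minimal sketch of this validation step follows, assuming the two trained models expose their word-probability matrices $\omega$ as (C, D) arrays; the function name and array layout are our own.

```python
# Sketch of the rank-correlation check between crowd-trained and gold-trained word distributions.
from scipy.stats import kendalltau

def compare_language_models(omega_crowd, omega_gold):
    """omega_*: (C, D) word probabilities per class from the two trained models.
    Returns a list of (tau, p_value) pairs, one per class."""
    results = []
    for c in range(omega_crowd.shape[0]):
        tau, p_value = kendalltau(omega_crowd[c], omega_gold[c])   # tau is rank-based, so raw probabilities suffice
        results.append((tau, p_value))
    return results
```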

Figure 4: Top row, (a) to (e) (Positive, Negative, Neutral, Not related, Unknown): word clouds of the most probable 27 words from ScalBCCWords for the sentiment classes of the CF dataset. Word size is proportional to the estimated likelihood given the true class. Second row, (f) to (j): word clouds of the most discriminative 27 words for the classes of the CF dataset. Word size is proportional to the class probability given the word. Colours are for legibility only.

Figure 5: Word clouds of the most probable 27 words from ScalBCCWords for the sentiment classes of the SP dataset: (a) Positive, (b) Negative. Word size is proportional to the estimated likelihood given the true class. Colours are for legibility only.

Table 3: The Kendall's $\tau$ rank correlation coefficients ($p < 10^{-5}$) for the word distributions, $\omega_{c,d}$, estimated by BCCWords using gold labels and crowd judgements. Colour intensity indicates the correlation strength.

5.4 Profiling Crowd Workers
Besides predicting the document classifications and the language model, ScalBCCWords learns the confusion matrices that characterise the workers' skill levels across the sentiment classes. Figure 6 shows example crowd members with very different confusion matrices. For example, subfigure (a) shows a competent annotator who provides accurate labels across all sentiment classes, hence the highly peaked likelihoods along the diagonals. Subfigure (b) shows an annotator whose reliability is inconsistent across the classes, with varying likelihoods of incorrect worker labels. This figure shows that we are able to detect annotators with very different behaviour within our two real-world datasets. The BCCWords model captures not just the overall skill level, but also the accuracy and bias of the annotator for each specific class, shown by the distribution in each row of Figure 6.
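The worker profiles discussed here can be read directly off the inferred Dirichlet parameters: the posterior mean of each confusion-matrix row is the expected label distribution for that true class. A minimal sketch, assuming the per-worker parameters are available as (C, C) arrays of counts (variable names are our own):

```python
# Sketch of summarising a worker's profile from the inferred Dirichlet counts.
import numpy as np

def expected_confusion_matrix(alpha_k):
    """alpha_k: (C, C) posterior Dirichlet counts for worker k (rows indexed by true class c)."""
    return alpha_k / alpha_k.sum(axis=1, keepdims=True)   # Dirichlet posterior mean per row

def worker_accuracy(alpha_k):
    """Overall skill summary: mean diagonal of the expected confusion matrix."""
    return np.mean(np.diag(expected_confusion_matrix(alpha_k)))
```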

Figure 6: Confusion matrices of four workers, with the likelihood of the worker's label given the true class on the vertical axis: (a), (b) CF dataset; (c), (d) SP dataset. These profiles show two workers who are very well calibrated in their opinions (left) and two workers who provide less accurate labels (right).

5.5 Memory Usage
We compared the memory usage of BCCWords-VB and ScalBCCWords on the CF dataset. Figure 7 shows a plot of memory demand when running the BCCWords-VB algorithm with increasing subsets of labels. In particular, we measured memory demand through the standard memory profiler available in .NET (CLR Profiler, clrprofiler.codeplex.com), which provides the approximate memory allocated on the garbage collection heaps to store the model instances of BCCWords. Despite the high noise of these counters, which explains the variability of the curves in the graph, we can still observe the general increasing trend of memory usage when using more labels. As shown in Figure 7, the ScalBCCWords algorithm uses up to 80% less memory than the standard implementation of BCCWords (1 GB vs. 200 MB) when the dataset includes 50,000 labels.

Figure 7: Memory usage of BCCWords (orange line) and the scalable implementation of BCCWords (blue line) measured on real data.

6. CONCLUSIONS
This paper presents BCCWords, a novel algorithm for combining crowdsourced annotations with text features in order to determine the sentiment of documents. We presented a scalable variational Bayes inference algorithm for BCCWords and demonstrated how it can be implemented for a large corpus annotated by crowd workers. Our analysis demonstrated that BCCWords is able to identify key differentiating text features, which produce more accurate sentiment classifications when crowdsourced labels are scarce. It is able to classify short messages such as tweets, despite the limited number of text features in these short messages.

We compared our algorithm with six benchmark methods on two real-world crowdsourcing datasets and showed that our method can improve accuracy by 25% over both standard text classifiers and prominent aggregation models for crowdsourced data when annotations are available for only a small portion of documents. Furthermore, our approach significantly reduces, by up to 67%, the amount of labels that must be obtained through crowdsourcing to achieve comparable accuracy with rival methods.

We are currently investigating other prominent applications of our method: identifying aid requirements in disaster response using reports from members of the public and first responders; evaluating investor sentiment towards companies expressed in free-text reports; and determining student sentiment from online forum postings to aid pastoral care. These domains provide vast amounts of unstructured data that can benefit from the insights provided by human annotators and the scalability of automated methods. Future work will evaluate BCCWords with the different types of features available in these domains, including alternative text features, document metadata, and image features. The way that BCCWords learns the annotator confusion matrices could be modified for problems with ordinal classes, such as those representing different strengths of sentiment, to take advantage of relationships between neighbouring classes when samples of annotator behaviour for each label and true class value are sparse.

7. ACKNOWLEDGMENTS
We thank Gabriella Kazai, Lumi, London, UK for initial discussions. This work was funded by the EPSRC ORCHID programme grant (EP/I011587/1) and Microsoft Research, Cambridge, UK.

8. REFERENCES
[1] Y. Bachrach, T. Graepel, T. Minka, and J. Guiver. How To Grade a Test Without Knowing the Answers: A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing. In Proc. of the 29th Int. Conf. on Machine Learning. ACM.

[2] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley.
[3] J. Bergstra and Y. Bengio. Random Search for Hyper-Parameter Optimization. The Journal of Machine Learning Research, 13.
[4] C. Bishop. Pattern Recognition and Machine Learning. Springer, 4th edition.
[5] D. Butler. Crowdsourcing Goes Mainstream in Typhoon Haiyan Response. Nature News.
[6] C. Chew and G. Eysenbach. Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak. PLoS One, 5(11):e14118.
[7] A. P. Dawid and A. M. Skene. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1):20-28.
[8] P. Donmez, J. Carbonell, and J. Schneider. A Probabilistic Framework to Learn from Multiple Annotators with Time-Varying Accuracy. In Proc. of the Int. Conf. on Data Mining.
[9] A. Gelfand and A. Smith. Sampling-Based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association, 85(410).
[10] D. J. Hand and R. J. Till. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning, 45(2).
[11] Z. S. Harris. Distributional Structure. Word.
[12] N. R. Jennings, L. Moreau, D. Nicholson, S. D. Ramchurn, S. Roberts, T. Rodden, and A. Rogers. On Human-Agent Collectives. Communications of the ACM.
[13] E. Kamar, S. Hacker, and E. Horvitz. Combining Human and Machine Intelligence in Large-Scale Crowdsourcing. In Proc. of the 11th Int. Conf. on Autonomous Agents and Multiagent Systems.
[14] H. Kim and Z. Ghahramani. Bayesian Classifier Combination. In Proc. of the 15th Int. Conf. on Artificial Intelligence and Statistics, page 619.
[15] F. Kivran-Swaine, S. Brody, and M. Naaman. Effects of Gender and Tie Strength on Twitter Interactions. First Monday, 18(9).
[16] S. Kullback and R. A. Leibler. On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1):79-86.
[17] A. Levenberg, S. Pulman, K. Moilanen, E. Simpson, and S. Roberts. Predicting Economic Indicators from Web Text Using Sentiment Composition. Int. Journal of Computer and Communication Engineering.
[18] N. Littlestone and M. Warmuth. The Weighted Majority Algorithm. In 30th Annual Symposium on Foundations of Computer Science. IEEE.
[19] T. Minka. Expectation Propagation for Approximate Bayesian Inference. In Proc. of the 17th Conf. on Uncertainty in Artificial Intelligence, page 362.
[20] T. Minka, J. Winn, J. Guiver, and D. Knowles. Infer.NET 2.5. Microsoft Research Cambridge. See microsoft.com/infernet.
[21] K. Moilanen and S. Pulman. Sentiment Composition. In Proc. of the Recent Advances in Natural Language Processing Int. Conf.
[22] N. Morrow, N. Mock, A. Papendieck, and N. Kocmich. Independent Evaluation of the Ushahidi Haiti Project. Development Information Systems, 8:2011.
[23] B. Pang and L. Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proc. of the 42nd Annual Meeting of the Association for Computational Linguistics, page 271.
[24] M. F. Porter. An Algorithm for Suffix Stripping. Program: Electronic Library and Information Systems, 14(3).
[25] V. C. Raykar and S. Yu. Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks. Journal of Machine Learning Research, 13.
[26] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning From Crowds. Journal of Machine Learning Research, 11.
[27] F. Rodrigues, F. Pereira, and B. Ribeiro. Learning from Multiple Annotators: Distinguishing Good from Random Labelers. Pattern Recognition Letters, 34(12).
[28] E. Simpson, S. Roberts, I. Psorakis, and A. Smith. Dynamic Bayesian Combination of Multiple Imperfect Classifiers. In Intelligent Systems Reference Library series: Decision Making and Imperfection. Springer.
[29] L. Tran-Thanh, M. Venanzi, A. Rogers, and N. R. Jennings. Efficient Budget Allocation with Accuracy Guarantees for Crowdsourcing Classification Tasks. In Proc. of the 12th Int. Conf. on Autonomous Agents and Multiagent Systems.
[30] M. Venanzi, J. Guiver, G. Kazai, P. Kohli, and M. Shokouhi. Community-Based Bayesian Aggregation Models for Crowdsourcing. In Proc. of the 23rd Int. Conf. on World Wide Web.
[31] P. Welinder, S. Branson, P. Perona, and S. J. Belongie. The Multidimensional Wisdom of Crowds. In Advances in Neural Information Processing Systems.
[32] J. Whitehill, T.-F. Wu, J. Bergsma, J. R. Movellan, and P. L. Ruvolo. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. In Advances in Neural Information Processing Systems.
[33] K. W. Willett, C. J. Lintott, S. P. Bamford, K. L. Masters, B. D. Simmons, K. R. Casteels, E. M. Edmondson, L. F. Fortson, S. Kaviraj, W. C. Keel, et al. Galaxy Zoo 2: Detailed Morphological Classifications for 304,122 Galaxies from the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society, 435(4).


More information

Scalable Parametric Runtime Monitoring

Scalable Parametric Runtime Monitoring Salable Parametr Runtme Montorng Dongyun Jn Patrk O Nel Meredth Grgore Roşu Department of Computer Sene Unversty of Illnos at Urbana Champagn Urbana, IL, U.S.A. {djn3, pmeredt, grosu}@s.llnos.edu Abstrat

More information

Pixel-Based Texture Classification of Tissues in Computed Tomography

Pixel-Based Texture Classification of Tissues in Computed Tomography Pxel-Based Texture Classfaton of Tssues n Computed Tomography Ruhaneewan Susomboon, Danela Stan Rau, Jaob Furst Intellgent ultmeda Proessng Laboratory Shool of Computer Sene, Teleommunatons, and Informaton

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

A Semi-parametric Approach for Analyzing Longitudinal Measurements with Non-ignorable Missingness Using Regression Spline

A Semi-parametric Approach for Analyzing Longitudinal Measurements with Non-ignorable Missingness Using Regression Spline Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 93-9466 Vol., Issue (June 5), pp. 95 - Applatons and Appled Mathemats: An Internatonal Journal (AAM) A Sem-parametr Approah for Analyzng Longtudnal

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Bayesian Aggregation of Categorical Distributions with Applications in Crowdsourcing

Bayesian Aggregation of Categorical Distributions with Applications in Crowdsourcing Bayesan Aggregaton of Categorcal Dstrbutons wth Applcatons n Crowdsourcng Alexandry Augustn Southampton Unversty Southampton, UK aa7e14@ecs.soton.ac.uk Matteo Venanz Mcrosoft London, UK mavena@mcrosoft.com

More information

Optimal shape and location of piezoelectric materials for topology optimization of flextensional actuators

Optimal shape and location of piezoelectric materials for topology optimization of flextensional actuators Optmal shape and loaton of pezoeletr materals for topology optmzaton of flextensonal atuators ng L 1 Xueme Xn 2 Noboru Kkuh 1 Kazuhro Satou 1 1 Department of Mehanal Engneerng, Unversty of Mhgan, Ann Arbor,

More information

Computing Cloud Cover Fraction in Satellite Images using Deep Extreme Learning Machine

Computing Cloud Cover Fraction in Satellite Images using Deep Extreme Learning Machine Computng Cloud Cover Fraton n Satellte Images usng Deep Extreme Learnng Mahne L-guo WENG, We-bn KONG, Mn XIA College of Informaton and Control, Nanjng Unversty of Informaton Sene & Tehnology, Nanjng Jangsu

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Connectivity in Fuzzy Soft graph and its Complement

Connectivity in Fuzzy Soft graph and its Complement IOSR Journal of Mathemats (IOSR-JM) e-issn: 2278-5728, p-issn: 2319-765X. Volume 1 Issue 5 Ver. IV (Sep. - Ot.2016), PP 95-99 www.osrjournals.org Connetvty n Fuzzy Soft graph and ts Complement Shashkala

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

A MPAA-Based Iterative Clustering Algorithm Augmented by Nearest Neighbors Search for Time-Series Data Streams

A MPAA-Based Iterative Clustering Algorithm Augmented by Nearest Neighbors Search for Time-Series Data Streams A MPAA-Based Iteratve Clusterng Algorthm Augmented by Nearest Neghbors Searh for Tme-Seres Data Streams Jessa Ln 1, Mha Vlahos 1, Eamonn Keogh 1, Dmtros Gunopulos 1, Janwe Lu 2, Shouan Yu 2, and Jan Le

More information

Microprocessors and Microsystems

Microprocessors and Microsystems Mroproessors and Mrosystems 36 (2012) 96 109 Contents lsts avalable at SeneDret Mroproessors and Mrosystems journal homepage: www.elsever.om/loate/mpro Hardware aelerator arhteture for smultaneous short-read

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Modeling Radiometric Uncertainty for Vision with Tone-mapped Color Images

Modeling Radiometric Uncertainty for Vision with Tone-mapped Color Images 1 Modelng Radometr Unertanty for Vson wth Tone-mapped Color Images Ayan Chakrabart, Yng Xong, Baohen Sun, Trevor Darrell, Danel Sharsten, Todd Zkler, and Kate Saenko arxv:1311.6887v [s.cv] 9 Apr 14 Abstrat

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Avatar Face Recognition using Wavelet Transform and Hierarchical Multi-scale LBP

Avatar Face Recognition using Wavelet Transform and Hierarchical Multi-scale LBP 2011 10th Internatonal Conferene on Mahne Learnng and Applatons Avatar Fae Reognton usng Wavelet Transform and Herarhal Mult-sale LBP Abdallah A. Mohamed, Darryl D Souza, Naouel Bal and Roman V. Yampolsky

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Gabor-Filtering-Based Completed Local Binary Patterns for Land-Use Scene Classification

Gabor-Filtering-Based Completed Local Binary Patterns for Land-Use Scene Classification Gabor-Flterng-Based Completed Loal Bnary Patterns for Land-Use Sene Classfaton Chen Chen 1, Lbng Zhou 2,*, Janzhong Guo 1,2, We L 3, Hongjun Su 4, Fangda Guo 5 1 Department of Eletral Engneerng, Unversty

More information

DETECTING AND ANALYZING CORROSION SPOTS ON THE HULL OF LARGE MARINE VESSELS USING COLORED 3D LIDAR POINT CLOUDS

DETECTING AND ANALYZING CORROSION SPOTS ON THE HULL OF LARGE MARINE VESSELS USING COLORED 3D LIDAR POINT CLOUDS ISPRS Annals of the Photogrammetry, Remote Sensng and Spatal Informaton Senes, Volume III-3, 2016 XXIII ISPRS Congress, 12 19 July 2016, Prague, Czeh Republ DETECTING AND ANALYZING CORROSION SPOTS ON THE

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

International Journal of Pharma and Bio Sciences HYBRID CLUSTERING ALGORITHM USING POSSIBILISTIC ROUGH C-MEANS ABSTRACT

International Journal of Pharma and Bio Sciences HYBRID CLUSTERING ALGORITHM USING POSSIBILISTIC ROUGH C-MEANS ABSTRACT Int J Pharm Bo S 205 Ot; 6(4): (B) 799-80 Researh Artle Botehnology Internatonal Journal of Pharma and Bo Senes ISSN 0975-6299 HYBRID CLUSTERING ALGORITHM USING POSSIBILISTIC ROUGH C-MEANS *ANURADHA J,

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

A Toolbox for Easily Calibrating Omnidirectional Cameras

A Toolbox for Easily Calibrating Omnidirectional Cameras A oolbox for Easly Calbratng Omndretonal Cameras Davde Saramuzza, Agostno Martnell, Roland Segwart Autonomous Systems ab Swss Federal Insttute of ehnology Zurh EH) CH-89, Zurh, Swtzerland {davdesaramuzza,

More information

The Simulation of Electromagnetic Suspension System Based on the Finite Element Analysis

The Simulation of Electromagnetic Suspension System Based on the Finite Element Analysis 308 JOURNAL OF COMPUTERS, VOL. 8, NO., FEBRUARY 03 The Smulaton of Suspenson System Based on the Fnte Element Analyss Zhengfeng Mng Shool of Eletron & Mahanal Engneerng, Xdan Unversty, X an, Chna Emal:

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Session 4.2. Switching planning. Switching/Routing planning

Session 4.2. Switching planning. Switching/Routing planning ITU Semnar Warsaw Poland 6-0 Otober 2003 Sesson 4.2 Swthng/Routng plannng Network Plannng Strategy for evolvng Network Arhtetures Sesson 4.2- Swthng plannng Loaton problem : Optmal plaement of exhanges

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

FULLY AUTOMATIC IMAGE-BASED REGISTRATION OF UNORGANIZED TLS DATA

FULLY AUTOMATIC IMAGE-BASED REGISTRATION OF UNORGANIZED TLS DATA FULLY AUTOMATIC IMAGE-BASED REGISTRATION OF UNORGANIZED TLS DATA Martn Wenmann, Bors Jutz Insttute of Photogrammetry and Remote Sensng, Karlsruhe Insttute of Tehnology (KIT) Kaserstr. 12, 76128 Karlsruhe,

More information

On the End-to-end Call Acceptance and the Possibility of Deterministic QoS Guarantees in Ad hoc Wireless Networks

On the End-to-end Call Acceptance and the Possibility of Deterministic QoS Guarantees in Ad hoc Wireless Networks On the End-to-end Call Aeptane and the Possblty of Determnst QoS Guarantees n Ad ho Wreless Networks S. Srram T. heemarjuna Reddy Dept. of Computer Sene Dept. of Computer Sene and Engneerng Unversty of

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Self-tuning Histograms: Building Histograms Without Looking at Data

Self-tuning Histograms: Building Histograms Without Looking at Data Self-tunng Hstograms: Buldng Hstograms Wthout Lookng at Data Ashraf Aboulnaga Computer Scences Department Unversty of Wsconsn - Madson ashraf@cs.wsc.edu Surajt Chaudhur Mcrosoft Research surajtc@mcrosoft.com

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Minimize Congestion for Random-Walks in Networks via Local Adaptive Congestion Control

Minimize Congestion for Random-Walks in Networks via Local Adaptive Congestion Control Journal of Communatons Vol. 11, No. 6, June 2016 Mnmze Congeston for Random-Walks n Networks va Loal Adaptve Congeston Control Yang Lu, Y Shen, and Le Dng College of Informaton Sene and Tehnology, Nanjng

More information

Discriminative Dictionary Learning with Pairwise Constraints

Discriminative Dictionary Learning with Pairwise Constraints Dscrmnatve Dctonary Learnng wth Parwse Constrants Humn Guo Zhuoln Jang LARRY S. DAVIS UNIVERSITY OF MARYLAND Nov. 6 th, Outlne Introducton/motvaton Dctonary Learnng Dscrmnatve Dctonary Learnng wth Parwse

More information

A Flexible Solution for Modeling and Tracking Generic Dynamic 3D Environments*

A Flexible Solution for Modeling and Tracking Generic Dynamic 3D Environments* A Flexble Soluton for Modelng and Trang Gener Dynam 3D Envronments* Radu Danesu, Member, IEEE, and Sergu Nedevsh, Member, IEEE Abstrat The traff envronment s a dynam and omplex 3D sene, whh needs aurate

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Design Level Performance Modeling of Component-based Applications. Yan Liu, Alan Fekete School of Information Technologies University of Sydney

Design Level Performance Modeling of Component-based Applications. Yan Liu, Alan Fekete School of Information Technologies University of Sydney Desgn Level Performane Modelng of Component-based Applatons Tehnal Report umber 543 ovember, 003 Yan Lu, Alan Fekete Shool of Informaton Tehnologes Unversty of Sydney Ian Gorton Paf orthwest atonal Laboratory

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Elsevier Editorial System(tm) for NeuroImage Manuscript Draft

Elsevier Editorial System(tm) for NeuroImage Manuscript Draft Elsever Edtoral System(tm) for NeuroImage Manusrpt Draft Manusrpt Number: Ttle: Comparson of ampltude normalzaton strateges on the auray and relablty of group ICA deompostons Artle Type: Tehnal Note Seton/Category:

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Monte Carlo Integration

Monte Carlo Integration Introducton Monte Carlo Integraton Dgtal Image Synthess Yung-Yu Chuang 11/9/005 The ntegral equatons generally don t have analytc solutons, so we must turn to numercal methods. L ( o p,ωo) = L e ( p,ωo)

More information

Fast Sparse Gaussian Processes Learning for Man-Made Structure Classification

Fast Sparse Gaussian Processes Learning for Man-Made Structure Classification Fast Sparse Gaussan Processes Learnng for Man-Made Structure Classfcaton Hang Zhou Insttute for Vson Systems Engneerng, Dept Elec. & Comp. Syst. Eng. PO Box 35, Monash Unversty, Clayton, VIC 3800, Australa

More information