Sentiment Classification and Polarity Shifting


Shoushan Li, Sophia Yat Mei Lee, Ying Chen, Chu-Ren Huang
Department of CBS, The Hong Kong Polytechnic University
{shoushan.li, sophiaym, chenying3176, ...}

Guodong Zhou
Natural Language Processing Lab, School of Computer Science and Technology, Soochow University
gdzhou@suda.edu.cn

Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, August 2010

Abstract

Polarity shifting marked by various linguistic structures has been a challenge to automatic sentiment classification. In this paper, we propose a machine learning approach to incorporate polarity shifting information into a document-level sentiment classification system. First, a feature selection method is adopted to automatically generate the training data for a binary classifier on polarity shifting detection of sentences. Then, by using the obtained binary classifier, each document in the original polarity classification training data is split into two partitions, polarity-shifted and polarity-unshifted, which are used to train two base classifiers respectively for further classifier combination. The experimental results across four different domains demonstrate the effectiveness of our approach.

1 Introduction

Sentiment classification is a special task of text classification whose objective is to classify a text according to the sentimental polarities of opinions it contains (Pang et al., 2002), e.g., favorable or unfavorable, positive or negative. This task has received considerable interest in the computational linguistic community due to its potential applications.

In the literature, machine learning approaches have dominated the research in sentiment classification and achieved the state-of-the-art performance (e.g., Kennedy and Inkpen, 2006; Pang et al., 2002). In a typical machine learning approach, a document (text) is modeled as a bag-of-words, i.e. a set of content words without any word order or syntactic relation information. In other words, the underlying assumption is that the sentimental orientation of the whole text depends on the sum of the sentimental polarities of content words.

Although this assumption is reasonable and has led to initial success, it is linguistically unsound since many function words and constructions can shift the sentimental polarities of a text. For example, in the sentence "The chair is not comfortable", the polarity of the word "comfortable" is positive while the polarity of the whole sentence is reversed because of the negation word "not". Therefore, the overall sentiment of a document is not necessarily the sum of the content parts (Turney, 2002). This phenomenon is one main reason why machine learning approaches fail under some circumstances.

As a typical case of polarity shifting, negation has been paid close attention and widely studied in the literature (Na et al., 2004; Wilson et al., 2009; Kennedy and Inkpen, 2006). Generally, there are two steps to incorporate negation information into a system: negation detection and negation classification. For negation detection, some negation trigger words, such as "no", "not", and "never", are usually applied to recognize negation phrases or sentences. As for negation classification, one way to import negation information is to directly reverse the polarity of the words which contain negation trigger words as far as term-counting approaches are considered (Kennedy and Inkpen, 2006). An alternative way is to add some negation features (e.g., negation bigrams or negation phrases) into machine learning approaches (Na et al., 2004). Such approaches have achieved certain success.

There are, however, some shortcomings with current approaches in incorporating negation information. In terms of negation detection, firstly, the negation trigger word dictionary is either manually constructed or relies on existing resources. This leads to certain limitations concerning the quality and coverage of the dictionary. Secondly, it is difficult to adapt negation detection to other languages due to the language-dependent nature of negation constructions and words. Thirdly, apart from negation, many other phenomena, e.g., contrast transition with trigger words like "but", "however", and "nevertheless", can shift the sentimental polarity of a phrase or sentence. Therefore, considering negation alone is inadequate to deal with the polarity shifting problem, especially for document-level sentiment classification.

In terms of negation classification, although it is easy for term-counting approaches to integrate negation information, they rarely outperform a machine learning baseline (Kennedy and Inkpen, 2006). Even for machine learning approaches, although negation information is sometimes effective for local cases (e.g., "not good"), it fails on long-distance cases (e.g., "I don't think it is good").

In this paper, we first propose a feature selection method to automatically generate large-scale polarity shifting training data for polarity shifting detection of sentences. Then, a classifier combination method is presented for incorporating polarity shifting information. Compared with previous ones, our approach highlights the following advantages: first of all, we apply a binary classifier to detect polarity shifting rather than merely relying on trigger words or phrases. This enables our approach to handle different kinds of polarity shifting phenomena. More importantly, a feature selection method is presented to automatically generate the labeled training data for polarity shifting detection of sentences.

The remainder of this paper is organized as follows. Section 2 introduces the related work of sentiment classification.
Section 3 presents our approach in detail. Experimental results are presented and analyzed in Section 4. Finally, Section 5 draws the conclusion and outlines the future work.

2 Related Work

Generally, sentiment classification can be performed at four different levels: word level (Wiebe, 2000), phrase level (Wilson et al., 2009), sentence level (Kim and Hovy, 2004; Liu et al., 2005), and document level (Turney, 2002; Pang et al., 2002; Pang and Lee, 2004; Riloff et al., 2006). This paper focuses on document-level sentiment classification.

In the literature, there are mainly two kinds of approaches to document-level sentiment classification: term-counting approaches (lexicon-based) and machine learning approaches (corpus-based). Term-counting approaches usually involve deriving a sentiment measure by calculating the total number of negative and positive terms (Turney, 2002; Kim and Hovy, 2004; Kennedy and Inkpen, 2006). Machine learning approaches recast the sentiment classification problem as a statistical classification task (Pang and Lee, 2004). Compared to term-counting approaches, machine learning approaches usually achieve much better performance (Pang et al., 2002; Kennedy and Inkpen, 2006), and have been adopted to more complicated scenarios, such as domain adaptation (Blitzer et al., 2007), multi-domain learning (Li and Zong, 2008) and semi-supervised learning (Wan, 2009; Dasgupta and Ng, 2009) for sentiment classification.

Polarity shifting plays a crucial role in phrase-level, sentence-level, and document-level sentiment classification. However, most previous studies merely focus on negation shifting (polarity shifting caused by the negation structure). In one pioneering study on sentiment classification, Pang et al. (2002) propose a machine learning approach to tackle negation shifting by adding the tag "not" to every word between a negation trigger word/phrase (e.g., "not", "isn't", "didn't", etc.) and the first punctuation mark following the negation trigger word/phrase. To their disappointment, considering negation shifting has a negligible effect and even slightly harms the overall performance.
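The tagging scheme attributed to Pang et al. (2002) above can be illustrated with a minimal sketch. The trigger list, the tokenizer and the `NOT_` prefix format are assumptions made for illustration only; they are not taken from either paper.

```python
import re

# Illustrative trigger list (assumption); the text names "not", "isn't", "didn't" as examples.
NEGATION_TRIGGERS = {"not", "no", "never", "isn't", "didn't"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}

def tag_negation(text):
    """Prefix NOT_ to every word between a negation trigger and the next punctuation mark."""
    # Naive tokenizer (assumption): words with an optional apostrophe suffix, or one punctuation mark.
    tokens = re.findall(r"[A-Za-z]+(?:'[a-z]+)?|[.,;:!?]", text.lower())
    tagged, in_scope = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            in_scope = False          # punctuation closes the negation scope
            tagged.append(tok)
        elif tok in NEGATION_TRIGGERS:
            in_scope = True           # the trigger itself is left untagged
            tagged.append(tok)
        elif in_scope:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
    return tagged

print(tag_negation("The chair is not comfortable, but it is cheap."))
```

Only "comfortable" falls inside the negation scope here, because the comma ends it. The same mechanism explains why such tagging cannot reach long-distance cases whose negated word lies beyond the next punctuation mark.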
Kennedy and Inkpen (2006) explore negation shifting by incorporating negation bigrams as additional features into machine learning approaches. The experimental results show that considering sentiment shifting greatly improves the performance of term-counting approaches but only slightly improves the performance of machine learning approaches. Other studies such as Na et al. (2004), Ding et al. (2008), and Wilson et al. (2009) also explore negation shifting and achieve some improvements [1]. Nonetheless, as far as machine learning approaches are concerned, the improvement is rather insignificant (normally less than 1%). More recently, Ikeda et al. (2008) first propose a machine learning approach to detect polarity shifting for sentence-level sentiment classification, based on a manually-constructed dictionary containing thousands of positive and negative sentimental words, and then adopt a term-counting approach to incorporate polarity shifting information.

[1] Note that Ding et al. (2008) also consider the but-clause, another important structure for sentiment shifting. Wilson et al. (2009) use conjunctive and dependency relations among polarity words.

3 Sentiment Classification with Polarity Shifting Detection

[Figure 1: General framework of our approach. Diagram labels: Documents, Polarity Shifting Detector, Polarity-shifted Sentences, Polarity-unshifted Sentences, Polarity Classifier, Positive/Negative.]

The motivation of our approach is to improve the performance of sentiment classification by robust treatment of sentiment polarity shifting between sentences. With the help of a binary classifier, the sentences in a document are divided into two parts: sentences which contain polarity shifting structures and sentences without any polarity shifting structure. Figure 1 illustrates the general framework of our approach. Note that this framework is a general one, that is, different polarity shifting detection methods can be applied to differentiate polarity-shifted sentences from polarity-unshifted sentences, and different polarity classification methods can be adopted to incorporate sentiment shifting information. For clarification, the training data used for polarity shifting detection and polarity classification are referred to as the polarity shifting training data and the polarity classification training data, respectively.
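The division of a document into the two sentence groups can be sketched as follows. This is only an illustration of the framework: `toy_detector` is a hypothetical keyword-based stand-in for the trained binary detector, and the sentence splitter is a naive assumption.

```python
import re

def split_sentences(document):
    """Naive splitter on '.', '!' and '?' (an assumption; any segmenter would do)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]

def toy_detector(sentence):
    """Hypothetical stand-in for the trained polarity shifting detector."""
    triggers = {"not", "no", "never", "but", "however"}
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    return bool(tokens & triggers)

def partition_document(document, is_shifted=toy_detector):
    """Return (polarity-shifted sentences, polarity-unshifted sentences) for one document."""
    shifted, unshifted = [], []
    for sentence in split_sentences(document):
        (shifted if is_shifted(sentence) else unshifted).append(sentence)
    return shifted, unshifted

doc = "I love the design. It is not comfortable. The price is great."
shifted, unshifted = partition_document(doc)
print(shifted)    # sentences flagged by the detector
print(unshifted)  # the remaining sentences
```

Because the detector is passed in as a function, any detection method can be plugged in without changing the partitioning step, which mirrors the generality claimed for the framework.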
3.1 Polarity Shifting Detection

In this paper, polarity shifting means that the polarity of a sentence is different from the polarity expressed by the sum of the content words in the sentence. For example, in the sentence "I am not disappointed", the negation structure makes the polarity of the word "disappointed" different from that of the whole sentence (negative vs. positive). Apart from the negation structure, many other linguistic structures allow polarity shifting, such as contrast transition, modals, and pre-suppositional items (Polanyi and Zaenen, 2006). We refer to these structures as polarity shifting structures.

One of the great challenges in building a polarity shifting detector lies in the lack of relevant training data, since manually creating a large-scale corpus of polarity shifting sentences is time-consuming and labor-intensive. Ikeda et al. (2008) propose an automatic way for collecting the polarity shifting training data based on a manually-constructed large-scale dictionary. Instead, we adopt a feature selection method to build a large-scale training corpus of polarity shifting sentences, given only the already available document-level polarity classification training data. With the help of the feature selection method, the top-ranked word features with strong sentimental polarity orientation, e.g., "great", "love", "worst", are first chosen as the polarity trigger words. Then, those sentences with the top-ranked polarity trigger words in both categories of positive and negative documents are selected. Finally, those candidate sentences whose polarity is opposite to that of the trigger word they contain are deemed polarity-shifted.

The basic idea of automatically generating the polarity shifting training data is based on the assumption that the real polarity of a word or phrase is decided by the major polarity category where the word or phrase appears more often. As a result, the sentences in the frequently-occurring category would be seen as polarity-unshifted while the sentences in the infrequently-occurring category would be seen as polarity-shifted.

In the literature, various feature selection methods, such as Mutual Information (MI), Information Gain (IG) and Bi-Normal Separation (BNS) (Yang and Pedersen, 1997; Forman, 2003), have been employed to cope with the problem of the high-dimensional feature space which is normal in sentiment classification. In this paper, we employ the theoretical framework proposed by Li et al. (2009), which includes two basic measurements, i.e. frequency measurement and ratio measurement, where the first measures the document frequency of a term in one category, and the second measures the ratio between the document frequency in one category and other categories. In particular, a novel method called Weighed Frequency and Odds (WFO) is proposed to incorporate both basic measurements:

$$WFO(t, c) = P(t \mid c)^{\lambda} \left\{ \max\left(0, \log \frac{P(t \mid c)}{P(t \mid \bar{c})}\right) \right\}^{1-\lambda}$$

where $P(t \mid c)$ denotes the probability that a document $x$ contains the term $t$ with the condition that $x$ belongs to category $c$; $P(t \mid \bar{c})$ denotes the probability that a document $x$ contains the term $t$ with the condition that $x$ does not belong to category $c$. The left part of the formula, $P(t \mid c)$, implies the first basic measurement and the right part, $\log(P(t \mid c) / P(t \mid \bar{c}))$, implies the second one. The parameter $\lambda$ ($0 \le \lambda \le 1$) is thus to tune the weight between the two basic measurements. In particular, when $\lambda$ equals 0, the WFO method fades to the MI method, which fully prefers the second basic measurement.

Figure 2 illustrates our algorithm for automatically generating the polarity shifting training data, where c1 and c2 denote the two sentimental orientation categories, i.e. negative and positive. Step A segments a document into sentences with punctuations. Besides, two special words, "but" and "and", are used to further segment some contrast transition structures and compound sentences. Step B employs the WFO method to rank all features including the words.
Step D extracts those polarity-shifted and polarity-unshifted sentences containing t_top-i, where N_max denotes the upper-limit number of sentences in each category of the polarity shifting training data and #(x) denotes the total number of the elements in x. Apart from that, the first word in the following sentence is also included to capture a common kind of long-distance polarity shifting structure: contrast transition. Thus, important trigger words like "however" and "but" may be considered. Finally, Step E guarantees the balance between the two categories of the polarity shifting training data.

Given the polarity shifting training data, we apply SVM classification algorithm to train a polarity-shifting detector with word unigram features.

Input:
    The polarity classification training data: the negative sentimental document set D_c1 and the positive sentimental document set D_c2.
Output:
    The polarity shifting training data: the polarity-unshifted sentence set S_unshift and the polarity-shifted sentence set S_shift.
Procedure:
    A. Segment documents in D_c1 and D_c2 to single sentences S_c1 and S_c2.
    B. Apply feature selection on the polarity classification training data and get the ranked features (t_top-1, ..., t_top-i, ..., t_top-N).
    C. S_shift = {}, S_unshift = {}
    D. For t_top-i in (t_top-1, ..., t_top-i, ..., t_top-N):
        D1) if #(S_shift) > N_max: break
        D2) Collect all sentences S_top-i,c1 and S_top-i,c2 which contain t_top-i from S_c1 and S_c2 respectively
        D3) if #(S_top-i,c1) > #(S_top-i,c2):
                put S_top-i,c2 into S_shift
                put S_top-i,c1 into S_unshift
            else:
                put S_top-i,c1 into S_shift
                put S_top-i,c2 into S_unshift
    E. Randomly select N_max sentences from S_unshift as the output of S_unshift

Figure 2: The algorithm for automatically generating the polarity shifting training data
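As a rough illustration of the WFO score and of Steps B to E in Figure 2, the following sketch may help. It is not the authors' implementation: the add-one smoothing, the naive tokenizer, the tie-breaking order and all function names are assumptions. It also omits the "first word of the following sentence" extension and does not deduplicate sentences collected under several trigger words.

```python
import math
import random
import re

def wfo(p_t_c, p_t_not_c, lam):
    """WFO(t, c) = P(t|c)^lam * max(0, log(P(t|c) / P(t|not c)))^(1 - lam)."""
    odds = max(0.0, math.log(p_t_c / p_t_not_c))
    return (p_t_c ** lam) * (odds ** (1.0 - lam))

def words(text):
    """Lower-cased word set of a sentence or document (naive tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def rank_terms(neg_docs, pos_docs, lam=0.0):
    """Step B: rank terms by their larger WFO score over the two categories."""
    def doc_prob(term, docs):
        # Add-one smoothing is an assumption, used to avoid log(0) and division by zero.
        return (sum(term in words(d) for d in docs) + 1) / (len(docs) + 2)
    vocab = set().union(*(words(d) for d in neg_docs + pos_docs))
    scores = {}
    for term in vocab:
        p_neg, p_pos = doc_prob(term, neg_docs), doc_prob(term, pos_docs)
        scores[term] = max(wfo(p_neg, p_pos, lam), wfo(p_pos, p_neg, lam))
    # Ties are broken alphabetically so that the ranking is deterministic.
    return sorted(vocab, key=lambda term: (-scores[term], term))

def generate_shifting_data(neg_sents, pos_sents, ranked_terms, n_max, seed=0):
    """Steps C to E of Figure 2, applied to already segmented sentences."""
    shifted, unshifted = [], []                                # Step C
    for term in ranked_terms:                                  # Step D
        if len(shifted) > n_max:                               # D1
            break
        in_neg = [s for s in neg_sents if term in words(s)]    # D2
        in_pos = [s for s in pos_sents if term in words(s)]
        if len(in_neg) > len(in_pos):                          # D3: minority side is "shifted"
            shifted += in_pos
            unshifted += in_neg
        else:
            shifted += in_neg
            unshifted += in_pos
    if len(unshifted) > n_max:                                 # Step E
        unshifted = random.Random(seed).sample(unshifted, n_max)
    return shifted, unshifted

neg = ["this is the worst product", "not great at all"]
pos = ["a great product", "great value", "I love it"]
print(generate_shifting_data(neg, pos, ["great"], n_max=10))
```

In the toy call, "great" occurs more often in positive sentences, so the single negative sentence containing it ("not great at all") is collected as polarity-shifted, which is the intended effect of Step D3.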

3.2 Polarity Classification with Classifier Combination

After polarity shifting detection, each document in the polarity classification training data is divided into two parts, one containing polarity-shifted sentences and the other containing polarity-unshifted sentences, which are used to form the polarity-shifted training data and the polarity-unshifted training data. In this way, two different polarity classifiers, $f_1$ and $f_2$, can be trained on the polarity-shifted training data and the polarity-unshifted training data respectively. Along with classifier $f_3$, trained on all original polarity classification training data, we now have three base classifiers in hand for possible classifier combination via a multiple classifier system.

The key issue in constructing a multiple classifier system (MCS) is to find a suitable way to combine the outputs of the base classifiers. In MCS literature, various methods are available for combining the outputs, such as fixed rules including the voting rule, the product rule and the sum rule (Kittler et al., 1998) and trained rules including the weighted sum rule (Fumera and Roli, 2005) and the meta-learning approaches (Vilalta and Drissi, 2002). In this study, we employ the product rule, a popular fixed rule, and stacking (Džeroski and Ženko, 2004), a well-known trained rule, to combine the outputs.

Each base classifier provides some kind of confidence measurement, e.g., posterior probabilities of the test sample belonging to each class. Formally, each base classifier $f_l$ ($l = 1, 2, 3$) assigns a test sample (denoted as $x_l$) a posterior probability vector $P(x_l)$:

$$P(x_l) = \left( p(c_1 \mid x_l),\ p(c_2 \mid x_l) \right)^{t}$$

where $p(c_1 \mid x_l)$ denotes the probability that the $l$-th base classifier considers the sample belonging to $c_1$.

The product rule combines the base classifiers by multiplying the posterior probabilities and using the multiplied probability for decision, i.e.

$$\text{assign } y \to c_j \quad \text{when} \quad j = \arg\max_{i} \prod_{l=1}^{3} p(c_i \mid x_l)$$

Stacking belongs to well-known meta-learning (Vilalta and Drissi, 2002). The key idea behind meta-learning is to train a meta-classifier with input attributes that are the outputs of the base classifiers.
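For the product rule described above, a minimal sketch is given here. The function name and the example posteriors are illustrative assumptions; the posteriors are made-up numbers, not results from the paper.

```python
import math

def product_rule(posteriors):
    """Combine base classifiers by multiplying their class posteriors.

    posteriors: one tuple (p(c1|x_l), p(c2|x_l)) per base classifier l.
    Returns the 0-based index j of the class maximising prod_l p(c_j|x_l).
    """
    n_classes = len(posteriors[0])
    products = [math.prod(p[j] for p in posteriors) for j in range(n_classes)]
    return max(range(n_classes), key=products.__getitem__)

# Three hypothetical base classifiers voting on one test sample.
example = [(0.60, 0.40), (0.30, 0.70), (0.55, 0.45)]
print(product_rule(example))
```

Two of the three classifiers favour the first class in this example, yet the product rule selects the second, because one confident classifier can outweigh two mildly confident ones. This sensitivity to individual outputs is a known property of the product rule.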
Hence, meta-learning usually needs some development data for generating the meta-training data. Let $x'$ denote a feature vector of a sample from the development data. The output of the $l$-th base classifier $f_l$ on this sample is the probability distribution over the category set $\{c_1, c_2\}$, i.e.

$$P(x'_l) = \left( p_l(c_1 \mid x'_l),\ p_l(c_2 \mid x'_l) \right)$$

A meta-classifier can be trained using the development data with the meta-level feature vector $x^{meta} \in R^{2 \times 3}$:

$$x^{meta} = \left( P(x'_{l=1}),\ P(x'_{l=2}),\ P(x'_{l=3}) \right)$$

Stacking is a specific meta-learning rule, in which a leave-one-out or a cross-validation procedure on the training data is applied to generate the meta-training data instead of using extra development data. In our experiments, we perform stacking with 10-fold cross-validation to generate the meta-training data.

4 Experimentation

4.1 Experimental Setting

The experiments are carried out on product reviews from four domains: books, DVDs, electronics, and kitchen appliances (Blitzer et al., 2007) [2]. Each domain contains 1000 positive and 1000 negative reviews.

For sentiment classification, all classifiers including the polarity shifting detector, three base classifiers and the meta-classifier in stacking are trained by SVM using the SVM-light tool [3] with Logistic Regression method for probability measuring (Platt, 1999). In all the experiments, each dataset is randomly and evenly split into two subsets: 50% documents as the training data and the remaining 50% as the test data. The features include word unigrams and bigrams with Boolean weights.

[2] This data set is collected by Blitzer et al. (2007).
[3] It is available online.

4.2 Experimental Results on Polarity Shifting Data

To better understand the polarity shifting phenomena in document-level sentiment classification, we randomly investigate a sample of polarity-shifted sentences, together with their contexts (i.e. the sentences before and after them), automatically generated by the WFO ($\lambda = 0$) feature selection method. We find that nearly half of the automatically generated polarity-shifted sentences are actually polarity-unshifted sentences or difficult to decide. That is to say, the polarity shifting training data is noisy to some extent. One main reason is that some automatically selected trigger words do not really contain sentiment information, e.g., "hear", "information", etc. Another reason is that some reversed opinion is given in a review without any explicit polarity shifting structures.

To gain more insights, we manually checked 100 sentences which are explicitly polarity-shifted and can also be judged by human according to their contexts. Table 1 presents some typical structures causing polarity shifting. It shows that the most common polarity shifting type is Explicit Negation (37%), usually expressed by trigger words such as "not", "no", or "without", e.g., in the sentence "I am not happy with this flashcard at all". Another common type of polarity shifting is Contrast Transition (20%), expressed by trigger words such as "however", e.g., in the sentence "It is large and stylish, however, I cannot recommend it because of the lid". Other less common yet productive polarity shifting types include Exception and Until. Exception structure is usually expressed by the trigger phrase "the only" to indicate the one and only advantage of the product, e.g., in the sentence "The only thing that I like about it is that bamboo is a renewable resource". Until structure is often expressed by the trigger word "until" to show the reversed polarity, e.g. in the sentence "This unit was a great addition until the probe went bad after only a few months".
Polarity Shifting Structures   Trigger Words/Phrases          Distribution (%)
Explicit Negation              not, no, without               37
Contrast Transition            but, however, unfortunately    20
Implicit Negation              avoid, hardly                  7
False Impression               look, seem                     6
Likelihood                     probably, perhaps              5
Counter-factual                should, would                  5
Exception                      the only                       5
Until                          until                          3

Table 1: Statistics on various polarity shifting structures

4.3 Experimental Results on Polarity Classification

For comparison, several classifiers with different classification methods are developed.

1) Baseline classifier, which applies SVM with all unigrams and bigrams. Note that it also serves as a base classifier in the following combined classifiers.
2) Base classifier 1, a base classifier for the classifier combination method. It works on the polarity-unshifted data.
3) Base classifier 2, another base classifier for the classifier combination method. It works on the polarity-shifted data.
4) Negation classifier, which applies SVM with all unigrams and bigrams plus negation bigrams. It is a natural extension of the baseline classifier with the consideration of negation bigrams. In this study, the negation bigrams are collected using some negation trigger words, such as "not" and "never". If a negation trigger word is found in a sentence, each word in the sentence is attached with the word "_not" to form a negation bigram.
5) Product classifier, which combines the baseline classifier, the base classifier 1 and the base classifier 2 using the product rule.
6) Stacking classifier, a combined classifier similar to the Product classifier. It uses the stacking classifier combination method instead of the product rule.

Please note that we do not compare our approach with the one as proposed in Ikeda et al. (2008) due to the absence of a manually-collected sentiment dictionary. Besides, it is well known that a combination strategy itself is capable of improving the classification performance.
To justify whether the improvement is due to the combination strategy or our polarity shifting detection or both, we first randomly split the training data into two portions and train two base classifiers on each portion, then apply the stacking method to combine them along with the baseline classifier. The corresponding results are shown as Random+Stacking in Table 2. Finally, in our experiments, t-test is performed to evaluate the significance of the performance improvement between two systems employing different methods (Yang and Liu, 1999).
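The stacking step used by the combined classifiers, in which out-of-fold posteriors of the base classifiers become the meta-level features, can be sketched in plain Python. This is a simplified illustration under stated assumptions: `prior_trainer` is a hypothetical toy learner standing in for an SVM with probability outputs, folds are contiguous rather than randomised, and every base trainer receives the same samples, whereas in the approach above each base classifier works on its own view of a document.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def build_meta_features(samples, labels, base_trainers, k=10):
    """For each sample, concatenate the posteriors predicted by base models
    that were trained without the fold containing that sample."""
    meta = [None] * len(samples)
    for held_out in kfold_indices(len(samples), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(samples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        models = [train(train_x, train_y) for train in base_trainers]
        for i in held_out:
            meta[i] = [p for model in models for p in model(samples[i])]
    return meta

def prior_trainer(xs, ys):
    """Toy base learner (hypothetical): always predicts the training class prior."""
    p_pos = sum(ys) / len(ys)
    return lambda x: (1.0 - p_pos, p_pos)

samples = list(range(6))
labels = [0, 1, 0, 1, 1, 1]
meta = build_meta_features(samples, labels, [prior_trainer] * 3, k=3)
print(meta[0])  # six values: two posteriors from each of three base models
```

Each row of `meta` has 2 x 3 entries, matching the dimensionality of the meta-level feature vector, and a meta-classifier would then be trained on these rows together with `labels`.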

[Table 2: Performance comparison of different classifiers with equally-splitting between training and test data. Columns: Domain, Baseline, Base Classifier 1, Base Classifier 2, Negation Classifier, Random + Stacking, Shifting + Product, Shifting + Stacking. Rows: Book, DVD, Electronic, Kitchen.]

Performance comparison of different classifiers

Table 2 shows the accuracy results of different methods using 2000 polarity-shifted sentences and 2000 polarity-unshifted sentences to train the polarity shifting detector (N_max = 2000). Compared to the baseline classifier, it shows that:

1) The base classifier 1, which only uses the polarity-unshifted sentences as the training data, achieves similar performance.
2) The base classifier 2 achieves much lower performance due to much fewer sentences involved.
3) Including negation bigrams usually allows insignificant improvements (p-value > 0.1), which is consistent with most of previous works (Pang et al., 2002; Kennedy and Inkpen, 2006).
4) Both the product and stacking classifiers with polarity shifting detection significantly improve the performance (p-value < 0.05).

Compared to the product rule, the stacking classifier is preferable, probably due to the performance unbalance among the individual classifiers, e.g., the performance of the base classifier 2 is much lower than the other two. Although stacking with two randomly generated base classifiers, i.e. Random + Stacking, also consistently outperforms the baseline classifier, the improvements are much lower than what has been achieved by our approach. This suggests that both the classifier combination strategy and polarity shifting detection contribute to the overall performance improvement.

Effect of WFO feature selection method

Figure 3 presents the accuracy curve of the stacking classifier when using different Lambda ($\lambda$) values in the WFO feature selection method. It shows that those feature selection methods which prefer frequency information, e.g., MI and BNS, are better in automatically generating the polarity shifting training data.
This is reasonable since high frequency terms, e.g., "is", "it", "a", etc., tend to obey our assumption that the real polarity of one top term should belong to the polarity category where the term appears frequently.

[Figure 3: Performance of the stacking classifier using WFO with different Lambda ($\lambda$) values. Series: Book, DVD, Electronic, Kitchen.]

[Figure 4: Performance of the stacking classifier over different sizes of the polarity shifting training data (with N_max sentences in each category). Series: Book, DVD, Electronic, Kitchen.]

Effect of a classifier over different sizes of the polarity shifting training data

Another factor which might influence the overall performance is the size of the polarity shifting training data. Figure 4 presents the overall performance on different numbers of the polarity shifting sentences when using the stacking classifier. It shows that 1000 to 4000 sentences are enough for the performance improvement. When the number is too large, the noisy training data may harm polarity shifting detection. When the number is too small, it is not enough for the automatically generated polarity shifting training data to capture various polarity shifting structures.

[Figure 5: Performance of different classifiers over different sizes of the polarity classification training data. Panels: Book, DVD, Electronic, Kitchen. X-axis: the training data sizes (in percent, up to 100%). Series: Baseline, BaseClassifier 1, BaseClassifier 2, Stacking.]

Effect of different classifiers over different sizes of the polarity classification training data

Figure 5 shows the classification results of different classifiers with varying sizes of the polarity classification training data. It shows that our approach is able to improve the overall performance robustly. We also notice the big difference between the performance of the baseline classifier and that of the base classifier 1 when using 30% training data in Book domain and 90% training data in DVD domain. Detailed exploration of the polarity shifting sentences in the training data shows that this difference is mainly attributed to the poor performance of the polarity shifting detector. Even so, the stacking classifier guarantees no worse performance than the baseline classifier.

5 Conclusion and Future Work

In this paper, we propose a novel approach to incorporate polarity shifting information into document-level sentiment classification. In our approach, we first propose a machine-learning-based classifier to detect polarity shifting and then apply two classifier combination methods to perform polarity classification. Particularly, the polarity shifting training data is automatically generated through a feature selection method. As shown in our experimental results, our approach is able to consistently improve the overall performance across different domains and training data sizes, although the automatically generated polarity shifting training data is prone to noise. Furthermore, we conclude that those feature selection methods, which prefer frequency information, e.g., MI and BNS, are good choices for generating the polarity shifting training data.
In our future work, we will explore better ways in generating less-noisy polarity shifting training data. In addition, since our approach is language-independent, it is readily applicable to sentiment classification tasks in other languages.

For availability of the automatically generated polarity shifting training data, please contact the first author (for research purpose only).

Acknowledgments

This research work has been partially supported by Start-up Grant for Newly Appointed Professors, No. 1-BBZM in the Hong Kong Polytechnic University and two NSFC grants. We also thank the three anonymous reviewers for their helpful comments.

References

Blitzer J., M. Dredze, and F. Pereira. 2007. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of ACL-07.

Dasgupta S. and V. Ng. 2009. Mine the Easy and Classify the Hard: Experiments with Automatic Sentiment Classification. In Proceedings of ACL-IJCNLP-09.

Ding X., B. Liu, and P. Yu. 2008. A Holistic Lexicon-based Approach to Opinion Mining. In Proceedings of the International Conference on Web Search and Web Data Mining, WSDM-08.

Džeroski S. and B. Ženko. 2004. Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning, vol.54(3).

Forman G. 2003. An Extensive Empirical Study of Feature Selection Metrics for Text Classification. The Journal of Machine Learning Research, 3(1).

Fumera G. and F. Roli. 2005. A Theoretical and Experimental Analysis of Linear Combiners for Multiple Classifier Systems. IEEE Trans. PAMI, vol.27.

Ikeda D., H. Takamura, L. Ratinov, and M. Okumura. 2008. Learning to Shift the Polarity of Words for Sentiment Classification. In Proceedings of IJCNLP-08.

Kennedy A. and D. Inkpen. 2006. Sentiment Classification of Movie Reviews using Contextual Valence Shifters. Computational Intelligence, vol.22(2).

Kim S. and E. Hovy. 2004. Determining the Sentiment of Opinions. In Proceedings of COLING-04.

Kittler J., M. Hatef, R. Duin, and J. Matas. 1998. On Combining Classifiers. IEEE Trans. PAMI, vol.20.

Li S., R. Xia, C. Zong, and C. Huang. 2009. A Framework of Feature Selection Methods for Text Categorization. In Proceedings of ACL-IJCNLP-09.

Li S. and C. Zong. 2008. Multi-domain Sentiment Classification. In Proceedings of ACL-08: HLT, short paper.

Liu B., M. Hu, and J. Cheng. 2005. Opinion Observer: Analyzing and Comparing Opinions on the Web. In Proceedings of WWW-05.

Na J., H. Sui, C. Khoo, S. Chan, and Y. Zhou. 2004. Effectiveness of Simple Linguistic Processing in Automatic Sentiment Classification of Product Reviews. In Conference of the International Society for Knowledge Organization (ISKO-04).

Pang B. and L. Lee. 2004. A Sentimental Education: Sentiment Analysis using Subjectivity Summarization based on Minimum Cuts. In Proceedings of ACL-04.

Pang B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of EMNLP-02.

Platt J. 1999. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In: A. Smola, P. Bartlett, B. Schoelkopf and D. Schuurmans (Eds.): Advances in Large Margin Classifiers. MIT Press, Cambridge.

Polanyi L. and A. Zaenen. 2006. Contextual Valence Shifters. Computing attitude and affect in text: Theory and application. Springer Verlag.

Riloff E., S. Patwardhan, and J. Wiebe. 2006. Feature Subsumption for Opinion Analysis. In Proceedings of EMNLP-06.

Turney P. 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of ACL-02.

Vilalta R. and Y. Drissi. 2002. A Perspective View and Survey of Meta-learning. Artificial Intelligence Review, 18(2).

Wan X. 2009. Co-Training for Cross-Lingual Sentiment Classification. In Proceedings of ACL-IJCNLP-09.

Wiebe J. 2000. Learning Subjective Adjectives from Corpora. In Proceedings of AAAI.

Wilson T., J. Wiebe, and P. Hoffmann. 2009. Recognizing Contextual Polarity: An Exploration of Features for Phrase-Level Sentiment Analysis. Computational Linguistics, vol.35(3).

Yang Y. and X. Liu. 1999. A Re-Examination of Text Categorization methods. In Proceedings of SIGIR-99.

Yang Y. and J. Pedersen. 1997. A Comparative Study on Feature Selection in Text Categorization. In Proceedings of ICML.


For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Transformation Networks for Target-Oriented Sentiment Classification ACL / 25

Transformation Networks for Target-Oriented Sentiment Classification ACL / 25 Transformaton Networks for Target-Orented Sentment Classfcaton 1 Xn L 1, Ldong Bng 2, Wa Lam 1, Be Sh 1 1 The Chnese Unversty of Hong Kong 2 Tencent AI Lab ACL 2018 1 Jont work wth Tencent AI Lab Transformaton

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Experiments in Text Categorization Using Term Selection by Distance to Transition Point Experments n Text Categorzaton Usng Term Selecton by Dstance to Transton Pont Edgar Moyotl-Hernández, Héctor Jménez-Salazar Facultad de Cencas de la Computacón, B. Unversdad Autónoma de Puebla, 14 Sur

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

An Evaluation of Divide-and-Combine Strategies for Image Categorization by Multi-Class Support Vector Machines

An Evaluation of Divide-and-Combine Strategies for Image Categorization by Multi-Class Support Vector Machines An Evaluaton of Dvde-and-Combne Strateges for Image Categorzaton by Mult-Class Support Vector Machnes C. Demrkesen¹ and H. Cherf¹, ² 1: Insttue of Scence and Engneerng 2: Faculté des Scences Mrande Galatasaray

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers Journal of Convergence Informaton Technology Volume 5, Number 2, Aprl 2010 Investgatng the Performance of Naïve- Bayes Classfers and K- Nearest Neghbor Classfers Mohammed J. Islam *, Q. M. Jonathan Wu,

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

Relational Lasso An Improved Method Using the Relations among Features

Relational Lasso An Improved Method Using the Relations among Features Relatonal Lasso An Improved Method Usng the Relatons among Features Kotaro Ktagawa Kumko Tanaka-Ish Graduate School of Informaton Scence and Technology, The Unversty of Tokyo ktagawa@cl.c..u-tokyo.ac.jp

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

A Statistical Model Selection Strategy Applied to Neural Networks

A Statistical Model Selection Strategy Applied to Neural Networks A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos

More information

SI485i : NLP. Set 5 Using Naïve Bayes

SI485i : NLP. Set 5 Using Naïve Bayes SI485 : NL Set 5 Usng Naïve Baes Motvaton We want to predct somethng. We have some text related to ths somethng. somethng = target label text = text features Gven, what s the most probable? Motvaton: Author

More information

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization A Novel Term_Class Relevance Measure for Text Categorzaton D S Guru, Mahamad Suhl Department of Studes n Computer Scence, Unversty of Mysore, Mysore, Inda Abstract: In ths paper, we ntroduce a new measure

More information

Face Detection with Deep Learning

Face Detection with Deep Learning Face Detecton wth Deep Learnng Yu Shen Yus122@ucsd.edu A13227146 Kuan-We Chen kuc010@ucsd.edu A99045121 Yzhou Hao y3hao@ucsd.edu A98017773 Mn Hsuan Wu mhwu@ucsd.edu A92424998 Abstract The project here

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

A Lazy Ensemble Learning Method to Classification

A Lazy Ensemble Learning Method to Classification IJCSI Internatonal Journal of Computer Scence Issues, Vol. 7, Issue 5, September 2010 ISSN (Onlne): 1694-0814 344 A Lazy Ensemble Learnng Method to Classfcaton Haleh Homayoun 1, Sattar Hashem 2 and Al

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Journal of Process Control

Journal of Process Control Journal of Process Control (0) 738 750 Contents lsts avalable at ScVerse ScenceDrect Journal of Process Control j ourna l ho me pag e: wwwelsevercom/locate/jprocont Decentralzed fault detecton and dagnoss

More information

Japanese Dependency Analysis Based on Improved SVM and KNN

Japanese Dependency Analysis Based on Improved SVM and KNN Proceedngs of the 7th WSEAS Internatonal Conference on Smulaton, Modellng and Optmzaton, Bejng, Chna, September 15-17, 2007 140 Japanese Dependency Analyss Based on Improved SVM and KNN ZHOU HUIWEI and

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

Automatic Text Categorization of Mathematical Word Problems

Automatic Text Categorization of Mathematical Word Problems Automatc Text Categorzaton of Mathematcal Word Problems Suleyman Cetntas 1, Luo S 2, Yan Png Xn 3, Dake Zhang 3, Joo Young Park 3 1,2 Department of Computer Scence, 2 Department of Statstcs, 3 Department

More information

Bayesian Classifier Combination

Bayesian Classifier Combination Bayesan Classfer Combnaton Zoubn Ghahraman and Hyun-Chul Km Gatsby Computatonal Neuroscence Unt Unversty College London London WC1N 3AR, UK http://www.gatsby.ucl.ac.uk {zoubn,hckm}@gatsby.ucl.ac.uk September

More information

A Fusion of Stacking with Dynamic Integration

A Fusion of Stacking with Dynamic Integration A Fuson of Stackng wth Dynamc Integraton all Rooney, Davd Patterson orthern Ireland Knowledge Engneerng Laboratory Faculty of Engneerng, Unversty of Ulster Jordanstown, ewtownabbey, BT37 OQB, U.K. {nf.rooney,

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset

Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset Under-Samplng Approaches for Improvng Predcton of the Mnorty Class n an Imbalanced Dataset Show-Jane Yen and Yue-Sh Lee Department of Computer Scence and Informaton Engneerng, Mng Chuan Unversty 5 The-Mng

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

CLASSIFICATION OF ULTRASONIC SIGNALS

CLASSIFICATION OF ULTRASONIC SIGNALS The 8 th Internatonal Conference of the Slovenan Socety for Non-Destructve Testng»Applcaton of Contemporary Non-Destructve Testng n Engneerng«September -3, 5, Portorož, Slovena, pp. 7-33 CLASSIFICATION

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Credibility Adjusted Term Frequency: A Supervised Term Weighting Scheme for Sentiment Analysis and Text Classification

Credibility Adjusted Term Frequency: A Supervised Term Weighting Scheme for Sentiment Analysis and Text Classification Credblty Adjusted Term Frequency: A Supervsed Term Weghtng Scheme for Sentment Analyss and Text Classfcaton Yoon Km New York Unversty yhk255@nyu.edu Owen Zhang zhonghua.zhang2006@gmal.com Abstract We provde

More information

Learning-based License Plate Detection on Edge Features

Learning-based License Plate Detection on Edge Features Learnng-based Lcense Plate Detecton on Edge Features Wng Teng Ho, Woo Hen Yap, Yong Haur Tay Computer Vson and Intellgent Systems (CVIS) Group Unverst Tunku Abdul Rahman, Malaysa wngteng_h@yahoo.com, woohen@yahoo.com,

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Web Spam Detection Using Multiple Kernels in Twin Support Vector Machine

Web Spam Detection Using Multiple Kernels in Twin Support Vector Machine Web Spam Detecton Usng Multple Kernels n Twn Support Vector Machne ABSTRACT Seyed Hamd Reza Mohammad, Mohammad Al Zare Chahook Yazd Unversty, Yazd, Iran mohammad_6468@stu.yazd.ac.r chahook@yazd.ac.r Search

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Learning to Classify Documents with Only a Small Positive Training Set
