Optimally Combining Positive and Negative Features for Text Categorization

Zhaohui Zheng, Rohini Srihari
CEDAR, Dept. of Computer Science and Engineering, State University of New York at Buffalo, NY 14260, USA

Abstract

This paper presents a novel local feature selection approach for text categorization. It constructs a feature set for each category by first selecting a set of terms highly indicative of membership as well as another set of terms highly indicative of non-membership, then unifying the two sets. The size ratio of the two sets was empirically chosen to obtain optimal performance. This is in contrast with the standard local feature selection approaches that either (1) only select the terms most indicative of membership, or (2) implicitly but not optimally combine the terms most indicative of membership with those indicative of non-membership. The experimental comparison between the proposed approach and the standard approaches was conducted with four feature selection metrics: chi-square, correlation coefficient, odds ratio, and GSS coefficient. The results show that the proposed approach improves text categorization performance.

1. Introduction

Text categorization is a machine learning task, defined as automatically assigning predefined category labels to new free-text documents. A growing number of statistical machine learning techniques have been applied to text categorization in recent years, notable among which are five approaches: nearest neighbor classifiers (Creecy et al., 1992; Yang, 1994), Bayesian classifiers (Tzeras & Hartman, 1993; Lewis & Ringuette, 1994), decision trees (Apte, Damerau, & Weiss, 1994), neural networks (Wiener, Pederson & Weigend, 1995; Ng, Goh & Low, 1997), and support vector machines (Joachims, 1998). One major difficulty in text categorization problems is the high dimensionality of the input feature space typical for textual data: each distinct term or token appearing in the document collection represents one dimension in the feature space, and for a typical document collection there are tens of thousands or even hundreds of thousands of distinct terms or tokens.
After the elimination of stop words and stemming, the set of features is still too large for many learning algorithms, e.g. neural networks. In order to improve the scalability of text categorization, we need to apply feature selection techniques to reduce the feature size further. Various feature selection methods have been proposed in the literature, and their relative merits have been tested by experimentally evaluating text categorization performance. There are two distinct ways of viewing feature selection, depending on whether the task is performed locally or globally: (1) local feature selection: for each category, a set of terms is chosen for classification based on the relevant and irrelevant documents of this category; (2) global feature selection: a set of terms is chosen for classification under all categories based on the relevant documents of the categories. The local feature selection for each category can be viewed as global feature selection for two "categories": relevant and irrelevant. Local feature selection is of interest in this paper. Several feature selection measures have been explored in the literature, including Document Frequency (DF), Information Gain (IG), Mutual Information (MI), Chi-square (CHI), Correlation Coefficient (CC), Odds Ratio (OR) and the GSS coefficient (GSS) (Galavotti, Sebastiani, & Simi, 2000; Mitchell, 1996; Mladenic, 1998; Ng, Goh & Low, 1997; Quinlan, 1986; Rijsbergen, 1979; Sebastiani, 2002; Schutze, Hull & Pederson, 1995; Yang & Pedersen, 1997). Of the seven measures, CHI, CC, OR and GSS seem to be the most effective based on the experiments reported so far, and we will focus on these four. This paper presents a novel local feature selection method that explicitly selects and combines the features highly indicative of membership and of non-membership for each category, in such a way that optimal performance, e.g. the F1 measure, is obtained on a validation set. The features indicative of membership and non-membership are also referred to as the positive and negative features respectively.
The presence of positive and negative features in a document indicates its relevance and non-relevance respectively. The rest of the paper is organized as follows. Section 2 describes the four feature selection measures and the standard methods of using them. Section 3 presents the proposed feature selection technique. In Section 4, we describe the naïve Bayes classifier whose performance will be used to evaluate the effectiveness of the various feature selection methods. Experimental results are analyzed in Section 5. Conclusions are given in Section 6.

(Workshop on Learning from Imbalanced Datasets II, ICML, Washington DC, 2003.)

2. Related Work

In this section, we will first briefly review the four feature selection measures, then present the methods of using them found in the literature, and finally describe the imbalanced data problem and its impacts on feature selection. Note that the methods here refer to the scheme of applying feature selection measures to term selection.

2.1 Feature Selection Measures

In what follows, A, B, C, and D denote the numbers of times a term t and a category c co-occur, t occurs without c, c occurs without t, and neither c nor t occurs, respectively. N represents the total number of documents.

2.1.1 CHI-SQUARE (CHI)

CHI measures the lack of independence between a term t and a category c and can be compared to the chi-square distribution with one degree of freedom to judge extremeness (Yang, 1999; Schutze, Hull & Pederson, 1995). In terms of the contingency counts it is defined as:

    chi2(t, c) = N (AD - CB)^2 / [(A + C)(B + D)(A + B)(C + D)]

chi2 has a natural value of zero if t and c are independent. It is a normalized value, and hence is comparable across terms for the same category.

2.1.2 CORRELATION COEFFICIENT (CC)

The correlation coefficient CC(t, c) of a word t with a category c was defined by Ng et al. as (Ng, Goh & Low, 1997; Sebastiani, 2002):

    CC(t, c) = sqrt(N) (AD - CB) / sqrt[(A + C)(B + D)(A + B)(C + D)]

It is a variant of the CHI metric, with CC^2 = chi2, and can be viewed as a "one-sided" chi-square metric. Positive values correspond to features indicative of membership, while negative values indicate non-membership; the greater (smaller) a positive (negative) value is, the more strongly the term indicates membership (non-membership). The standard CC-based local feature selection method selects the terms with maximum CC value as features.
The rationale behind this is that terms coming from the irrelevant texts of a category are considered useless. CHI, on the other hand, is nonnegative, and its value reflects either membership or non-membership of a term in a category, so ambiguous features are ranked lower. In contrast with CC, CHI considers the terms coming from both the relevant and the non-relevant texts.

2.1.3 ODDS RATIO (OR)

Odds ratio was proposed originally by van Rijsbergen et al. (1979) for selecting terms for relevance feedback. The basic idea is that the distribution of features in the relevant documents differs from the distribution of features in the non-relevant documents. It has been used by Mladenic (1998) for selecting terms in text categorization. It is defined as follows:

    OR(t, c) = P(t|c) [1 - P(t|c')] / {[1 - P(t|c)] P(t|c')} = AD / CB

where c' denotes the complement of c. Values greater than 1 correspond to features indicative of membership, while values less than 1 correspond to features indicative of non-membership. It only considers the terms from relevant text. The Expected Likelihood Estimate (ELE) smoothing technique was used in this paper to handle singularities:

    OR(t, c) = (A + 0.5)(D + 0.5) / [(C + 0.5)(B + 0.5)]

2.1.4 GSS COEFFICIENT

The GSS coefficient is another simplified variant of the chi-square statistic, proposed by Galavotti et al. (2000), and is defined as:

    GSS(t, c) = P(t, c) P(t', c') - P(t, c') P(t', c) = (AD - CB) / N^2

Similar to CC, positive values correspond to features indicative of membership, while negative values indicate non-membership. Therefore, only the positive terms are considered.

2.2 Feature Selection Methods

Each of the above four measures is a function f(t, c) with a term t and a category c as its parameters, whose value indicates some relationship between the term and the category. In global feature selection, we assess the value of a term in a global, or category-independent, sense. Either the average or the maximum of the category-specific values is usually computed and compared (Yang & Pedersen, 1997). That is,

    f_avg(t) = sum_i P(c_i) f(t, c_i)
    f_max(t) = max_i f(t, c_i)

Given a vocabulary V and a function f that maps terms to real values, we define two subsets of V of size l, viz. Max[V, f, l] and Min[V, f, l], as follows:

    for all x in Max[V, f, l] and y in V - Max[V, f, l]: f(x) >= f(y)
    for all x in Min[V, f, l] and y in V - Min[V, f, l]: f(x) <= f(y)

In other words, Max[V, f, l] and Min[V, f, l] consist of the l terms in V with the highest and lowest f values respectively. In global feature selection, the feature set F is then Max[V, f_max, l] or Max[V, f_avg, l], where l is the size of F.

In local feature selection, a feature set is constructed for each category. Accordingly, the feature set F_i for c_i is Max[V, f(., c_i), l], where f can be any feature selection measure based on the two-way contingency table of a term t and a category c. Local feature selection methods using asymmetric measures, e.g. CC, OR and GSS, actually pick out the terms most indicative of membership; they will never consider negative features unless all the positive features have already been selected. On the other hand, local feature selection methods using symmetric measures, e.g. CHI, implicitly combine the terms most indicative of membership and non-membership.
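As an illustrative sketch (not code from the paper; the function names are my own), the four measures and the Max/Min term selection above can be computed directly from the contingency counts A, B, C, D:

```python
import math

# 2x2 contingency counts for term t and category c:
# A = docs with t and c, B = t without c, C = c without t, D = neither.

def chi2(A, B, C, D):
    """CHI: N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)); zero if t, c independent."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

def cc(A, B, C, D):
    """CC: signed square root of CHI; the sign separates positive from
    negative features."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return math.sqrt(N) * (A * D - C * B) / math.sqrt(denom) if denom else 0.0

def odds_ratio(A, B, C, D):
    """Odds ratio with ELE (add-0.5) smoothing to avoid singularities."""
    return ((A + 0.5) * (D + 0.5)) / ((C + 0.5) * (B + 0.5))

def gss(A, B, C, D):
    """GSS: P(t,c)P(t',c') - P(t,c')P(t',c) = (AD - CB) / N^2."""
    N = A + B + C + D
    return (A * D - C * B) / (N * N) if N else 0.0

def max_terms(vocab, f, l):
    """Max[V, f, l]: the l terms with the highest f values."""
    return sorted(vocab, key=f, reverse=True)[:l]

def min_terms(vocab, f, l):
    """Min[V, f, l]: the l terms with the lowest f values."""
    return sorted(vocab, key=f)[:l]
```

For example, `cc(A, B, C, D) ** 2` equals `chi2(A, B, C, D)`, and `max_terms` applied with a per-category score implements Max[V, f(., c_i), l].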
The size ratio between the positive and negative features is then internally decided by thresholding on the size of the feature set.

2.3 Imbalanced Data Problem

When training a binary text classifier (text filtering system) for a category, we use all the documents in the training corpus that belong to that category as relevant training data, and all the documents that belong to the other categories as non-relevant training data. It is often the case that the non-relevant training documents are overwhelmingly more numerous, especially when there is a large collection of categories with each assigned to a small number of documents. Many approaches have been employed to address this imbalanced data problem. The concepts of "query zone" and "category zone" were introduced to select a subset of the non-relevant documents, namely the most relevant of the non-relevant documents, as the non-relevant training data (Hearst et al., 1996; Ruiz & Srinivasan, 1999). Essentially, these methods try to obtain more balanced relevant and non-relevant training sets. In this paper, we consider the problem from a different perspective. Instead of balancing the training data, our method balances the positive and negative features, i.e., it generates the optimal combination of positive and negative features for the imbalanced data. The impacts of the imbalanced data problem on standard local feature selection for text filtering can be illustrated as follows: (1) For the methods using positive features only (e.g. CC, OR, or GSS), the non-relevant documents are subject to misclassification. This is even worse under imbalance, where non-relevant documents dominate, and confidently rejecting them becomes very important. (2) When the methods implicitly combining positive and negative features, e.g. CHI, are applied to imbalanced data, the positive features usually receive much higher values than the negative features, according to the measure's definition and our previous experiments. Therefore, the positive features will dominate the feature set.
A similar situation then occurs as described in (1). For example, the upper limit of the CHI value of either a positive or a negative feature is N. For a positive feature, this represents the case where the feature appears in every relevant document but never in any non-relevant document; for a negative feature, it means the feature appears in every non-relevant document but never in any relevant document. Due to the large amount and diversity of the non-relevant documents in an imbalanced data set, it is much more difficult for a negative feature to achieve that maximum than for a positive feature. This extreme example sheds light on why the CHI values of positive features are usually much larger than those of negative features. It is inappropriate for standard local feature selection using CHI to simply compare CHI values without considering whether the features are positive or negative.

3. Combining Positive and Negative Features

We believe that: (1) The negative features are also useful and should be included in the feature set. Since local feature selection can be viewed globally by considering relevant and non-relevant as two "categories", the negative features are actually from the "non-relevant" category, and in such a bi-category problem, terms from both categories should intuitively be considered. The presence of negative features in a document is a good indicator of its non-membership; thus the text filtering performance can be improved through confident rejection of non-relevant documents. (2) Implicit combination of positive and negative features is not necessarily optimal, especially for imbalanced data sets in which the values of positive features are usually much larger than those of negative features. CHI might select only the positive features (equivalent to the standard CC-based approach in this case) when the size of the feature set is small. Thus the size ratio of positive to negative features should be explicitly set and empirically tuned to each scenario: data collection, text classifier, etc.

Based on the above two observations, we propose a new feature selection approach consisting of the following three steps. For each category c_i:

Step 1: generate a positive-feature set F_i+ = Max[V, f(., c_i), l1], where l1, 0 < l1 <= l, is a natural number.

Step 2: generate a negative-feature set F_i- = Max[V, f(., c_i'), l2], where l2 = l - l1 is a non-negative integer.

Step 3: F_i = F_i+ union F_i-.

Here l, with l << |V|, is the predefined size of the feature set, and l1/l, 0 < l1/l <= 1, is the key parameter, to be chosen so as to optimize categorization performance on a validation set. When l1 = l, i.e. l2 = 0, the method reduces to the standard local feature selection method.
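A minimal sketch of the three steps (my own naming; `score` stands for any asymmetric measure f(., c_i) such as CC, OR or GSS, and `ratio` is the key parameter l1/l to be tuned on a validation set):

```python
def combined_features(vocab, score, l, ratio):
    """Union of the l1 strongest positive features and the l - l1
    strongest negative features; ratio = l1/l, and ratio = 1 recovers
    the standard local feature selection method."""
    l1 = max(1, round(ratio * l))               # Step 1 size, 0 < l1 <= l
    l2 = l - l1                                 # Step 2 size, l2 >= 0
    ranked = sorted(vocab, key=score, reverse=True)
    positives = ranked[:l1]                     # Step 1: Max[V, f(., c), l1]
    negatives = ranked[-l2:] if l2 > 0 else []  # Step 2': Min[V, f(., c), l2]
    return positives + negatives                # Step 3: F = F+ union F-
```

With the paper's eight-term illustration (CC values reconstructed here as 9, 8.5, 8.1, 8, 2, -1, -5.8, -5.9), l = 6 and ratio = 2/3 select t1-t4 together with t7 and t8, whereas ratio = 1 yields the standard top six.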
The standard method can thus be viewed as one particular case of our method. In Step 1, we pick out the terms most indicative of membership of c_i, while in Step 2 the terms most indicative of non-membership are selected as well; the feature set is the union of the two. Accordingly, the function f should satisfy: the larger the function value f(t, c), the more likely the term t belongs to the category c. Obviously, CC, OR, and GSS can serve as such functions, while CHI cannot. The reasons why we still present CHI in this paper are as follows: (1) CHI has been shown to be an effective and robust feature selection measure in the literature; to make our experiments comparable to others, we use it as our baseline. (2) CHI is closely related in concept to the CC-based approaches, whether the standard method or our proposed method is used, as will be shown later.

From the definitions of the three measures, we can easily obtain:

    CC(t, c') = -CC(t, c),  OR(t, c') = 1 / OR(t, c),  GSS(t, c') = -GSS(t, c)

Accordingly, Step 2 can be rewritten as:

Step 2': generate a negative-feature set F_i- as Min[V, f(., c_i), l2].

Compared with the standard methods that only consider the terms indicative of membership, e.g. CC, OR and GSS, we add Step 2, which adds to the feature set those terms indicative of non-membership. The advantage of our approach over the standard one can be illustrated by a simple example: given a list of terms t1, ..., t8 whose CC values are 9, 8.5, 8.1, 8, 2, -1, -5.8, and -5.9 respectively, if the size of the feature set is 6, then t1 through t6 will be selected. Suppose a new document containing t5, t7 and t8 comes in; the system will assign it as relevant although it is irrelevant. The proposed approach, on the other hand, will be more likely to choose t7 and t8 instead of t5 and t6, and hence classify the new document correctly.

When applying our method to CC, the resulting approach seems very similar to the standard CHI-based approach:

(1) Both of them consider not only the terms indicative of membership but also those indicative of non-membership; the proposed method using CC combines them explicitly, while standard CHI considers them implicitly. (2) Because the CHI value equals the squared CC value, among the terms with positive/negative CC values, the greater/smaller the value is, the more likely the term is to be selected as a feature by either method.

However, the major differences between the two approaches are: (1) CHI does not differentiate between terms indicative of membership and of non-membership, since it compares the squared values. Although it might in concept consider both positive and negative features, the size ratio between them is not optimal, and there is no extra parameter with which to optimize that ratio. In contrast, by design, our approach can optimize the size ratio to obtain the best performance. Referring to the above example: if we apply CHI to select four features, t1 through t4 will be selected, each of which comes from the relevant document set. When the same new document comes in, the system can hardly tell whether it is relevant or not. (2) Because the positive examples are far fewer than the negative examples in the training corpus, CHI actually favors the positive features according to its definition. In other words, CHI values are not comparable between positive and negative features; usually the values of positive features are much larger, as described in Section 2.3. The proposed approach, however, allows the size of the feature set to be as small as needed while guaranteeing that the system uses both positive and negative features in an optimal way.

4. Naïve Bayes Classifier for Text Filtering

The naïve Bayes classifier is a highly practical Bayesian learning method (Mitchell, 1996). The central idea is to use the joint probabilities of terms and categories to estimate the probabilities of categories given a document. The naïve part of such a model is the simplifying assumption that the words are conditionally independent given the category, and that the probability of word occurrence is independent of position within the text.
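One common way to estimate the probabilities such a classifier needs is relative frequency with Laplace smoothing over the selected feature set. The sketch below is my own assumption about the estimator, not necessarily the authors' exact one; all names are hypothetical:

```python
from collections import Counter

def train_nb(docs, labels, vocab, alpha=1.0):
    """Estimate the prior P(c) and the smoothed likelihoods P(w|c) and
    P(w|~c). docs: lists of terms; labels: True for relevant documents;
    vocab: the selected feature set for this category."""
    pos_docs = [d for d, y in zip(docs, labels) if y]
    neg_docs = [d for d, y in zip(docs, labels) if not y]
    prior_pos = len(pos_docs) / len(docs)
    # Count feature occurrences separately in relevant and non-relevant docs.
    cp = Counter(w for d in pos_docs for w in d if w in vocab)
    cn = Counter(w for d in neg_docs for w in d if w in vocab)
    tp, tn = sum(cp.values()), sum(cn.values())
    # Laplace (add-alpha) smoothing keeps every likelihood strictly positive.
    lik_pos = {w: (cp[w] + alpha) / (tp + alpha * len(vocab)) for w in vocab}
    lik_neg = {w: (cn[w] + alpha) / (tn + alpha * len(vocab)) for w in vocab}
    return prior_pos, lik_pos, lik_neg
```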
For text filtering, the relevance score between a new document d and the category c can be calculated as:

    Score(d, c) = log [P(c) / P(c')] + sum over features w in d of [log P(w|c) - log P(w|c')]

where w ranges over the features appearing in the document d; P(c) and P(c') are the prior probabilities of relevant and non-relevant respectively; and P(w|c) and P(w|c') are the likelihoods of w appearing in relevant and non-relevant training documents respectively. A binary decision (relevant or non-relevant) on d with respect to the category c is obtained by thresholding on Score(d, c). We train one naïve Bayes classifier per category; a relevance score threshold is learned per category to empirically optimize the F1 measure on the validation set.

5. Experimental Results and Analysis

5.1 Experimental Setting

To make our results comparable to others, we used the Reuters-21578 corpus (Yang, 1999; Yang & Pedersen, 1997), a widely used benchmark in the text categorization domain. For this paper, we use the ApteMod version of Reuters-21578 as described by Yang (1999). We obtain 90 categories present in both the training and test sets, a training set of 7,769 documents, and a test set of 3,019 documents. The average number of categories per document is 1.3, and the number of positive instances per category ranges from a minimum of 1 to a maximum of 2,877 in the training set. In order to automatically learn the category-specific parameters, e.g. the size ratio in feature selection and the thresholds in classification, we use two thirds of the training set for training and the remaining one third for validation. After obtaining these thresholds, the classifiers are retrained on the whole training set.

Classification effectiveness has been evaluated in terms of the standard precision, recall and F1 measures. The precision, recall and F1 for each category c_i are defined as:

    P_i = alpha_i / beta_i,  R_i = alpha_i / gamma_i,  F_i = 2 P_i R_i / (P_i + R_i)

where alpha_i is the number of documents correctly assigned by the system to category c_i, beta_i is the number of documents assigned by the system to category c_i, and gamma_i is the number of documents belonging to category c_i (i = 1, ..., m).
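The scoring and thresholding described above can be sketched as follows (hypothetical names; the probabilities would come from a trained per-category model):

```python
import math

def nb_score(doc_terms, prior_pos, lik_pos, lik_neg):
    """Score(d, c) = log[P(c)/P(~c)] + sum over features w in d of
    log P(w|c) - log P(w|~c); terms outside the feature set are skipped."""
    s = math.log(prior_pos) - math.log(1.0 - prior_pos)
    for w in doc_terms:
        if w in lik_pos and w in lik_neg:
            s += math.log(lik_pos[w]) - math.log(lik_neg[w])
    return s

def decide(doc_terms, prior_pos, lik_pos, lik_neg, threshold):
    """Binary relevance decision by thresholding the score; the threshold
    is tuned per category, e.g. to maximize F1 on a validation set."""
    return nb_score(doc_terms, prior_pos, lik_pos, lik_neg) >= threshold
```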

These category-relative values may in turn be averaged in two alternative ways:

(1) Macro-averaging: the precisions and recalls are computed for the binary decisions on each individual category first and then averaged over categories. That is,

    macro-P = (1/m) sum_i P_i,  macro-R = (1/m) sum_i R_i,  macro-F1 = (1/m) sum_i F_i

(2) Micro-averaging: the precisions and recalls are computed globally over all the n x m binary decisions, where n is the number of test documents and m is the number of categories. That is,

    micro-P = sum_i alpha_i / sum_i beta_i,  micro-R = sum_i alpha_i / sum_i gamma_i,
    micro-F1 = 2 micro-P micro-R / (micro-P + micro-R)

Micro-averaged F1 has been widely used in cross-method comparisons, and we focus on this measure in this paper. Accordingly, the size ratio between the positive and negative feature sets is optimized to obtain the best micro-averaged F1 on the validation set.

In order to compare our proposed feature selection approach with the standard one, we apply both to naïve Bayes classifiers. Three groups of feature selection methods are considered:

Group 1: standard CHI, standard CC, and improved CC, referred to as G11, G12, and G13 respectively for short.
Group 2: standard OR and improved OR, referred to as G21 and G22.
Group 3: standard GSS coefficient and improved GSS coefficient, referred to as G31 and G32.

Here "standard" CHI, CC, OR and GSS denote the standard local feature selection methods using the CHI, CC, OR and GSS measures respectively, while "improved" CC, OR and GSS denote the application of the proposed feature selection method to the CC, OR and GSS measures. Note that there is no "improved CHI" method, because the CHI measure does not satisfy the requirement mentioned in Section 3; however, due to its similarity with CC, we place standard CHI in the group of standard and improved CC. The feature selection methods are compared with each other within the same group. The typical size of a local feature set is between 10 and 50 (Sebastiani, 2002).
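Both averaging schemes above can be sketched directly from the per-category counts (alpha_i, beta_i, gamma_i); the function names are my own:

```python
def f1(a, b, g):
    """F1 from correct assignments (a), total assigned (b), true members (g)."""
    p = a / b if b else 0.0
    r = a / g if g else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(stats):
    """Macro-averaging: mean of the per-category F1 values."""
    return sum(f1(a, b, g) for a, b, g in stats) / len(stats)

def micro_f1(stats):
    """Micro-averaging: pool the counts over all categories, then compute F1."""
    A = sum(a for a, _, _ in stats)
    B = sum(b for _, b, _ in stats)
    G = sum(g for _, _, g in stats)
    return f1(A, B, G)
```

Micro-averaging weights every individual decision equally, so frequent categories dominate it, whereas macro-averaging weights every category equally.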
In this paper, performances are reported for feature-set sizes in this range.

5.2 Experimental Results

Table 1 lists the micro-averaged F1 values for naïve Bayes classifiers with the seven different feature selection methods (listed in the first row) at different sizes of the feature set (listed in the first column).

Table 1: Micro-averaged F1 values for naïve Bayes classifiers with the seven feature selection methods at different feature-set sizes. [The numeric entries of this table were lost in extraction.]

As shown in Table 1, the improved correlation coefficient method (G13) is much better than the standard CC (G12) and CHI (G11) methods, and the improved odds ratio (G22) and improved GSS coefficient (G32) methods greatly outperform the corresponding standard methods (G21 and G31 respectively). This confirms our intuition that by optimally combining positive features with negative features, text categorization performance is remarkably improved.

Table 2 lists the micro-averaged precision and recall of each method at the feature-set size where its micro-averaged F1 is maximal. For example, G11 achieves its maximum micro-averaged F1 (.784) at a feature-set size of 50 according to the first two columns of Table 1; the second row of Table 2 gives the corresponding micro-averaged precision and recall. From Table 2, we can see that our proposed approach greatly increases the micro-averaged recall and F1 without hurting precision too much. Because we optimize the F1 measure for each category, more balanced micro-averaged precision and recall are obtained, which also explains why the micro-averaged precision remains unimproved.

Table 2: Micro-averaged precision, recall and F1 values for naïve Bayes classifiers with the seven feature selection methods. [The numeric entries of this table were lost in extraction.]

In order to illustrate the proportion of negative features in the feature set, Table 3 lists the number of categories in which the number of positive features is greater than, smaller than, or equal to the number of negative features, for improved CC with feature size 50. The three cases correspond to l1/l > 0.5, < 0.5 and = 0.5 respectively in the first column of Table 3.

Table 3: The number of categories in which the size of the positive set is greater than, smaller than, or equal to that of the negative set, in the case where improved CC obtains its best performance (feature size is 50). [The numeric entries of this table were lost in extraction.]

Table 3 shows that, in order to obtain the best text categorization performance in terms of F1, we should select more negative features than positive features in 47 out of the 90 categories. This reconfirms the usefulness of negative features. Our explanation is: when the negative examples are overwhelming, rejecting them with high confidence (accuracy) is of greater importance, and this can be achieved by increasing the number of negative features.

6. Conclusions

Experiments with four known feature selection measures, their standard methods, and a new feature selection method have been described. We proposed an effective feature selection method that optimally combines the terms most indicative of membership and non-membership. The main conclusions are: the terms indicative of non-membership are useful and should be considered in local feature selection; and by explicitly and optimally setting the size ratio of positive to negative features, text categorization performance is greatly improved.

References

C. Apte, F. Damerau, & S. Weiss (1994). Towards language independent automated learning of text categorization models. In Proceedings of the 17th Annual ACM/SIGIR Conference.
R.H. Creecy, et al. (1992). Trading MIPS and memory for knowledge engineering: classifying census returns on the Connection Machine. Comm. ACM, 35.
Galavotti, L., Sebastiani, F., & Simi, M. (2000). Experiments on the use of feature selection and negative evidence in automated text categorization. In Proceedings of ECDL-00, 4th European Conference on Research and Advanced Technology for Digital Libraries, Lisbon, Portugal.
Marti Hearst et al. (1996). Xerox TREC-4 site report. In Proceedings of the Fourth Text REtrieval Conference (TREC-4).
Thorsten Joachims (1998). Text categorization with support vector machines: learning with many relevant features. In European Conference on Machine Learning (ECML), pages 137-142, Berlin. Springer.
D.D. Lewis & M. Ringuette (1994). A comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR 94).
Tom Mitchell (1996). Machine Learning. McGraw-Hill.
Mladenic, D. (1998). Machine Learning on Non-homogeneous, Distributed Text Data. PhD dissertation, University of Ljubljana, Slovenia.
H.T. Ng, W.B. Goh, & K.L. Low (1997). Feature selection, perceptron learning, and a usability case study for text categorization. In 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 97).
J.R. Quinlan (1986). Induction of decision trees. Machine Learning, 1(1):81-106.
Van Rijsbergen, C.J. (1979). Information Retrieval. Butterworths, London, 2nd edition.
Ruiz, M.E. & Srinivasan, P. (1999). Hierarchical neural networks for text categorization. In Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval, 281-282.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47.

Schutze, H., Hull, D.A. & Pederson, J.O. (1995). A comparison of classifiers and document representations for the routing problem. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA.
K. Tzeras & S. Hartman (1993). Automatic indexing based on Bayesian inference networks. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 93), pages 22-34.
A.S. Weigend, E.D. Wiener, & J.O. Pederson (1999). Exploiting hierarchy in text categorization. Information Retrieval, 1(1-2):69-90.
E. Wiener, J.O. Pederson & A.S. Weigend (1995). A neural network approach to topic spotting. In Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval (SDAIR 95), pages 317-332, Las Vegas, Nevada. University of Nevada, Las Vegas.
Y. Yang (1994). Expert network: effective and efficient learning from human decisions in text categorization and retrieval. In 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 94), pages 13-22.
Y. Yang (1999). An evaluation of statistical approaches to text categorization. Journal of Information Retrieval, 1(1/2).
Y. Yang and J.O. Pedersen (1997). A comparative study on feature selection in text categorization. In D.H. Fisher, editor, Proceedings of the Fourteenth International Conference on Machine Learning. Morgan Kaufmann.


More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Pose Invariant Face Recognition using Hybrid DWT-DCT Frequency Features with Support Vector Machines

Pose Invariant Face Recognition using Hybrid DWT-DCT Frequency Features with Support Vector Machines Proceedngs of the 4 th Internatonal Conference on 7 th 9 th Noveber 008 Inforaton Technology and Multeda at UNITEN (ICIMU 008), Malaysa Pose Invarant Face Recognton usng Hybrd DWT-DCT Frequency Features

More information

Handwritten English Character Recognition Using Logistic Regression and Neural Network

Handwritten English Character Recognition Using Logistic Regression and Neural Network Handwrtten Englsh Character Recognton Usng Logstc Regresson and Neural Network Tapan Kuar Hazra 1, Rajdeep Sarkar 2, Ankt Kuar 3 1 Departent of Inforaton Technology, Insttute of Engneerng and Manageent,

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Relevance Feedback in Content-based 3D Object Retrieval A Comparative Study

Relevance Feedback in Content-based 3D Object Retrieval A Comparative Study 753 Coputer-Aded Desgn and Applcatons 008 CAD Solutons, LLC http://www.cadanda.co Relevance Feedback n Content-based 3D Object Retreval A Coparatve Study Panagots Papadaks,, Ioanns Pratkaks, Theodore Trafals

More information

Prediction of Dumping a Product in Textile Industry

Prediction of Dumping a Product in Textile Industry Int. J. Advanced Networkng and Applcatons Volue: 05 Issue: 03 Pages:957-96 (03) IN : 0975-090 957 Predcton of upng a Product n Textle Industry.V.. GANGA EVI Professor n MCA K..R.M. College of Engneerng

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Human Face Recognition Using Radial Basis Function Neural Network

Human Face Recognition Using Radial Basis Function Neural Network Huan Face Recognton Usng Radal Bass Functon eural etwor Javad Haddadna Ph.D Student Departent of Electrcal and Engneerng Arabr Unversty of Technology Hafez Avenue, Tehran, Iran, 594 E-al: H743970@cc.au.ac.r

More information

Comparative Study between different Eigenspace-based Approaches for Face Recognition

Comparative Study between different Eigenspace-based Approaches for Face Recognition Coparatve Study between dfferent Egenspace-based Approaches for Face Recognton Pablo Navarrete and Javer Ruz-del-Solar Departent of Electrcal Engneerng, Unversdad de Chle, CHILE Eal: {pnavarre, jruzd}@cec.uchle.cl

More information

Efficient Text Classification by Weighted Proximal SVM *

Efficient Text Classification by Weighted Proximal SVM * Effcent ext Classfcaton by Weghted Proxmal SVM * Dong Zhuang 1, Benyu Zhang, Qang Yang 3, Jun Yan 4, Zheng Chen, Yng Chen 1 1 Computer Scence and Engneerng, Bejng Insttute of echnology, Bejng 100081, Chna

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Nighttime Motion Vehicle Detection Based on MILBoost

Nighttime Motion Vehicle Detection Based on MILBoost Sensors & Transducers 204 by IFSA Publshng, S L http://wwwsensorsportalco Nghtte Moton Vehcle Detecton Based on MILBoost Zhu Shao-Png,, 2 Fan Xao-Png Departent of Inforaton Manageent, Hunan Unversty of

More information

Identifying Key Factors and Developing a New Method for Classifying Imbalanced Sentiment Data

Identifying Key Factors and Developing a New Method for Classifying Imbalanced Sentiment Data Identfyng Key Factors and Developng a New Method for Classfyng Ibalanced Sentent Data Long-Sheng Chen* and Kun-Cheng Sun Abstract Bloggers opnons related to coercal products/servces ght have a sgnfcant

More information

A New Scheduling Algorithm for Servers

A New Scheduling Algorithm for Servers A New Schedulng Algorth for Servers Nann Yao, Wenbn Yao, Shaobn Ca, and Jun N College of Coputer Scence and Technology, Harbn Engneerng Unversty, Harbn, Chna {yaonann, yaowenbn, cashaobn, nun}@hrbeu.edu.cn

More information

User Behavior Recognition based on Clustering for the Smart Home

User Behavior Recognition based on Clustering for the Smart Home 3rd WSEAS Internatonal Conference on REMOTE SENSING, Vence, Italy, Noveber 2-23, 2007 52 User Behavor Recognton based on Clusterng for the Sart Hoe WOOYONG CHUNG, JAEHUN LEE, SUKHYUN YUN, SOOHAN KIM* AND

More information

Performance Analysis of Coiflet Wavelet and Moment Invariant Feature Extraction for CT Image Classification using SVM

Performance Analysis of Coiflet Wavelet and Moment Invariant Feature Extraction for CT Image Classification using SVM Perforance Analyss of Coflet Wavelet and Moent Invarant Feature Extracton for CT Iage Classfcaton usng SVM N. T. Renukadev, Assstant Professor, Dept. of CT-UG, Kongu Engneerng College, Perundura Dr. P.

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Predicting Power Grid Component Outage In Response to Extreme Events. S. BAHRAMIRAD ComEd USA

Predicting Power Grid Component Outage In Response to Extreme Events. S. BAHRAMIRAD ComEd USA 1, rue d Artos, F-75008 PARIS CIGRE US Natonal Cottee http : //www.cgre.org 016 Grd of the Future Syposu Predctng Power Grd Coponent Outage In Response to Extree Events R. ESKANDARPOUR, A. KHODAEI Unversty

More information

What is Object Detection? Face Detection using AdaBoost. Detection as Classification. Principle of Boosting (Schapire 90)

What is Object Detection? Face Detection using AdaBoost. Detection as Classification. Principle of Boosting (Schapire 90) CIS 5543 Coputer Vson Object Detecton What s Object Detecton? Locate an object n an nput age Habn Lng Extensons Vola & Jones, 2004 Dalal & Trggs, 2005 one or ultple objects Object segentaton Object detecton

More information

Monte Carlo inference

Monte Carlo inference CS 3750 achne Learnng Lecture 0 onte Carlo nerence los Hauskrecht los@cs.ptt.edu 539 Sennott Square Iportance Saplng an approach or estatng the epectaton o a uncton relatve to soe dstrbuton target dstrbuton

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

A Semantic Model for Video Based Face Recognition

A Semantic Model for Video Based Face Recognition Proceedng of the IEEE Internatonal Conference on Inforaton and Autoaton Ynchuan, Chna, August 2013 A Seantc Model for Vdeo Based Face Recognton Dhong Gong, Ka Zhu, Zhfeng L, and Yu Qao Shenzhen Key Lab

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Research on action recognition method under mobile phone visual sensor Wang Wenbin 1, Chen Ketang 2, Chen Liangliang 3

Research on action recognition method under mobile phone visual sensor Wang Wenbin 1, Chen Ketang 2, Chen Liangliang 3 Internatonal Conference on Autoaton, Mechancal Control and Coputatonal Engneerng (AMCCE 05) Research on acton recognton ethod under oble phone vsual sensor Wang Wenbn, Chen Ketang, Chen Langlang 3 Qongzhou

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization A Novel Term_Class Relevance Measure for Text Categorzaton D S Guru, Mahamad Suhl Department of Studes n Computer Scence, Unversty of Mysore, Mysore, Inda Abstract: In ths paper, we ntroduce a new measure

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 Vol 1, No 2, September 2017, pp. 6-12 6 Implementaton Naïve Bayes Algorthm for Student Classfcaton Based on Graduaton Status

More information

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier Usng Ambguty Measure Feature Selecton Algorthm for Support Vector Machne Classfer Saet S.R. Mengle Informaton Retreval Lab Computer Scence Department Illnos Insttute of Technology Chcago, Illnos, U.S.A

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Adaptive Sampling with Optimal Cost for Class-Imbalance Learning

Adaptive Sampling with Optimal Cost for Class-Imbalance Learning Proceedngs of the Twenty-Nnth AAAI Conference on Artfcal Intellgence Adaptve Saplng wth Optal Cost for Class-Ibalance Learnng Yuxn Peng Insttute of Coputer Scence and Technology, Pekng Unversty, Bejng

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

A system based on a modified version of the FCM algorithm for profiling Web users from access log

A system based on a modified version of the FCM algorithm for profiling Web users from access log A syste based on a odfed verson of the FCM algorth for proflng Web users fro access log Paolo Corsn, Laura De Dosso, Beatrce Lazzern, Francesco Marcellon Dpartento d Ingegnera dell Inforazone va Dotsalv,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Joint Registration and Active Contour Segmentation for Object Tracking

Joint Registration and Active Contour Segmentation for Object Tracking Jont Regstraton and Actve Contour Segentaton for Object Trackng Jfeng Nng a,b, Le Zhang b,1, Meber, IEEE, Davd Zhang b, Fellow, IEEE and We Yu a a College of Inforaton Engneerng, Northwest A&F Unversty,

More information

Monte Carlo Evaluation of Classification Algorithms Based on Fisher's Linear Function in Classification of Patients With CHD

Monte Carlo Evaluation of Classification Algorithms Based on Fisher's Linear Function in Classification of Patients With CHD IOSR Journal of Matheatcs (IOSR-JM) e-issn: 2278-5728, p-issn: 2319-765X. Volue 13, Issue 1 Ver. IV (Jan. - Feb. 2017), PP 104-109 www.osrjournals.org Monte Carlo Evaluaton of Classfcaton Algorths Based

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Hierarchical Semantic Perceptron Grid based Neural Network CAO Huai-hu, YU Zhen-wei, WANG Yin-yan Abstract Key words 1.

Hierarchical Semantic Perceptron Grid based Neural Network CAO Huai-hu, YU Zhen-wei, WANG Yin-yan Abstract Key words 1. Herarchcal Semantc Perceptron Grd based Neural CAO Hua-hu, YU Zhen-we, WANG Yn-yan (Dept. Computer of Chna Unversty of Mnng and Technology Bejng, Bejng 00083, chna) chhu@cumtb.edu.cn Abstract A herarchcal

More information

Survey of Classification Techniques in Data Mining

Survey of Classification Techniques in Data Mining Proceedngs of the Internatonal MultConference of Engneers and Coputer Scentsts 2009 Vol I Survey of Classfcaton Technques n Data Mnng Thar Nu Phyu Abstract Classfcaton s a data nng (achne learnng) technque

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples 94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Relable Negatve Extractng Based on knn for Learnng from Postve and Unlabeled Examples Bangzuo Zhang College of Computer Scence and Technology, Jln Unversty,

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Key-Words: - Under sear Hydrothermal vent image; grey; blue chroma; OTSU; FCM

Key-Words: - Under sear Hydrothermal vent image; grey; blue chroma; OTSU; FCM A Fast and Effectve Segentaton Algorth for Undersea Hydrotheral Vent Iage FUYUAN PENG 1 QIAN XIA 1 GUOHUA XU 2 XI YU 1 LIN LUO 1 Electronc Inforaton Engneerng Departent of Huazhong Unversty of Scence and

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

A NOTE ON FUZZY CLOSURE OF A FUZZY SET (JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Pattern Classification of Back-Propagation Algorithm Using Exclusive Connecting Network

Pattern Classification of Back-Propagation Algorithm Using Exclusive Connecting Network World Acade of Scence, Engneerng and Technolog 36 7 Pattern Classfcaton of Bac-Propagaton Algorth Usng Eclusve Connectng Networ Insung Jung, and G-Na Wang Abstract The obectve of ths paper s to a desgn

More information

Comparing High-Order Boolean Features

Comparing High-Order Boolean Features Brgham Young Unversty BYU cholarsarchve All Faculty Publcatons 2005-07-0 Comparng Hgh-Order Boolean Features Adam Drake adam_drake@yahoo.com Dan A. Ventura ventura@cs.byu.edu Follow ths and addtonal works

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Issues and Empirical Results for Improving Text Classification

Issues and Empirical Results for Improving Text Classification Issues and Emprcal Results for Improvng Text Classfcaton Youngoong Ko 1 and Jungyun Seo 2 1 Dept. of Computer Engneerng, Dong-A Unversty, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, Korea yko@dau.ac.kr

More information

Measuring Cohesion of Packages in Ada95

Measuring Cohesion of Packages in Ada95 Measurng Coheson of Packages n Ada95 Baowen Xu Zhenqang Chen Departent of Coputer Scence & Departent of Coputer Scence & Engneerng, Southeast Unversty Engneerng, Southeast Unversty Nanjng, Chna, 20096

More information

Aircraft Engine Gas Path Fault Diagnosis Based on Fuzzy Inference

Aircraft Engine Gas Path Fault Diagnosis Based on Fuzzy Inference 202 Internatonal Conference on Industral and Intellgent Inforaton (ICIII 202) IPCSIT vol.3 (202) (202) IACSIT Press, Sngapore Arcraft Engne Gas Path Fault Dagnoss Based on Fuzzy Inference Changzheng L,

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

2016 International Conference on Sustainable Energy, Environment and Information Engineering (SEEIE 2016) ISBN:

2016 International Conference on Sustainable Energy, Environment and Information Engineering (SEEIE 2016) ISBN: 06 Internatonal Conference on Sustanable Energy, Envronent and Inforaton Engneerng (SEEIE 06) ISBN: 978--60595-337-3 A Study on IEEE 80. MAC Layer Msbehavor under Dfferent Back-off Algorths Trong Mnh HOANG,,

More information

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers Journal of Convergence Informaton Technology Volume 5, Number 2, Aprl 2010 Investgatng the Performance of Naïve- Bayes Classfers and K- Nearest Neghbor Classfers Mohammed J. Islam *, Q. M. Jonathan Wu,

More information

A Theory of Non-Deterministic Networks

A Theory of Non-Deterministic Networks A Theory of Non-Deternstc Networs Alan Mshcheno and Robert K rayton Departent of EECS, Unversty of Calforna at ereley {alan, brayton}@eecsbereleyedu Abstract oth non-deterns and ult-level networs copactly

More information

ENSEMBLE learning has been widely used in data and

ENSEMBLE learning has been widely used in data and IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 9, NO. 5, SEPTEMBER 2012 943 Sparse Kernel-Based Hyperspectral Anoaly Detecton Prudhv Gurra, Meber, IEEE, Heesung Kwon, Senor Meber, IEEE, andtothyhan Abstract

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

An Anti-Noise Text Categorization Method based on Support Vector Machines *

An Anti-Noise Text Categorization Method based on Support Vector Machines * An Ant-Nose Text ategorzaton Method based on Support Vector Machnes * hen Ln, Huang Je and Gong Zheng-Hu School of omputer Scence, Natonal Unversty of Defense Technology, hangsha, 410073, hna chenln@nudt.edu.cn,

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

An Efficient Fault-Tolerant Multi-Bus Data Scheduling Algorithm Based on Replication and Deallocation

An Efficient Fault-Tolerant Multi-Bus Data Scheduling Algorithm Based on Replication and Deallocation BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volue 16, No Sofa 016 Prnt ISSN: 1311-970; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-016-001 An Effcent Fault-Tolerant Mult-Bus Data

More information

Identifying Table Boundaries in Digital Documents via Sparse Line Detection

Identifying Table Boundaries in Digital Documents via Sparse Line Detection Identfyng Table Boundares n Dgtal Documents va Sparse Lne Detecton Yng Lu, Prasenjt Mtra, C. Lee Gles College of Informaton Scences and Technology The Pennsylvana State Unversty Unversty Park, PA, USA,

More information

IMAGE REPRESENTATION USING EPANECHNIKOV DENSITY FEATURE POINTS ESTIMATOR

IMAGE REPRESENTATION USING EPANECHNIKOV DENSITY FEATURE POINTS ESTIMATOR Sgnal & Iage Processng : An Internatonal Journal (SIPIJ) Vol.4, No., February 03 IMAGE REPRESENTATION USING EPANECHNIKOV DENSITY FEATURE POINTS ESTIMATOR Tranos Zuva, Kenelwe Zuva 3, Sunday O. Ojo, Selean

More information

Keyword Spotting Based on Phoneme Confusion Matrix

Keyword Spotting Based on Phoneme Confusion Matrix Keyword Spottng Based on Phonee Confuson Matrx Pengyuan Zhang, Jan Shao, Jang Han, Zhaoje Lu, Yonghong Yan ThnkIT Speech Lab, Insttute of Acoustcs, Chnese Acadey of Scences Bejng 00080 {pzhang, jshao,

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

A Weighted Method to Improve the Centroid-based Classifier

A Weighted Method to Improve the Centroid-based Classifier 016 Internatonal onference on Electrcal Engneerng and utomaton (IEE 016) ISN: 978-1-60595-407-3 Weghted ethod to Improve the entrod-based lassfer huan LIU, Wen-yong WNG *, Guang-hu TU, Nan-nan LIU and

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Multimodal Biometric System Using Face-Iris Fusion Feature

Multimodal Biometric System Using Face-Iris Fusion Feature JOURNAL OF COMPUERS, VOL. 6, NO. 5, MAY 2011 931 Multodal Boetrc Syste Usng Face-Irs Fuson Feature Zhfang Wang, Erfu Wang, Shuangshuang Wang and Qun Dng Key Laboratory of Electroncs Engneerng, College

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information