Optimally Combining Positive and Negative Features for Text Categorization

Zhaohui Zheng, Rohini Srihari
CEDAR, Dept. of Computer Science and Engineering, State University of New York at Buffalo, NY 14260, USA

Abstract

This paper presents a novel local feature selection approach for text categorization. It constructs a feature set for each category by first selecting a set of terms highly indicative of membership as well as another set of terms highly indicative of non-membership, then unifying the two sets. The size ratio of the two sets was empirically chosen to obtain optimal performance. This is in contrast with the standard local feature selection approaches that either (1) only select the terms most indicative of membership, or (2) implicitly but not optimally combine the terms most indicative of membership with those indicative of non-membership. The experimental comparison between the proposed approach and the standard approaches was conducted with four feature selection metrics: chi-square, correlation coefficient, odds ratio, and GSS coefficient. The results show that the proposed approach improves text categorization performance.

1. Introduction

Text categorization is a machine learning task, defined as automatically assigning predefined category labels to new free-text documents. A growing number of statistical machine learning techniques have been applied to text categorization in recent years, notable among which are five approaches: nearest neighbor classifiers (Creecy et al., 1992; Yang, 1994), Bayesian classifiers (Tzeras & Hartman, 1993; Lewis & Ringuette, 1994), decision trees (Apte, Damerau, & Weiss, 1994), neural networks (Wiener, Pederson & Weigend, 1995; Ng, Goh & Low, 1997), and support vector machines (Joachims, 1998). One major difficulty in text categorization problems is the high dimensionality of the input feature space typical for textual data: each distinct term or token appearing in the document collection represents one dimension in the feature space, and for a typical document collection there are tens of thousands or even hundreds of thousands of distinct terms or tokens.
After the elimination of stop words and stemming, the set of features is still too large for many learning algorithms, e.g. neural networks. In order to improve the scalability of text categorization, we need to apply feature selection techniques to reduce the feature size further. Various feature selection methods have been proposed in the literature, and their relative merits have been tested by experimentally evaluating text categorization performance. There are two distinct ways of viewing feature selection, depending on whether the task is performed locally or globally: (1) local feature selection: for each category, a set of terms is chosen for classification based on the relevant and irrelevant documents of this category; (2) global feature selection: a set of terms is chosen for classification under all categories based on the relevant documents of the categories. The local feature selection for each category can be viewed as global feature selection for two "categories": relevant and irrelevant. Local feature selection is of interest in this paper. Several feature selection measures have been explored in the literature, including Document Frequency (DF), Information Gain (IG), Mutual Information (MI), Chi-square (CHI), Correlation Coefficient (CC), Odds Ratio (OR) and the GSS coefficient (GSS) (Galavotti, Sebastiani, & Simi, 2000; Mitchell, 1996; Mladenic, 1998; Ng, Goh & Low, 1997; Quinlan, 1986; Rijsbergen, 1979; Sebastiani, 2002; Schutze, Hull & Pederson, 1995; Yang & Pedersen, 1997). Of the seven measures, CHI, CC, OR and GSS seem to be the most effective based on the experiments reported so far, and we will focus on these four. This paper presents a novel local feature selection method that explicitly selects and combines the features highly indicative of membership and of non-membership for each category, in such a way that optimal performance, e.g. the F1 measure, is obtained on a validation set. The features indicative of membership and non-membership are also referred to as the positive and negative features respectively.
The presence of positive and negative features in a document indicates its relevance and non-relevance respectively. The rest of the paper is organized as follows. Section 2 describes the four feature selection measures and the standard methods of using them. Section 3 presents the proposed feature selection technique. In Section 4, we describe the naïve Bayes classifier whose performance will be used to evaluate the effectiveness of the various feature selection methods. Experimental results are analyzed in Section 5. Conclusions are given in Section 6.

(Workshop on Learning from Imbalanced Datasets II, ICML, Washington DC, 2003.)

2. Related Work

In this section, we will first briefly review the four feature selection measures, then present the methods of using them found in the literature, and finally describe the imbalanced data problem and its impacts on feature selection. Note that the methods here refer to the scheme of applying feature selection measures to term selection.

2.1 Feature Selection Measures

In what follows, A, B, C, and D denote the numbers of times a term t and a category c co-occur, t occurs without c, c occurs without t, and neither c nor t occurs, respectively. N represents the total number of documents.

2.1.1 CHI-SQUARE (CHI)

CHI measures the lack of independence between a term t and a category c and can be compared to the chi-square distribution with one degree of freedom to judge extremeness (Yang, 1999; Schutze, Hull & Pederson, 1995). In terms of the contingency counts it is defined as:

    chi2(t, c) = N (AD - CB)^2 / [(A + C)(B + D)(A + B)(C + D)]

chi2 has a natural value of zero if t and c are independent. It is a normalized value, and hence is comparable across terms for the same category.

2.1.2 CORRELATION COEFFICIENT (CC)

The correlation coefficient CC(t, c) of a word t with a category c was defined by Ng et al. as (Ng, Goh & Low, 1997; Sebastiani, 2002):

    CC(t, c) = sqrt(N) (AD - CB) / sqrt[(A + C)(B + D)(A + B)(C + D)]

It is a variant of the CHI metric, with CC^2 = chi2, and can be viewed as a "one-sided" chi-square metric. Positive values correspond to features indicative of membership, while negative values indicate non-membership; the greater (smaller) a positive (negative) value is, the more strongly the term indicates membership (non-membership). The standard CC-based local feature selection method selects the terms with maximum CC value as features.
The rationale behind this is that terms coming from the irrelevant texts of a category are considered useless. CHI, on the other hand, is nonnegative, and its value reflects either membership or non-membership of a term in a category, so ambiguous features are ranked lower. In contrast with CC, CHI considers the terms coming from both the relevant and the non-relevant texts.

2.1.3 ODDS RATIO (OR)

Odds ratio was proposed originally by van Rijsbergen et al. (1979) for selecting terms for relevance feedback. The basic idea is that the distribution of features in the relevant documents differs from the distribution of features in the non-relevant documents. It has been used by Mladenic (1998) for selecting terms in text categorization. It is defined as follows:

    OR(t, c) = P(t|c) [1 - P(t|c')] / {[1 - P(t|c)] P(t|c')} = AD / CB

where c' denotes the complement of c. Values greater than 1 correspond to features indicative of membership, while values less than 1 correspond to features indicative of non-membership. It only considers the terms from relevant text. The Expected Likelihood Estimate (ELE) smoothing technique was used in this paper to handle singularities:

    OR(t, c) = (A + 0.5)(D + 0.5) / [(C + 0.5)(B + 0.5)]

2.1.4 GSS COEFFICIENT

The GSS coefficient is another simplified variant of the chi-square statistic, proposed by Galavotti et al. (2000), and is defined as:

    GSS(t, c) = P(t, c) P(t', c') - P(t, c') P(t', c) = (AD - CB) / N^2

Similar to CC, positive values correspond to features indicative of membership, while negative values indicate non-membership. Therefore, only the positive terms are considered.

2.2 Feature Selection Methods

Each of the above four measures is a function f(t, c) with a term t and a category c as its parameters, whose value indicates some relationship between the term and the category. In global feature selection, we assess the value of a term in a global, or category-independent, sense. Either the average or the maximum of the category-specific values is usually computed and compared (Yang & Pedersen, 1997). That is,

    f_avg(t) = sum_i P(c_i) f(t, c_i)
    f_max(t) = max_i f(t, c_i)

Given a vocabulary V and a function f that maps terms to real values, we define two subsets of V of size l, viz. Max[V, f, l] and Min[V, f, l], as follows:

    for all x in Max[V, f, l] and y in V - Max[V, f, l]: f(x) >= f(y)
    for all x in Min[V, f, l] and y in V - Min[V, f, l]: f(x) <= f(y)

In other words, Max[V, f, l] and Min[V, f, l] consist of the l terms in V with the highest and lowest f values respectively. In global feature selection, the feature set F is then Max[V, f_max, l] or Max[V, f_avg, l], where l is the size of F.

In local feature selection, a feature set is constructed for each category. Accordingly, the feature set F_i for c_i is Max[V, f(., c_i), l], where f can be any feature selection measure based on the two-way contingency table of a term t and a category c. Local feature selection methods using asymmetric measures, e.g. CC, OR and GSS, actually pick out the terms most indicative of membership; they will never consider negative features unless all the positive features have already been selected. On the other hand, local feature selection methods using symmetric measures, e.g. CHI, implicitly combine the terms most indicative of membership and non-membership.
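As an illustrative sketch (not code from the paper; the function names are my own), the four measures and the Max/Min term selection above can be computed directly from the contingency counts A, B, C, D:

```python
import math

# 2x2 contingency counts for term t and category c:
# A = docs with t and c, B = t without c, C = c without t, D = neither.

def chi2(A, B, C, D):
    """CHI: N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)); zero if t, c independent."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

def cc(A, B, C, D):
    """CC: signed square root of CHI; the sign separates positive from
    negative features."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return math.sqrt(N) * (A * D - C * B) / math.sqrt(denom) if denom else 0.0

def odds_ratio(A, B, C, D):
    """Odds ratio with ELE (add-0.5) smoothing to avoid singularities."""
    return ((A + 0.5) * (D + 0.5)) / ((C + 0.5) * (B + 0.5))

def gss(A, B, C, D):
    """GSS: P(t,c)P(t',c') - P(t,c')P(t',c) = (AD - CB) / N^2."""
    N = A + B + C + D
    return (A * D - C * B) / (N * N) if N else 0.0

def max_terms(vocab, f, l):
    """Max[V, f, l]: the l terms with the highest f values."""
    return sorted(vocab, key=f, reverse=True)[:l]

def min_terms(vocab, f, l):
    """Min[V, f, l]: the l terms with the lowest f values."""
    return sorted(vocab, key=f)[:l]
```

For example, `cc(A, B, C, D) ** 2` equals `chi2(A, B, C, D)`, and `max_terms` applied with a per-category score implements Max[V, f(., c_i), l].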
The size ratio between the positive and negative features is then internally decided by thresholding on the size of the feature set.

2.3 Imbalanced Data Problem

When training a binary text classifier (text filtering system) for a category, we use all the documents in the training corpus that belong to that category as relevant training data, and all the documents that belong to the other categories as non-relevant training data. It is often the case that the non-relevant training documents are overwhelmingly more numerous, especially when there is a large collection of categories with each assigned to a small number of documents. Many approaches have been employed to address this imbalanced data problem. The concepts of "query zone" and "category zone" were introduced to select a subset of the non-relevant documents, namely the most relevant of the non-relevant documents, as the non-relevant training data (Hearst et al., 1996; Ruiz & Srinivasan, 1999). Essentially, these methods try to obtain more balanced relevant and non-relevant training sets. In this paper, we consider the problem from a different perspective. Instead of balancing the training data, our method balances the positive and negative features, i.e., it generates the optimal combination of positive and negative features for the imbalanced data. The impacts of the imbalanced data problem on standard local feature selection for text filtering can be illustrated as follows: (1) For the methods using positive features only (e.g. CC, OR, or GSS), the non-relevant documents are subject to misclassification. This is even worse under imbalance, where non-relevant documents dominate, and confidently rejecting them becomes very important. (2) When the methods implicitly combining positive and negative features, e.g. CHI, are applied to imbalanced data, the positive features usually receive much higher values than the negative features, according to the measure's definition and our previous experiments. Therefore, the positive features will dominate the feature set.
A similar situation then occurs as described in (1). For example, the upper limit of the CHI value of either a positive or a negative feature is N. For a positive feature, this represents the case where the feature appears in every relevant document but never in any non-relevant document; for a negative feature, it means the feature appears in every non-relevant document but never in any relevant document. Due to the large amount and diversity of the non-relevant documents in an imbalanced data set, it is much more difficult for a negative feature to achieve that maximum than for a positive feature. This extreme example sheds light on why the CHI values of positive features are usually much larger than those of negative features. It is inappropriate for standard local feature selection using CHI to simply compare CHI values without considering whether the features are positive or negative.

3. Combining Positive and Negative Features

We believe that: (1) The negative features are also useful and should be included in the feature set. Since local feature selection can be viewed globally by considering relevant and non-relevant as two "categories", the negative features are actually from the "non-relevant" category, and in such a bi-category problem, terms from both categories should intuitively be considered. The presence of negative features in a document is a good indicator of its non-membership; thus the text filtering performance can be improved through confident rejection of non-relevant documents. (2) Implicit combination of positive and negative features is not necessarily optimal, especially for imbalanced data sets in which the values of positive features are usually much larger than those of negative features. CHI might select only the positive features (equivalent to the standard CC-based approach in this case) when the size of the feature set is small. Thus the size ratio of positive to negative features should be explicitly set and empirically tuned to each scenario: data collection, text classifier, etc.

Based on the above two observations, we propose a new feature selection approach consisting of the following three steps. For each category c_i:

Step 1: generate a positive-feature set F_i+ = Max[V, f(., c_i), l1], where l1, 0 < l1 <= l, is a natural number.

Step 2: generate a negative-feature set F_i- = Max[V, f(., c_i'), l2], where l2 = l - l1 is a non-negative integer.

Step 3: F_i = F_i+ union F_i-.

Here l, with l << |V|, is the predefined size of the feature set, and l1/l, 0 < l1/l <= 1, is the key parameter, to be chosen so as to optimize categorization performance on a validation set. When l1 = l, i.e. l2 = 0, the method reduces to the standard local feature selection method.
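A minimal sketch of the three steps (my own naming; `score` stands for any asymmetric measure f(., c_i) such as CC, OR or GSS, and `ratio` is the key parameter l1/l to be tuned on a validation set):

```python
def combined_features(vocab, score, l, ratio):
    """Union of the l1 strongest positive features and the l - l1
    strongest negative features; ratio = l1/l, and ratio = 1 recovers
    the standard local feature selection method."""
    l1 = max(1, round(ratio * l))               # Step 1 size, 0 < l1 <= l
    l2 = l - l1                                 # Step 2 size, l2 >= 0
    ranked = sorted(vocab, key=score, reverse=True)
    positives = ranked[:l1]                     # Step 1: Max[V, f(., c), l1]
    negatives = ranked[-l2:] if l2 > 0 else []  # Step 2': Min[V, f(., c), l2]
    return positives + negatives                # Step 3: F = F+ union F-
```

With the paper's eight-term illustration (CC values reconstructed here as 9, 8.5, 8.1, 8, 2, -1, -5.8, -5.9), l = 6 and ratio = 2/3 select t1-t4 together with t7 and t8, whereas ratio = 1 yields the standard top six.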
The standard method can thus be viewed as one particular case of our method. In Step 1, we pick out the terms most indicative of membership of c_i, while in Step 2 the terms most indicative of non-membership are selected as well; the feature set is the union of the two. Accordingly, the function f should satisfy: the larger the function value f(t, c), the more likely the term t belongs to the category c. Obviously, CC, OR, and GSS can serve as such functions, while CHI cannot. The reasons why we still present CHI in this paper are as follows: (1) CHI has been shown to be an effective and robust feature selection measure in the literature; to make our experiments comparable to others, we use it as our baseline. (2) CHI is closely related in concept to the CC-based approaches, whether the standard method or our proposed method is used, as will be shown later.

From the definitions of the three measures, we can easily obtain:

    CC(t, c') = -CC(t, c),  OR(t, c') = 1 / OR(t, c),  GSS(t, c') = -GSS(t, c)

Accordingly, Step 2 can be rewritten as:

Step 2': generate a negative-feature set F_i- as Min[V, f(., c_i), l2].

Compared with the standard methods that only consider the terms indicative of membership, e.g. CC, OR and GSS, we add Step 2, which adds to the feature set those terms indicative of non-membership. The advantage of our approach over the standard one can be illustrated by a simple example: given a list of terms t1, ..., t8 whose CC values are 9, 8.5, 8.1, 8, 2, -1, -5.8, and -5.9 respectively, if the size of the feature set is 6, then t1 through t6 will be selected. Suppose a new document containing t5, t7 and t8 comes in; the system will assign it as relevant although it is irrelevant. The proposed approach, on the other hand, will be more likely to choose t7 and t8 instead of t5 and t6, and hence classify the new document correctly.

When applying our method to CC, the resulting approach seems very similar to the standard CHI-based approach:

(1) Both of them consider not only the terms indicative of membership but also those indicative of non-membership; the proposed method using CC combines them explicitly, while standard CHI considers them implicitly. (2) Because the CHI value equals the squared CC value, among the terms with positive/negative CC values, the greater/smaller the value is, the more likely the term is to be selected as a feature by either method.

However, the major differences between the two approaches are: (1) CHI does not differentiate between terms indicative of membership and of non-membership, since it compares the squared values. Although it might in concept consider both positive and negative features, the size ratio between them is not optimal, and there is no extra parameter with which to optimize that ratio. In contrast, by design, our approach can optimize the size ratio to obtain the best performance. Referring to the above example: if we apply CHI to select four features, t1 through t4 will be selected, each of which comes from the relevant document set. When the same new document comes in, the system can hardly tell whether it is relevant or not. (2) Because the positive examples are far fewer than the negative examples in the training corpus, CHI actually favors the positive features according to its definition. In other words, CHI values are not comparable between positive and negative features; usually the values of positive features are much larger, as described in Section 2.3. The proposed approach, however, allows the size of the feature set to be as small as needed while guaranteeing that the system uses both positive and negative features in an optimal way.

4. Naïve Bayes Classifier for Text Filtering

The naïve Bayes classifier is a highly practical Bayesian learning method (Mitchell, 1996). The central idea is to use the joint probabilities of terms and categories to estimate the probabilities of categories given a document. The naïve part of such a model is the simplifying assumption that the words are conditionally independent given the category, and that the probability of word occurrence is independent of position within the text.
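One common way to estimate the probabilities such a classifier needs is relative frequency with Laplace smoothing over the selected feature set. The sketch below is my own assumption about the estimator, not necessarily the authors' exact one; all names are hypothetical:

```python
from collections import Counter

def train_nb(docs, labels, vocab, alpha=1.0):
    """Estimate the prior P(c) and the smoothed likelihoods P(w|c) and
    P(w|~c). docs: lists of terms; labels: True for relevant documents;
    vocab: the selected feature set for this category."""
    pos_docs = [d for d, y in zip(docs, labels) if y]
    neg_docs = [d for d, y in zip(docs, labels) if not y]
    prior_pos = len(pos_docs) / len(docs)
    # Count feature occurrences separately in relevant and non-relevant docs.
    cp = Counter(w for d in pos_docs for w in d if w in vocab)
    cn = Counter(w for d in neg_docs for w in d if w in vocab)
    tp, tn = sum(cp.values()), sum(cn.values())
    # Laplace (add-alpha) smoothing keeps every likelihood strictly positive.
    lik_pos = {w: (cp[w] + alpha) / (tp + alpha * len(vocab)) for w in vocab}
    lik_neg = {w: (cn[w] + alpha) / (tn + alpha * len(vocab)) for w in vocab}
    return prior_pos, lik_pos, lik_neg
```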
For text filtering, the relevance score between a new document d and the category c can be calculated as:

    Score(d, c) = log [P(c) / P(c')] + sum over features w in d of [log P(w|c) - log P(w|c')]

where w ranges over the features appearing in the document d; P(c) and P(c') are the prior probabilities of relevant and non-relevant respectively; and P(w|c) and P(w|c') are the likelihoods of w appearing in relevant and non-relevant training documents respectively. A binary decision (relevant or non-relevant) on d with respect to the category c is obtained by thresholding on Score(d, c). We train one naïve Bayes classifier per category; a relevance score threshold is learned per category to empirically optimize the F1 measure on the validation set.

5. Experimental Results and Analysis

5.1 Experimental Setting

To make our results comparable to others, we used the Reuters-21578 corpus (Yang, 1999; Yang & Pedersen, 1997), a widely used benchmark in the text categorization domain. For this paper, we use the ApteMod version of Reuters-21578 as described by Yang (1999). We obtain 90 categories present in both the training and test sets, a training set of 7,769 documents, and a test set of 3,019 documents. The average number of categories per document is 1.3, and the number of positive instances per category ranges from a minimum of 1 to a maximum of 2,877 in the training set. In order to automatically learn the category-specific parameters, e.g. the size ratio in feature selection and the thresholds in classification, we use two thirds of the training set for training and the remaining one third for validation. After obtaining these thresholds, the classifiers are retrained on the whole training set.

Classification effectiveness has been evaluated in terms of the standard precision, recall and F1 measures. The precision, recall and F1 for each category c_i are defined as:

    P_i = alpha_i / beta_i,  R_i = alpha_i / gamma_i,  F_i = 2 P_i R_i / (P_i + R_i)

where alpha_i is the number of documents correctly assigned by the system to category c_i, beta_i is the number of documents assigned by the system to category c_i, and gamma_i is the number of documents belonging to category c_i (i = 1, ..., m).
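The scoring and thresholding described above can be sketched as follows (hypothetical names; the probabilities would come from a trained per-category model):

```python
import math

def nb_score(doc_terms, prior_pos, lik_pos, lik_neg):
    """Score(d, c) = log[P(c)/P(~c)] + sum over features w in d of
    log P(w|c) - log P(w|~c); terms outside the feature set are skipped."""
    s = math.log(prior_pos) - math.log(1.0 - prior_pos)
    for w in doc_terms:
        if w in lik_pos and w in lik_neg:
            s += math.log(lik_pos[w]) - math.log(lik_neg[w])
    return s

def decide(doc_terms, prior_pos, lik_pos, lik_neg, threshold):
    """Binary relevance decision by thresholding the score; the threshold
    is tuned per category, e.g. to maximize F1 on a validation set."""
    return nb_score(doc_terms, prior_pos, lik_pos, lik_neg) >= threshold
```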

These category-relative values may in turn be averaged in two alternative ways:

(1) Macro-averaging: the precisions and recalls are computed for the binary decisions on each individual category first and then averaged over categories. That is,

    macro-P = (1/m) sum_i P_i,  macro-R = (1/m) sum_i R_i,  macro-F1 = (1/m) sum_i F_i

(2) Micro-averaging: the precisions and recalls are computed globally over all the n x m binary decisions, where n is the number of test documents and m is the number of categories. That is,

    micro-P = sum_i alpha_i / sum_i beta_i,  micro-R = sum_i alpha_i / sum_i gamma_i,
    micro-F1 = 2 micro-P micro-R / (micro-P + micro-R)

Micro-averaged F1 has been widely used in cross-method comparisons, and we focus on this measure in this paper. Accordingly, the size ratio between the positive and negative feature sets is optimized to obtain the best micro-averaged F1 on the validation set.

In order to compare our proposed feature selection approach with the standard one, we apply both to naïve Bayes classifiers. Three groups of feature selection methods are considered:

Group 1: standard CHI, standard CC, and improved CC, referred to as G11, G12, and G13 respectively for short.
Group 2: standard OR and improved OR, referred to as G21 and G22.
Group 3: standard GSS coefficient and improved GSS coefficient, referred to as G31 and G32.

Here "standard" CHI, CC, OR and GSS denote the standard local feature selection methods using the CHI, CC, OR and GSS measures respectively, while "improved" CC, OR and GSS denote the application of the proposed feature selection method to the CC, OR and GSS measures. Note that there is no "improved CHI" method, because the CHI measure does not satisfy the requirement mentioned in Section 3; however, due to its similarity with CC, we place standard CHI in the group of standard and improved CC. The feature selection methods are compared with each other within the same group. The typical size of a local feature set is between 10 and 50 (Sebastiani, 2002).
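Both averaging schemes above can be sketched directly from the per-category counts (alpha_i, beta_i, gamma_i); the function names are my own:

```python
def f1(a, b, g):
    """F1 from correct assignments (a), total assigned (b), true members (g)."""
    p = a / b if b else 0.0
    r = a / g if g else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(stats):
    """Macro-averaging: mean of the per-category F1 values."""
    return sum(f1(a, b, g) for a, b, g in stats) / len(stats)

def micro_f1(stats):
    """Micro-averaging: pool the counts over all categories, then compute F1."""
    A = sum(a for a, _, _ in stats)
    B = sum(b for _, b, _ in stats)
    G = sum(g for _, _, g in stats)
    return f1(A, B, G)
```

Micro-averaging weights every individual decision equally, so frequent categories dominate it, whereas macro-averaging weights every category equally.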
In this paper, performances are reported for feature-set sizes in this range.

5.2 Experimental Results

Table 1 lists the micro-averaged F1 values for naïve Bayes classifiers with the seven different feature selection methods (listed in the first row) at different sizes of the feature set (listed in the first column).

Table 1: Micro-averaged F1 values for naïve Bayes classifiers with the seven feature selection methods at different feature-set sizes. [The numeric entries of this table were lost in extraction.]

As shown in Table 1, the improved correlation coefficient method (G13) is much better than the standard CC (G12) and CHI (G11) methods, and the improved odds ratio (G22) and improved GSS coefficient (G32) methods greatly outperform the corresponding standard methods (G21 and G31 respectively). This confirms our intuition that by optimally combining positive features with negative features, text categorization performance is remarkably improved.

Table 2 lists the micro-averaged precision and recall of each method at the feature-set size where its micro-averaged F1 is maximal. For example, G11 achieves its maximum micro-averaged F1 (.784) at a feature-set size of 50 according to the first two columns of Table 1; the second row of Table 2 gives the corresponding micro-averaged precision and recall. From Table 2, we can see that our proposed approach greatly increases the micro-averaged recall and F1 without hurting precision too much. Because we optimize the F1 measure for each category, more balanced micro-averaged precision and recall are obtained, which also explains why the micro-averaged precision remains unimproved.

Table 2: Micro-averaged precision, recall and F1 values for naïve Bayes classifiers with the seven feature selection methods. [The numeric entries of this table were lost in extraction.]

In order to illustrate the proportion of negative features in the feature set, Table 3 lists the number of categories in which the number of positive features is greater than, smaller than, or equal to the number of negative features, for improved CC with feature size 50. The three cases correspond to l1/l > 0.5, < 0.5 and = 0.5 respectively in the first column of Table 3.

Table 3: The number of categories in which the size of the positive set is greater than, smaller than, or equal to that of the negative set, in the case where improved CC obtains its best performance (feature size is 50). [The numeric entries of this table were lost in extraction.]

Table 3 shows that, in order to obtain the best text categorization performance in terms of F1, we should select more negative features than positive features in 47 out of the 90 categories. This reconfirms the usefulness of negative features. Our explanation is: when the negative examples are overwhelming, rejecting them with high confidence (accuracy) is of greater importance, and this can be achieved by increasing the number of negative features.

6. Conclusions

Experiments with four known feature selection measures, their standard methods, and a new feature selection method have been described. We proposed an effective feature selection method that optimally combines the terms most indicative of membership and non-membership. The main conclusions are: the terms indicative of non-membership are useful and should be considered in local feature selection; and by explicitly and optimally setting the size ratio of positive to negative features, text categorization performance is greatly improved.

References

C. Apte, F. Damerau, & S. Weiss (1994). Towards language independent automated learning of text categorization models. In Proceedings of the 17th Annual ACM/SIGIR Conference.
R.H. Creecy, et al. (1992). Trading MIPS and memory for knowledge engineering: classifying census returns on the Connection Machine. Comm. ACM, 35.
Galavotti, L., Sebastiani, F., & Simi, M. (2000). Experiments on the use of feature selection and negative evidence in automated text categorization. In Proceedings of ECDL-00, 4th European Conference on Research and Advanced Technology for Digital Libraries, Lisbon, Portugal.
Marti Hearst et al. (1996). Xerox TREC-4 site report. In Proceedings of the Fourth Text REtrieval Conference (TREC-4).
Thorsten Joachims (1998). Text categorization with support vector machines: learning with many relevant features. In European Conference on Machine Learning (ECML), pages 137-142, Berlin. Springer.
D.D. Lewis & M. Ringuette (1994). A comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR 94).
Tom Mitchell (1996). Machine Learning. McGraw-Hill.
Mladenic, D. (1998). Machine Learning on Non-homogeneous, Distributed Text Data. PhD dissertation, University of Ljubljana, Slovenia.
H.T. Ng, W.B. Goh, & K.L. Low (1997). Feature selection, perceptron learning, and a usability case study for text categorization. In 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 97).
J.R. Quinlan (1986). Induction of decision trees. Machine Learning, 1(1):81-106.
Van Rijsbergen, C.J. (1979). Information Retrieval. Butterworths, London, 2nd edition.
Ruiz, M.E. & Srinivasan, P. (1999). Hierarchical neural networks for text categorization. In Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval, 281-282.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47.

Schutze, H., Hull, D.A. & Pederson, J.O. (1995). A comparison of classifiers and document representations for the routing problem. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA.
K. Tzeras & S. Hartman (1993). Automatic indexing based on Bayesian inference networks. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 93), pages 22-34.
A.S. Weigend, E.D. Wiener, & J.O. Pederson (1999). Exploiting hierarchy in text categorization. Information Retrieval, 1(1-2):69-90.
E. Wiener, J.O. Pederson & A.S. Weigend (1995). A neural network approach to topic spotting. In Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval (SDAIR 95), pages 317-332, Las Vegas, Nevada. University of Nevada, Las Vegas.
Y. Yang (1994). Expert network: effective and efficient learning from human decisions in text categorization and retrieval. In 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 94), pages 13-22.
Y. Yang (1999). An evaluation of statistical approaches to text categorization. Journal of Information Retrieval, 1(1/2).
Y. Yang and J.O. Pedersen (1997). A comparative study on feature selection in text categorization. In D.H. Fisher, editor, Proceedings of the Fourteenth International Conference on Machine Learning. Morgan Kaufmann.


More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Pose Invariant Face Recognition using Hybrid DWT-DCT Frequency Features with Support Vector Machines

Pose Invariant Face Recognition using Hybrid DWT-DCT Frequency Features with Support Vector Machines Proceedngs of the 4 th Internatonal Conference on 7 th 9 th Noveber 008 Inforaton Technology and Multeda at UNITEN (ICIMU 008), Malaysa Pose Invarant Face Recognton usng Hybrd DWT-DCT Frequency Features

More information

Handwritten English Character Recognition Using Logistic Regression and Neural Network

Handwritten English Character Recognition Using Logistic Regression and Neural Network Handwrtten Englsh Character Recognton Usng Logstc Regresson and Neural Network Tapan Kuar Hazra 1, Rajdeep Sarkar 2, Ankt Kuar 3 1 Departent of Inforaton Technology, Insttute of Engneerng and Manageent,

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Relevance Feedback in Content-based 3D Object Retrieval A Comparative Study

Relevance Feedback in Content-based 3D Object Retrieval A Comparative Study 753 Coputer-Aded Desgn and Applcatons 008 CAD Solutons, LLC http://www.cadanda.co Relevance Feedback n Content-based 3D Object Retreval A Coparatve Study Panagots Papadaks,, Ioanns Pratkaks, Theodore Trafals

More information

Prediction of Dumping a Product in Textile Industry

Prediction of Dumping a Product in Textile Industry Int. J. Advanced Networkng and Applcatons Volue: 05 Issue: 03 Pages:957-96 (03) IN : 0975-090 957 Predcton of upng a Product n Textle Industry.V.. GANGA EVI Professor n MCA K..R.M. College of Engneerng

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Human Face Recognition Using Radial Basis Function Neural Network

Human Face Recognition Using Radial Basis Function Neural Network Huan Face Recognton Usng Radal Bass Functon eural etwor Javad Haddadna Ph.D Student Departent of Electrcal and Engneerng Arabr Unversty of Technology Hafez Avenue, Tehran, Iran, 594 E-al: H743970@cc.au.ac.r

More information

Comparative Study between different Eigenspace-based Approaches for Face Recognition

Comparative Study between different Eigenspace-based Approaches for Face Recognition Coparatve Study between dfferent Egenspace-based Approaches for Face Recognton Pablo Navarrete and Javer Ruz-del-Solar Departent of Electrcal Engneerng, Unversdad de Chle, CHILE Eal: {pnavarre, jruzd}@cec.uchle.cl

More information

Efficient Text Classification by Weighted Proximal SVM *

Efficient Text Classification by Weighted Proximal SVM * Effcent ext Classfcaton by Weghted Proxmal SVM * Dong Zhuang 1, Benyu Zhang, Qang Yang 3, Jun Yan 4, Zheng Chen, Yng Chen 1 1 Computer Scence and Engneerng, Bejng Insttute of echnology, Bejng 100081, Chna

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Nighttime Motion Vehicle Detection Based on MILBoost

Nighttime Motion Vehicle Detection Based on MILBoost Sensors & Transducers 204 by IFSA Publshng, S L http://wwwsensorsportalco Nghtte Moton Vehcle Detecton Based on MILBoost Zhu Shao-Png,, 2 Fan Xao-Png Departent of Inforaton Manageent, Hunan Unversty of

More information

Identifying Key Factors and Developing a New Method for Classifying Imbalanced Sentiment Data

Identifying Key Factors and Developing a New Method for Classifying Imbalanced Sentiment Data Identfyng Key Factors and Developng a New Method for Classfyng Ibalanced Sentent Data Long-Sheng Chen* and Kun-Cheng Sun Abstract Bloggers opnons related to coercal products/servces ght have a sgnfcant

More information

A New Scheduling Algorithm for Servers

A New Scheduling Algorithm for Servers A New Schedulng Algorth for Servers Nann Yao, Wenbn Yao, Shaobn Ca, and Jun N College of Coputer Scence and Technology, Harbn Engneerng Unversty, Harbn, Chna {yaonann, yaowenbn, cashaobn, nun}@hrbeu.edu.cn

More information

User Behavior Recognition based on Clustering for the Smart Home

User Behavior Recognition based on Clustering for the Smart Home 3rd WSEAS Internatonal Conference on REMOTE SENSING, Vence, Italy, Noveber 2-23, 2007 52 User Behavor Recognton based on Clusterng for the Sart Hoe WOOYONG CHUNG, JAEHUN LEE, SUKHYUN YUN, SOOHAN KIM* AND

More information

Performance Analysis of Coiflet Wavelet and Moment Invariant Feature Extraction for CT Image Classification using SVM

Performance Analysis of Coiflet Wavelet and Moment Invariant Feature Extraction for CT Image Classification using SVM Perforance Analyss of Coflet Wavelet and Moent Invarant Feature Extracton for CT Iage Classfcaton usng SVM N. T. Renukadev, Assstant Professor, Dept. of CT-UG, Kongu Engneerng College, Perundura Dr. P.

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Predicting Power Grid Component Outage In Response to Extreme Events. S. BAHRAMIRAD ComEd USA

Predicting Power Grid Component Outage In Response to Extreme Events. S. BAHRAMIRAD ComEd USA 1, rue d Artos, F-75008 PARIS CIGRE US Natonal Cottee http : //www.cgre.org 016 Grd of the Future Syposu Predctng Power Grd Coponent Outage In Response to Extree Events R. ESKANDARPOUR, A. KHODAEI Unversty

More information

What is Object Detection? Face Detection using AdaBoost. Detection as Classification. Principle of Boosting (Schapire 90)

What is Object Detection? Face Detection using AdaBoost. Detection as Classification. Principle of Boosting (Schapire 90) CIS 5543 Coputer Vson Object Detecton What s Object Detecton? Locate an object n an nput age Habn Lng Extensons Vola & Jones, 2004 Dalal & Trggs, 2005 one or ultple objects Object segentaton Object detecton

More information

Monte Carlo inference

Monte Carlo inference CS 3750 achne Learnng Lecture 0 onte Carlo nerence los Hauskrecht los@cs.ptt.edu 539 Sennott Square Iportance Saplng an approach or estatng the epectaton o a uncton relatve to soe dstrbuton target dstrbuton

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

A Semantic Model for Video Based Face Recognition

A Semantic Model for Video Based Face Recognition Proceedng of the IEEE Internatonal Conference on Inforaton and Autoaton Ynchuan, Chna, August 2013 A Seantc Model for Vdeo Based Face Recognton Dhong Gong, Ka Zhu, Zhfeng L, and Yu Qao Shenzhen Key Lab

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Research on action recognition method under mobile phone visual sensor Wang Wenbin 1, Chen Ketang 2, Chen Liangliang 3

Research on action recognition method under mobile phone visual sensor Wang Wenbin 1, Chen Ketang 2, Chen Liangliang 3 Internatonal Conference on Autoaton, Mechancal Control and Coputatonal Engneerng (AMCCE 05) Research on acton recognton ethod under oble phone vsual sensor Wang Wenbn, Chen Ketang, Chen Langlang 3 Qongzhou

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization A Novel Term_Class Relevance Measure for Text Categorzaton D S Guru, Mahamad Suhl Department of Studes n Computer Scence, Unversty of Mysore, Mysore, Inda Abstract: In ths paper, we ntroduce a new measure

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 Vol 1, No 2, September 2017, pp. 6-12 6 Implementaton Naïve Bayes Algorthm for Student Classfcaton Based on Graduaton Status

More information

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier

Using Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier Usng Ambguty Measure Feature Selecton Algorthm for Support Vector Machne Classfer Saet S.R. Mengle Informaton Retreval Lab Computer Scence Department Illnos Insttute of Technology Chcago, Illnos, U.S.A

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Adaptive Sampling with Optimal Cost for Class-Imbalance Learning

Adaptive Sampling with Optimal Cost for Class-Imbalance Learning Proceedngs of the Twenty-Nnth AAAI Conference on Artfcal Intellgence Adaptve Saplng wth Optal Cost for Class-Ibalance Learnng Yuxn Peng Insttute of Coputer Scence and Technology, Pekng Unversty, Bejng

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

A system based on a modified version of the FCM algorithm for profiling Web users from access log

A system based on a modified version of the FCM algorithm for profiling Web users from access log A syste based on a odfed verson of the FCM algorth for proflng Web users fro access log Paolo Corsn, Laura De Dosso, Beatrce Lazzern, Francesco Marcellon Dpartento d Ingegnera dell Inforazone va Dotsalv,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Joint Registration and Active Contour Segmentation for Object Tracking

Joint Registration and Active Contour Segmentation for Object Tracking Jont Regstraton and Actve Contour Segentaton for Object Trackng Jfeng Nng a,b, Le Zhang b,1, Meber, IEEE, Davd Zhang b, Fellow, IEEE and We Yu a a College of Inforaton Engneerng, Northwest A&F Unversty,

More information

Monte Carlo Evaluation of Classification Algorithms Based on Fisher's Linear Function in Classification of Patients With CHD

Monte Carlo Evaluation of Classification Algorithms Based on Fisher's Linear Function in Classification of Patients With CHD IOSR Journal of Matheatcs (IOSR-JM) e-issn: 2278-5728, p-issn: 2319-765X. Volue 13, Issue 1 Ver. IV (Jan. - Feb. 2017), PP 104-109 www.osrjournals.org Monte Carlo Evaluaton of Classfcaton Algorths Based

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Hierarchical Semantic Perceptron Grid based Neural Network CAO Huai-hu, YU Zhen-wei, WANG Yin-yan Abstract Key words 1.

Hierarchical Semantic Perceptron Grid based Neural Network CAO Huai-hu, YU Zhen-wei, WANG Yin-yan Abstract Key words 1. Herarchcal Semantc Perceptron Grd based Neural CAO Hua-hu, YU Zhen-we, WANG Yn-yan (Dept. Computer of Chna Unversty of Mnng and Technology Bejng, Bejng 00083, chna) chhu@cumtb.edu.cn Abstract A herarchcal

More information

Survey of Classification Techniques in Data Mining

Survey of Classification Techniques in Data Mining Proceedngs of the Internatonal MultConference of Engneers and Coputer Scentsts 2009 Vol I Survey of Classfcaton Technques n Data Mnng Thar Nu Phyu Abstract Classfcaton s a data nng (achne learnng) technque

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples 94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Relable Negatve Extractng Based on knn for Learnng from Postve and Unlabeled Examples Bangzuo Zhang College of Computer Scence and Technology, Jln Unversty,

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Key-Words: - Under sear Hydrothermal vent image; grey; blue chroma; OTSU; FCM

Key-Words: - Under sear Hydrothermal vent image; grey; blue chroma; OTSU; FCM A Fast and Effectve Segentaton Algorth for Undersea Hydrotheral Vent Iage FUYUAN PENG 1 QIAN XIA 1 GUOHUA XU 2 XI YU 1 LIN LUO 1 Electronc Inforaton Engneerng Departent of Huazhong Unversty of Scence and

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

A NOTE ON FUZZY CLOSURE OF A FUZZY SET (JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Pattern Classification of Back-Propagation Algorithm Using Exclusive Connecting Network

Pattern Classification of Back-Propagation Algorithm Using Exclusive Connecting Network World Acade of Scence, Engneerng and Technolog 36 7 Pattern Classfcaton of Bac-Propagaton Algorth Usng Eclusve Connectng Networ Insung Jung, and G-Na Wang Abstract The obectve of ths paper s to a desgn

More information

Comparing High-Order Boolean Features

Comparing High-Order Boolean Features Brgham Young Unversty BYU cholarsarchve All Faculty Publcatons 2005-07-0 Comparng Hgh-Order Boolean Features Adam Drake adam_drake@yahoo.com Dan A. Ventura ventura@cs.byu.edu Follow ths and addtonal works

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Issues and Empirical Results for Improving Text Classification

Issues and Empirical Results for Improving Text Classification Issues and Emprcal Results for Improvng Text Classfcaton Youngoong Ko 1 and Jungyun Seo 2 1 Dept. of Computer Engneerng, Dong-A Unversty, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, Korea yko@dau.ac.kr

More information

Measuring Cohesion of Packages in Ada95

Measuring Cohesion of Packages in Ada95 Measurng Coheson of Packages n Ada95 Baowen Xu Zhenqang Chen Departent of Coputer Scence & Departent of Coputer Scence & Engneerng, Southeast Unversty Engneerng, Southeast Unversty Nanjng, Chna, 20096

More information

Aircraft Engine Gas Path Fault Diagnosis Based on Fuzzy Inference

Aircraft Engine Gas Path Fault Diagnosis Based on Fuzzy Inference 202 Internatonal Conference on Industral and Intellgent Inforaton (ICIII 202) IPCSIT vol.3 (202) (202) IACSIT Press, Sngapore Arcraft Engne Gas Path Fault Dagnoss Based on Fuzzy Inference Changzheng L,

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

2016 International Conference on Sustainable Energy, Environment and Information Engineering (SEEIE 2016) ISBN:

2016 International Conference on Sustainable Energy, Environment and Information Engineering (SEEIE 2016) ISBN: 06 Internatonal Conference on Sustanable Energy, Envronent and Inforaton Engneerng (SEEIE 06) ISBN: 978--60595-337-3 A Study on IEEE 80. MAC Layer Msbehavor under Dfferent Back-off Algorths Trong Mnh HOANG,,

More information

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers Journal of Convergence Informaton Technology Volume 5, Number 2, Aprl 2010 Investgatng the Performance of Naïve- Bayes Classfers and K- Nearest Neghbor Classfers Mohammed J. Islam *, Q. M. Jonathan Wu,

More information

A Theory of Non-Deterministic Networks

A Theory of Non-Deterministic Networks A Theory of Non-Deternstc Networs Alan Mshcheno and Robert K rayton Departent of EECS, Unversty of Calforna at ereley {alan, brayton}@eecsbereleyedu Abstract oth non-deterns and ult-level networs copactly

More information

ENSEMBLE learning has been widely used in data and

ENSEMBLE learning has been widely used in data and IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 9, NO. 5, SEPTEMBER 2012 943 Sparse Kernel-Based Hyperspectral Anoaly Detecton Prudhv Gurra, Meber, IEEE, Heesung Kwon, Senor Meber, IEEE, andtothyhan Abstract

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

An Anti-Noise Text Categorization Method based on Support Vector Machines *

An Anti-Noise Text Categorization Method based on Support Vector Machines * An Ant-Nose Text ategorzaton Method based on Support Vector Machnes * hen Ln, Huang Je and Gong Zheng-Hu School of omputer Scence, Natonal Unversty of Defense Technology, hangsha, 410073, hna chenln@nudt.edu.cn,

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

An Efficient Fault-Tolerant Multi-Bus Data Scheduling Algorithm Based on Replication and Deallocation

An Efficient Fault-Tolerant Multi-Bus Data Scheduling Algorithm Based on Replication and Deallocation BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volue 16, No Sofa 016 Prnt ISSN: 1311-970; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-016-001 An Effcent Fault-Tolerant Mult-Bus Data

More information

Identifying Table Boundaries in Digital Documents via Sparse Line Detection

Identifying Table Boundaries in Digital Documents via Sparse Line Detection Identfyng Table Boundares n Dgtal Documents va Sparse Lne Detecton Yng Lu, Prasenjt Mtra, C. Lee Gles College of Informaton Scences and Technology The Pennsylvana State Unversty Unversty Park, PA, USA,

More information

IMAGE REPRESENTATION USING EPANECHNIKOV DENSITY FEATURE POINTS ESTIMATOR

IMAGE REPRESENTATION USING EPANECHNIKOV DENSITY FEATURE POINTS ESTIMATOR Sgnal & Iage Processng : An Internatonal Journal (SIPIJ) Vol.4, No., February 03 IMAGE REPRESENTATION USING EPANECHNIKOV DENSITY FEATURE POINTS ESTIMATOR Tranos Zuva, Kenelwe Zuva 3, Sunday O. Ojo, Selean

More information

Keyword Spotting Based on Phoneme Confusion Matrix

Keyword Spotting Based on Phoneme Confusion Matrix Keyword Spottng Based on Phonee Confuson Matrx Pengyuan Zhang, Jan Shao, Jang Han, Zhaoje Lu, Yonghong Yan ThnkIT Speech Lab, Insttute of Acoustcs, Chnese Acadey of Scences Bejng 00080 {pzhang, jshao,

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

A Weighted Method to Improve the Centroid-based Classifier

A Weighted Method to Improve the Centroid-based Classifier 016 Internatonal onference on Electrcal Engneerng and utomaton (IEE 016) ISN: 978-1-60595-407-3 Weghted ethod to Improve the entrod-based lassfer huan LIU, Wen-yong WNG *, Guang-hu TU, Nan-nan LIU and

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Multimodal Biometric System Using Face-Iris Fusion Feature

Multimodal Biometric System Using Face-Iris Fusion Feature JOURNAL OF COMPUERS, VOL. 6, NO. 5, MAY 2011 931 Multodal Boetrc Syste Usng Face-Irs Fuson Feature Zhfang Wang, Erfu Wang, Shuangshuang Wang and Qun Dng Key Laboratory of Electroncs Engneerng, College

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information