Modeling Hierarchical User Interests Based on HowNet and Concept Mapping


Yihong Li #1, Fang Li #2
# Dept. of Computer Science & Engineering, Shanghai Jiao Tong University, No.800 Dong Chuan Rd., Shanghai, P.R. China
1 flyran@sjtu.edu.cn, 2 fl@sjtu.edu.cn

Abstract

Modeling user interests plays an important role in personalized services on the Internet. Many systems use classification methods to construct a user profile that represents user interests. However, it is difficult for such methods to cover all individual interests and to capture changes in user profiles. This paper introduces a two-step method to solve the problem. The first step clusters the viewed web pages to extract individual interests, extending the VSM vector with sememes from HowNet. The second step maps each interest cluster to a reference taxonomy to model hierarchical user interests, based on similarity calculation over keywords and the extended vector. Experiments show that the method can model hierarchical user interests with promising results. When a new interest emerges, no adjustment such as collecting new training data or rebuilding the classifier is needed. The method can capture diversified user interests and map them to an interest taxonomy.

I. INTRODUCTION

Information on the Internet grows exponentially, and users often feel that they are drowning in a sea of information. How to provide useful information to each individual user has become a critical problem. User interests are very important to personalized search, collaborative filtering and recommendation systems, so modeling user interests is a key step in providing personalized services. The common representations of user interests are the keyword profile and the concept profile [1]. Keyword profiles are sets of weighted words extracted from the web pages that a user has browsed, and each set of words represents one of the user's diversified interests. However, many sets of keywords are related to each other, so a flat structure such as a keyword profile cannot model user interests precisely and correctly.
Hierarchical concept profiles provide a more standard, structured way to model user interests. A node of the concept profile can be considered an abstract topic that the user may be interested in, rather than a set of words. The concept hierarchy describes user interests and their relationships in a standard way that is close to human conception. There are two methods to model hierarchical user interests on the Internet: hierarchical classification and hierarchical clustering. The classification method needs to train a classifier to represent all individual user interests. Hierarchical clustering methods can capture individual user interests, but it is difficult to provide a standard personalized service on top of them. Building on our prior research [2], our research aim is to capture individual interests and to model hierarchical user interests with concept profiles. Our interest profile construction includes two steps: clustering the web pages and mapping each cluster to a concept of the reference taxonomy.

Due to the complexity of natural languages, deep content processing is impossible without semantic information. Synonymy, polysemy and other language problems pose challenges in information processing. Therefore, a semantic knowledge platform such as the Knowledge Grid [3] plays an important role in semantic processing. In our solution, we use HowNet [4], a Chinese knowledge base, to capture semantic information in the traditional vector space model in order to acquire user interests. Through similarity calculation of keywords, the generated user interests can be mapped to the reference user interest ontology.

The rest of the paper is organized as follows. Section II discusses cluster construction using a clustering algorithm based on HowNet. Section III describes our concept mapping method. Section IV presents our experiment. Related work is discussed in Section V, and Section VI concludes.

II. CLUSTER CONSTRUCTION

The content of browsed web pages is a good indicator of user interests. Individual user interests can be found from the result of clustering web pages.
The cluster construction consists of three steps. The first step extends the vector space model based on HowNet to represent the semantic content of web pages. The second step is page clustering. The third step is cluster refinement.

A. Web Page Representation based on HowNet

The content of a web page can be represented as a vector in the Vector Space Model (VSM). To address the sparseness of the VSM vector, HowNet sememes are used to extend the vector. Each lexical item of HowNet is expressed with a set of sememes, which are the atomic semantic elements. For example, the Chinese word 农田 (farmland) in HowNet can be described as:

DEF=land|陆地,@planting|栽植,#crop|庄稼,agricultural|农

There are four sememes related to the word 农田: land|陆地, planting|栽植, crop|庄稼 and agricultural|农. The symbol | separates the English word and the Chinese word of a sememe. @ here means that farmland is the location for planting. # here means that farmland and crop have some co-relationship. Synonyms usually share the same sememes in HowNet. The word 田地 also means farmland and has the same sememes as the word 农田. If two words are both defined as containing the same sememe X, they are called related words of X. For example, 农田 and 田地 are related words of the sememe land|陆地.

Each sememe in HowNet belongs to a category. The most important categories are the event category, the entity category and the secondary feature category. Sememes carry depth information in the up-down relationship (see Fig. 1). General sememes are close to the root and special sememes are deep down in the tree. For example, the depth of the sememe crop|庄稼 is 6, while the depth of the sememe animate|生物 is 4.

Fig. 1. The tree of HowNet up-down relationship

Since sememes carry so much useful information, the important sememes (those whose categories are entity, event and secondary feature) are extracted from each word and added to the document vector as new dimensions. Let $D = \{d_1, d_2, \ldots, d_n\}$ denote the set of documents that a user has browsed, and let $T = \{t_1, t_2, \ldots, t_m\}$ denote the words of all documents. Stopwords and words with a DF (document frequency) of less than 2 are filtered out and are not included in T. Let $S = \{s_1, s_2, \ldots, s_p\}$ denote the set of all useful sememes related to any word in T. The HowNet extended document vector is defined as follows.

Definition 1 (Extended Document Vector) The extended document vector of the i-th document $d_i$ is defined as

$$v_i = [\mathrm{tfidf}(t_1), \mathrm{tfidf}(t_2), \ldots, \mathrm{tfidf}(t_m), w(s_1), w(s_2), \ldots, w(s_p)],$$

where $\mathrm{tfidf}(t_j)$ is the TFIDF weight of word $t_j$ and $w(s_k)$ is the weight of sememe $s_k$. Each sememe s has its related words t in the document. The weight of sememe s is defined as

$$w(s) = \lambda(s) \cdot \mathrm{wdepth}(s) \cdot \sum_{t \in \mathrm{RelatedWords}(s)} \mathrm{tfidf}(t). \tag{1}$$

As Formula 1 shows, three kinds of contributions of sememe s are considered.

1) The category of the sememe: In HowNet, the three most important categories are entity, secondary feature and event. The entity category is about nouns, while the event category is about verbs. The secondary feature category describes the domain of a word. Only sememes in these categories are used to extend the document vector. Suppose category(s) denotes the category of sememe s; then the category contribution of sememe s is defined as

$$\lambda(s) = \begin{cases} 0.9, & \mathrm{category}(s) = \text{entity} \\ 0.8, & \mathrm{category}(s) = \text{secondary feature} \\ 0.7, & \mathrm{category}(s) = \text{event} \end{cases} \tag{2}$$

2) The depth of the sememe: Special sememes contribute more than general sememes. Special sememes are deep down in the up-down relationship tree (see Fig. 1), so the deeper a sememe is, the bigger its contribution. The depth contribution of sememe s is defined as

$$\mathrm{wdepth}(s) = \begin{cases} 0, & \mathrm{depth}(s) < \mathrm{minDepth}_k \\ \dfrac{\mathrm{depth}(s)}{\mathrm{maxDepth}_k}, & \mathrm{minDepth}_k \le \mathrm{depth}(s) \le \mathrm{maxDepth}_k \\ 1, & \mathrm{depth}(s) > \mathrm{maxDepth}_k \end{cases} \tag{3}$$

Here depth(s) is the depth of sememe s, and the parameters $\mathrm{minDepth}_k$ and $\mathrm{maxDepth}_k$ are for adjustment. Because words that are too general are useless, sememes with a depth less than $\mathrm{minDepth}_k$ are ignored. Sememes with a depth greater than $\mathrm{maxDepth}_k$ are assumed to contribute equally. Because sememes in different categories cannot be treated equally, different values are assigned to $\mathrm{minDepth}_k$ (or $\mathrm{maxDepth}_k$) for different sememe categories.

3) The TFIDF weights of related words: A sememe has some related words. If these words are mentioned many times in a document, the sememe should be important. Suppose the set of related words of sememe s is RelatedWords(s); we accumulate the TFIDF weight of each related word t in RelatedWords(s) as a contribution.

The first and third contributions are taken from Zhi Cai's document representation [5]. We add the depth of the sememe as another contribution of a sememe.

B. Clustering

The clustering process is applied to the extended vectors of the browsed web pages. The clustering algorithm used is Sphere K-means [6], chosen because it is fast and efficient. The Kaufman approach (KA) [7] is chosen as the initialization method for Sphere K-means.
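The sememe weighting of Formulas 1 to 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the dictionaries `word_sememes` and `sememe_info` stand in for a HowNet lookup and are hypothetical, the category assigned to the sememe "crop" and the depth bounds (3, 8) are assumed values chosen for the example, and only the depth of 6 for "crop" comes from the text.

```python
from collections import defaultdict

# Category contributions lambda(s) from Formula 2.
CATEGORY_LAMBDA = {"entity": 0.9, "secondary feature": 0.8, "event": 0.7}


def depth_weight(depth, min_depth, max_depth):
    """Depth contribution wdepth(s) from Formula 3."""
    if depth < min_depth:
        return 0.0  # too general, ignored
    if depth > max_depth:
        return 1.0  # all very deep sememes contribute equally
    return depth / max_depth


def sememe_weights(tfidf, word_sememes, sememe_info, depth_bounds):
    """Formula 1: w(s) = lambda(s) * wdepth(s) * sum of TFIDF of related words.

    tfidf:        {word: TFIDF weight in one document}
    word_sememes: {word: [sememe, ...]}        (hypothetical HowNet lookup)
    sememe_info:  {sememe: (category, depth)}  (hypothetical HowNet lookup)
    depth_bounds: {category: (minDepth_k, maxDepth_k)}
    """
    # Accumulate the TFIDF weights of the related words of each sememe.
    related_sum = defaultdict(float)
    for word, weight in tfidf.items():
        for sememe in word_sememes.get(word, []):
            related_sum[sememe] += weight

    weights = {}
    for sememe, total in related_sum.items():
        info = sememe_info.get(sememe)
        if info is None or info[0] not in CATEGORY_LAMBDA:
            continue  # only the three main categories extend the vector
        category, depth = info
        min_depth, max_depth = depth_bounds[category]
        weights[sememe] = (
            CATEGORY_LAMBDA[category]
            * depth_weight(depth, min_depth, max_depth)
            * total
        )
    return weights


# Toy document: two synonyms that share the sememe "crop" (depth 6 in the text).
tfidf = {"农田": 0.5, "田地": 0.3}
word_sememes = {"农田": ["crop"], "田地": ["crop"]}
sememe_info = {"crop": ("entity", 6)}  # category is an assumption
depth_bounds = {"entity": (3, 8)}      # assumed minDepth_k, maxDepth_k
weights = sememe_weights(tfidf, word_sememes, sememe_info, depth_bounds)
print(weights)
```

In this toy case the two synonyms pool their TFIDF weights on one shared sememe dimension, which is how the extension reduces the sparseness of the plain VSM vector.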

C. Cluster Refinement

After all the clusters have been generated, they are refined. First, the top 3 nouns or verbs with the highest weights in each cluster are checked. If two clusters have the same words among their top 3 words, we assume that they share the same topic and merge them into one cluster. The centroid is then reconstructed for the new cluster. The process is repeated until no two clusters need to be merged. The resulting clusters represent the diversified user interests.

III. MAPPING CLUSTER TO CONCEPT BASED ON HOWNET

A reference taxonomy T has been borrowed from the iAsk site of Sina.com. Suppose c is a cluster and o is a concept node of the reference taxonomy. The mapping process maps each c to a concept node o of the taxonomy. In our method, clusters and concept nodes are regarded as documents. Both can be represented by the extended document vector defined in Section II, which captures the semantic content of clusters and concept nodes. The cosine function is used to calculate the similarity between the vectors.

Because web page contents vary widely, the extended vector is still sparse. For example, the word 图书 (book) has the sememes publications|书刊 and mass|众, while the word 短篇小说 (short story) has the sememe readings|读物. These two words are similar, yet they have no common sememes, so the cosine of the extended vectors cannot discover the similarity between them. To solve this problem, the HowNet similarity calculation of Qun Liu [8] is used to evaluate the similarity between keywords. In the above example, the up-down relation between sememes such as publications|书刊 and readings|读物 makes the two words semantically similar according to the calculation result. For this reason, keywords are extracted from each cluster and each concept. Each time, one keyword is taken from the cluster and one from the concept, and the HowNet similarity between the two is calculated.
The HowNet similarity values can be considered as part of the similarity between cluster and concept. In the following, the representations of cluster and concept are defined first. Then the similarity calculation between cluster and concept is defined. Finally, an algorithm to map each cluster to a concept node of a taxonomy T is discussed.

A. Cluster Representation

In the mapping process, each cluster c is represented as an extended vector CV (see Definition 2) and a keyword set $CW = \{cw_1, cw_2, \ldots, cw_{nc}\}$. Each $cw_i$ is a keyword from the cluster.

Definition 2 (Extended Vector of a Cluster) The extended vector of a cluster c is defined as the centroid of the cluster:

$$CV = \frac{1}{|c|} \sum_{d_i \in c} v_i \tag{4}$$

where |c| is the number of pages in c, $d_i$ is a document in cluster c, and $v_i$ is the extended document vector of $d_i$ (see Definition 1).

The extended vector considers the words and sememes of the whole content of each document, so it provides a global view of the cluster. Keywords (CW) of the cluster c are also extracted for the HowNet similarity calculation. They are the top 30 words (either nouns or verbs) extracted from the extended vector of the cluster. These words are compared with the keywords of a concept according to the HowNet similarity. The keyword sets are used to evaluate the semantic closeness between a cluster and a concept.

B. Concept Representation

Each concept o of a reference taxonomy T is regarded as a document. The labels in the subtree of the concept o are regarded as the content of the document o. Like a cluster, the concept o is represented as an extended vector OV and a keyword set $OW = \{ow_1, ow_2, \ldots, ow_{no}\}$.

Definition 3 (Extended Vector of a Concept) The extended vector of a concept o is defined as

$$OV = [\mathrm{tfidf}(t_1), \mathrm{tfidf}(t_2), \ldots, \mathrm{tfidf}(t_m), w(s_1), w(s_2), \ldots, w(s_p)],$$

where each word $t_i$ belongs to the word set T, each $s_j$ belongs to the sememe set S (see Section II-A), $\mathrm{tfidf}(t_i)$ is the TFIDF weight of word $t_i$ in the document o, and $w(s_j)$ is the weight of sememe $s_j$ (see Formula 1).

Suppose the taxonomy T is as shown in Fig. 2. Then the document of the concept Basketball is "Basketball NBA CBA", and the document of the concept Sports is "Sports Football Basketball Ping-Pong NBA CBA ...".

Fig. 2. A snippet of a reference taxonomy T

To construct the keyword set of a concept, the labels that are nouns or verbs are selected from the subtree of the node. Considering the complexity of the HowNet similarity computation, only labels from the top 2 levels of the subtree are chosen. For example, the keyword set of the concept Sports in Fig. 2 is {Sports, Football, Basketball, Ping-Pong}.
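The concept document and the keyword set described above can be illustrated with a small Python sketch. It is a sketch under stated assumptions rather than the system's code: the `Concept` class and both function names are hypothetical, the toy tree contains only the labels named in the text (Fig. 2 may contain more nodes), and the noun/verb filter applied to keywords is omitted.

```python
from collections import deque


class Concept:
    """A node of the reference taxonomy (hypothetical helper class)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def concept_document(node):
    """All labels in the subtree of `node`, collected level by level."""
    labels, queue = [], deque([node])
    while queue:
        current = queue.popleft()
        labels.append(current.label)
        queue.extend(current.children)
    return labels


def concept_keywords(node, levels=2):
    """Labels from the top `levels` levels of the subtree (root is level 1)."""
    labels, queue = [], deque([(node, 1)])
    while queue:
        current, level = queue.popleft()
        labels.append(current.label)
        if level < levels:
            queue.extend((child, level + 1) for child in current.children)
    return labels


# Toy taxonomy built only from the labels mentioned in the text.
basketball = Concept("Basketball", [Concept("NBA"), Concept("CBA")])
sports = Concept("Sports", [Concept("Football"), basketball, Concept("Ping-Pong")])

print(" ".join(concept_document(basketball)))
print(" ".join(concept_document(sports)))
print(concept_keywords(sports))
```

Restricting keywords to the top 2 levels keeps the number of pairwise HowNet similarity computations small, while the full subtree still feeds the extended vector of the concept.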

C. Similarity Calculation between Concept and Cluster

After the above two processes, each cluster c is represented as an extended vector CV and a keyword set $CW = \{cw_1, cw_2, \ldots, cw_{nc}\}$, and each concept o is represented as an extended vector OV and a keyword set $OW = \{ow_1, ow_2, \ldots, ow_{no}\}$. The similarity between concept and cluster is calculated from these extended vectors and keywords. The extended vector similarity between concept o and cluster c is calculated as in Definition 4.

Definition 4 (Extended Vector Similarity) The extended vector similarity between concept o and cluster c is defined as

$$\mathrm{ExtendedVectorSim}(o, c) = \cos(OV, CV) = \frac{OV \cdot CV}{|OV|\,|CV|}, \tag{5}$$

where OV is the extended vector of concept o and CV is the extended vector of cluster c. The cosine function evaluates the similarity between the extended vectors OV and CV.

The keywords similarity between concept o and cluster c is calculated as in Definition 5.

Definition 5 (Keywords Similarity) The keywords similarity between concept o and cluster c is defined as

$$\mathrm{KeywordsSim}(o, c) = \beta \cdot \mathrm{MaxKeywordsSim}(o, c) + (1 - \beta) \cdot \mathrm{AverageKeywordsSim}(o, c), \tag{6}$$

where AverageKeywordsSim(o, c) is the average value of the HowNet similarities [8] between each pair of keywords, MaxKeywordsSim(o, c) is the maximum value of the HowNet similarities between each pair of keywords, and β is a weight parameter for adjustment, with a value between 0 and 1.

The HowNet similarity is computed between each pair of keywords from the cluster and the concept. The maximum and the average of these HowNet similarity values are then combined into the keywords similarity. The maximum value of the HowNet similarities between each pair of keywords is calculated as

$$\mathrm{MaxKeywordsSim}(o, c) = \max_{i, j} \mathrm{HownetSim}(ow_i, cw_j), \tag{7}$$

where $ow_i$ is a word in the keyword set OW of the concept o, $cw_j$ is a word in the keyword set CW of the cluster c, and $\mathrm{HownetSim}(ow_i, cw_j)$ is the HowNet similarity between the words $ow_i$ and $cw_j$.
The average value of the HowNet similarities between each pair of keywords is calculated as

$$\mathrm{AverageKeywordsSim}(o, c) = Y \sum_{i=1}^{|OW|} \sum_{j=1}^{|CW|} \mathrm{tfidf}(cw_j) \cdot \mathrm{HownetSim}(ow_i, cw_j), \tag{8}$$

where |OW| is the number of keywords in concept o, |CW| is the number of keywords in cluster c, $\mathrm{tfidf}(cw_j)$ is the TFIDF weight of word $cw_j$ in the extended vector of cluster c, and

$$Y = \frac{1}{|OW| \sum_{j=1}^{|CW|} \mathrm{tfidf}(cw_j)}$$

is a normalization factor. Because the keywords of a cluster differ in importance, the TFIDF weight of each keyword is applied when computing the average value of the HowNet similarity.

The similarity between concept and cluster is based on the extended vector similarity (see Definition 4) and the keywords similarity (see Definition 5), as shown in Formula 9:

$$\mathrm{Similarity}(o, c) = \alpha \cdot \mathrm{ExtendedVectorSim}(o, c) + (1 - \alpha) \cdot \mathrm{KeywordsSim}(o, c). \tag{9}$$

α is a weight parameter for adjustment. We set it to 0.9, 0.1 and 0.1 for the first, second and third levels of the concepts, respectively.

D. Mapping Algorithm

Algorithm 1 ConceptMapping(c, r)
Input: c: a cluster; r: the root concept in the interest taxonomy T for mapping.
Output: o: a concept node in taxonomy T which is mapped to cluster c.
/* This function maps cluster c to a concept o in the subtree of r. */
o ← r; /* Initialization */
if (hasChild(r)) /* node r has a child in taxonomy T */
    topKConcepts ← top K children of concept r which have the highest similarity with cluster c;
    /* Return the leaf node o which is most similar to cluster c. */
    for each o_i in topKConcepts
        o_temp ← ConceptMapping(c, o_i);
        if Similarity(o_temp, c) > Similarity(o, c)
            o ← o_temp;
        end if
    end for
end if
return o;

Fig. 3. The algorithm of concept mapping between cluster and taxonomy

The mapping algorithm is shown in Fig. 3. Given a cluster c and a root concept node r, the algorithm searches the subtree of concept r and returns a concept o mapped to the cluster c. The mapping process is top-down. First it gets the top K children of concept r that have the highest similarity with cluster c. These children form the set topKConcepts. Then it runs the same mapping process in the subtree of each node in topKConcepts. The process returns a leaf node for each subtree, and these K leaves are regarded as candidates. Finally it chooses the candidate o that is most similar to the cluster c. The Similarity function evaluates the similarity between a concept and a cluster. Keeping K candidate concepts at each level prevents an early wrong mapping at an ancestor node from determining the result. The parameter K is 2 in our system and in the experiment.

IV. EXPERIMENT

To evaluate the performance of our method for constructing hierarchical user interests, an experiment was conducted. Using our IE plug-in, we first collected the browsing histories of 13 volunteers over about 2 weeks. The visited web pages total about 3350 and cover different topics including politics, culture, economy, science, entertainment and so on. For concept mapping, a topic hierarchy was extracted from the iAsk site of Sina.com as our reference taxonomy. The taxonomy contains 3 levels of concepts. After filtering out some useless concepts, we obtained about 3662 concepts in the taxonomy.

Our clustering process was then performed on each volunteer's browsing history. The system generated 5-12 clusters for each volunteer. Three mapping methods (including our mapping method) were applied to these clusters. The methods are listed as follows:

1) The Baseline method: This method uses only the classic Vector Space Model (VSM) vector to represent a cluster and a concept. It uses the cosine value of the vectors to compute the similarity between cluster and concept.

2) The ExtendedVector method: This method uses our extended vector (see Definition 2 and Definition 3) to represent a cluster and a concept.
It uses the Extended Vector Similarity (see Definition 4) to compute the similarity between cluster and concept.

3) The ExtendedVector+Keywords method: This is the method our system uses. It uses both the extended vector and the keyword set to represent a cluster and a concept. Formula 9 is used to compute the similarity between cluster and concept. The parameter α of the similarity is set to 0.9, 0.1 and 0.1 for the first, second and third levels of the concepts, respectively.

All three methods use Algorithm 1 (see Fig. 3) for concept mapping.

Fig. 4. Part of concept mapping about user U1
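The top-down search of Algorithm 1 (Fig. 3), which all three methods share, can be sketched in Python as follows. The sketch follows the pseudocode as printed, so the result starts at r and an internal node can be returned when no descendant scores higher. The `Concept` class, the function names and the toy taxonomy are hypothetical, and `toy_similarity` is a simple label-overlap stand-in for illustration only; it is not Formula 9.

```python
class Concept:
    """A node of the reference taxonomy (hypothetical helper class)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def subtree_labels(node):
    """Set of all labels in the subtree rooted at `node`."""
    labels = {node.label}
    for child in node.children:
        labels |= subtree_labels(child)
    return labels


def toy_similarity(concept, cluster_words):
    """Stand-in similarity: Jaccard overlap of labels and cluster words."""
    labels = subtree_labels(concept)
    return len(labels & cluster_words) / len(labels | cluster_words)


def concept_mapping(cluster, root, similarity, k=2):
    """Map `cluster` to a concept in the subtree of `root` (Algorithm 1)."""
    best = root  # initialization: o <- r
    if root.children:
        # Keep the K children most similar to the cluster.
        top_k = sorted(
            root.children,
            key=lambda o: similarity(o, cluster),
            reverse=True,
        )[:k]
        for child in top_k:
            candidate = concept_mapping(cluster, child, similarity, k)
            if similarity(candidate, cluster) > similarity(best, cluster):
                best = candidate
    return best


# Toy taxonomy and cluster, invented for the example.
taxonomy = Concept("Interests", [
    Concept("Sports", [
        Concept("Football"),
        Concept("Basketball", [Concept("NBA"), Concept("CBA")]),
    ]),
    Concept("Music", [Concept("Pop")]),
])
cluster_words = {"Basketball", "NBA"}
mapped = concept_mapping(cluster_words, taxonomy, toy_similarity, k=2)
print(mapped.label)
```

Passing the similarity function as a parameter mirrors the experiment, where the Baseline, ExtendedVector and ExtendedVector+Keywords methods differ only in how similarity is computed while the search procedure stays the same.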

TABLE I. THE PRECISION OF MAPPING (EXTENDEDVECTOR+KEYWORDS METHOD)
Columns: User ID (U1 to U13). Rows: Clusters, P(Level 1), P(Level 2), P(Level 3).

TABLE II. MAPPING RESULT OF THE THREE METHODS
Columns: P(Level 1), P(Level 2), P(Level 3). Rows: ExtendedVector+Keywords, ExtendedVector, Baseline.

We evaluate the mapping precision of the three methods on the volunteers' clusters. When a cluster has been mapped to a concept, we say that the cluster belongs to the concept and to its ancestors. The mapping precision on the i-th level of taxonomy T is defined as

$$\mathrm{Precision}(Level_i) = \frac{\sum_{c \in C} \mathrm{rightMapping}(o_i, c)}{|C|}, \tag{10}$$

where $Level_i$ is the i-th level of taxonomy T, C is the set of evaluation clusters, |C| is the number of clusters in C, c is a cluster in the set C, and $o_i$ is the concept on the i-th level to which cluster c belongs. $\mathrm{rightMapping}(o_i, c)$ returns 1 when the cluster c really belongs to the concept node $o_i$ on the i-th level. Otherwise, it returns 0.

Fig. 4 shows part of the interest profile of user U1. The cluster c has been mapped to the concept Ligue1, although the cluster is in fact about football clubs. On the 3rd level of the taxonomy T, the concept $o_3$ to which the cluster c belongs is Ligue1. This is a wrong mapping on this level, so $\mathrm{rightMapping}(o_3, c)$ is 0. On the 2nd level of the taxonomy, the concept $o_2$ is Football, and cluster c really belongs to it, so $\mathrm{rightMapping}(o_2, c)$ is 1.

Table I shows the mapping result of each user based on the ExtendedVector+Keywords method. The second row shows the number of clusters generated for each user. The next three rows show the precision of concept mapping on the 1st, 2nd and 3rd levels of the reference taxonomy, indicated by P(Level 1), P(Level 2) and P(Level 3). The 1st-level precision P(Level 1) achieves the best result for each user.

Table II shows the mapping results of all three methods. Each row gives the precisions of one method, while each column gives the precision evaluated on one level. The precisions on the 1st level do not differ much among these methods.
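The precision of Formula 10 can be made concrete with a short Python sketch. The data layout is an assumption made for the example: each cluster is given as the path of concept labels it was mapped to (level 1 first) together with the set of concepts it really belongs to. The level-1 label "Sports" is assumed; only the Football and Ligue1 levels are taken from the Fig. 4 example in the text.

```python
def right_mapping(mapped_path, true_concepts, level):
    """Return 1 if the mapped concept on `level` is correct, otherwise 0."""
    if level > len(mapped_path):
        return 0  # the cluster was not mapped this deep
    return 1 if mapped_path[level - 1] in true_concepts else 0


def precision(clusters, level):
    """Formula 10: share of clusters mapped correctly on the given level."""
    if not clusters:
        return 0.0
    hits = sum(right_mapping(path, truth, level) for path, truth in clusters)
    return hits / len(clusters)


# One cluster about football clubs that was mapped down to Ligue1.
clusters = [
    (["Sports", "Football", "Ligue1"], {"Sports", "Football"}),
]

for level in (1, 2, 3):
    print(level, precision(clusters, level))
```

Because a cluster belongs to its mapped concept and to all ancestors of that concept, one mapping is scored once per level, which is why precision can only stay equal or drop as the level gets deeper for a single cluster.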
The ExtendedVector+Keywords method is only slightly better, with a precision of 0.91 on the 1st level of mapping. The reason is that each concept on the 1st level has a big subtree, so there are enough words for the VSM vector representation (Baseline) or the extended vector representation on the 1st level. On the 2nd and 3rd levels, the ExtendedVector+Keywords method is much better than the other two. The precision of our method is 0.78 on the 2nd level, while the Baseline method reaches only 0.57; the value for the ExtendedVector method is given in Table II. The results on the 2nd and 3rd levels show that the HowNet similarity of keywords plays an important role in concept mapping when there are not many labels to describe a concept.

V. COMPARISON WITH OTHER METHODS

User interests can be represented as a user profile. Many machine learning methods are employed to construct hierarchical user interests, such as hierarchical clustering algorithms and classification techniques. This research indicates that specific interests are very difficult to identify.

Hierarchical clustering algorithms can construct an interest hierarchy without any reference taxonomy. Kim [9] provides a hierarchical clustering algorithm called DHC to construct user interests. Godoy [10] provides another hierarchical clustering algorithm called WebDCC to construct user interests. These unsupervised methods do not need any training web pages. In these methods, keywords are extracted to represent an interest node. However, the keywords of a node are diversified, so it is difficult to build a standard profile for different users.

Classification techniques can construct a standard profile based on a reference taxonomy. Ni [11] constructed Chinese webloggers' interests based on text classification, using a combination of classifiers such as the Naive Bayes classifier (NB) [12], the Support Vector Machine (SVM) [13] and the Rocchio classifier [14]. The system uses the data of SOHU's directory [15] to train the classifiers. The top 2 levels of the hierarchical category space are used as the reference taxonomy. The number of training documents is about [...]. The PVA system [16] builds a concept-hierarchy-based user profile by classifying the browsed web pages using the vector space model. It uses a three-level concept hierarchy containing 55 concepts from the Yam search web site [17] as the reference taxonomy. Trajkova's system [18] uses the top 3 levels of ODP (Open Directory Project) [19] as the reference taxonomy. The web pages related to the ODP categories are collected as training data, and a centroid-based document classifier is built to construct user interests. Liu [20] builds a similar system based on the top 2 levels of the ODP taxonomy. The Misearch project [21] also uses the ODP taxonomy. However, it builds the profiles by collecting and classifying user search histories rather than browsing histories. User interest profiles are built with a text classifier. All these systems need training web pages to represent each interest. When a taxonomy changes, the training data related to the taxonomy must be changed too. Our system provides a method to construct a standard hierarchical profile without any training web pages.

Compared with hierarchical clustering algorithms, our method can generate a more standard and precise profile that captures diversified individual interests. Unsupervised hierarchical clustering algorithms can generate different user interests for each user. However, these differing hierarchical user interests are difficult to use for personalized services. In addition, keywords extracted from web pages cannot be guaranteed to represent a standard user interest, and some words are confusing. By using the reference taxonomy, our method can provide a standard interest view of a user.

Compared with classification techniques, our method adapts more easily to individual user interests. A classification technique needs training data as its knowledge source. It collects web pages for each interest node of the taxonomy.
The training data (web pages) strongly depend on the taxonomy. When the taxonomy changes, the large body of training data must be changed too. Our method does not need to collect training sets or change a classifier when user interests change. Only the reference taxonomy and HowNet are used as knowledge sources in our method. When the reference taxonomy changes, our system remains unchanged.

The semantic information of HowNet has been applied in many applications such as document representation and word similarity calculation. Building on these works, our system modifies the document representation method for clustering web pages and incorporates the keywords similarity definition into the concept mapping process.

VI. CONCLUSIONS

In this paper, we propose a method to construct hierarchical user interests. It does not need any training data for the concept nodes in the taxonomy. The interest profile construction consists of two steps: clustering the browsed web pages and mapping each cluster to a concept of the reference taxonomy. HowNet information is used in both the clustering process and the concept mapping process. In the clustering process, each document is represented with a HowNet extended vector. In the mapping process, both the extended vector and keyword information are used to compute the similarity between a cluster and a concept. The experiment shows that our method can construct a fine hierarchical user interest profile. It also shows that, with the help of HowNet, the keywords similarity plays an important role in the mapping process.

In the future, the quality of the extracted keywords will be improved by identifying the relationships among words, clusters and documents. Since the mapping process needs the keywords of each cluster, the mapping will become more precise if better keywords are found to represent a cluster. We have only used the up-down relationship between words, but HowNet contains many other kinds of relationships between words. These relationships can be taken into account in the word similarity computation.
Based on our method, experiments with different reference taxonomies will be carried out. Personalized services and recommendation systems will be explored in the near future.

REFERENCES

[1] S. Gauch, M. Speretta, A. Chandramouli, and A. Micarelli, "User profiles for personalized information access," The Adaptive Web.
[2] F. Li, Y. Li, Y. Wu, K. Zhou, F. Li, X. Wang and B. Lu, "Combining browsing behavior and page contents for finding user interests," to be published in Autonomous Systems -- Self-Organisation, Management, and Control, edited by Mahr & Sheng, Springer Verlag.
[3] H. Zhuge, "China's E-Science Knowledge Grid Environment," IEEE Intelligent Systems, vol. 19, no. 1.
[4] Z. Dong and Q. Dong. (2000) HowNet. [Online].
[5] Z. Cai, H. T. Geng, X. Zha and Q. S. Cai, "An algorithm for conceptual clustering of Chinese text," in Proc. the 3rd International Conference on Machine Learning and Cybernetics, Shanghai, 2004.
[6] I. S. Dhillon and D. S. Modha, "Concept decompositions for large sparse text data using clustering," Machine Learning, vol. 42, no. 1.
[7] J. A. Lozano, J. M. Pena, and P. Larranaga, "An empirical comparison of four initialization methods for the k-means algorithm," Pattern Recognition Letters, vol. 20, no. 10.
[8] Q. Liu and S. J. Li, "Word similarity computing based on How-net," in Computational Linguistics and Chinese Language Processing, vol. 7, no. 2, Shanghai, 2002.
[9] H. R. Kim and P. K. Chan, "Learning implicit user interest hierarchy for context in personalization," Applied Intelligence.
[10] D. Godoy and A. Amandi, "Modeling user interests by conceptual clustering," Information Systems, vol. 31, no. 4.
[11] X. C. Ni, X. Y. Wu, and Y. Yu, "Automatic identification of Chinese weblogger's interests based on text classification," in Proc. the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. Hong Kong, China: IEEE, 2006.
[12] A. McCallum and K. Nigam, "A comparison of event models for Naive Bayes text classification," in Proc. the AAAI-98 Workshop on Learning for Text Categorization, 1998.
[13] T. Joachims, "Text categorization with support vector machines: learning with many relevant features," in Proc. of ECML-98, 1998.
[14] T. Joachims, "A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization," in Proc. of ICML-97, 1997.
[15] SOHU site.
[16] C. C. Chen, M. C. Chen, and Y. Sun, "PVA: A Self-Adaptive Personal View Agent System," Journal of Intelligent Information Systems, vol. 18, no. 2.
[17] Yam search engine.
[18] J. Trajkova and S. Gauch, "Improving ontology-based user profiles," in Proc. of the RIAO conference, Vaucluse, France, 2004.
[19] Open Directory Project.
[20] F. Liu, C. Yu, and W. Meng, "Personalized web search by mapping user queries to categories," in Proc. the 11th International Conference on Information and Knowledge Management, McLean, Virginia, 2002.
[21] A. Sieg, B. Mobasher, and R. Burke, "Inferring users' information context: Integrating user profiles and concept hierarchies," in 2004 Meeting of the International Federation of Classification Societies, Chicago, 2004.
