Modeling Hierarchical User Interests Based on HowNet and Concept Mapping


Yihong Li #1, Fang Li #2
# Dept. of Computer Science & Engineering, Shanghai Jiao Tong University
No.800 Dong Chuan Rd., Shanghai, P.R. China
1 flyran@sjtu.edu.cn
2 fl@sjtu.edu.cn

Abstract

Modeling user interests plays an important role in personalized services on the Internet. Many systems use classification methods to construct a user profile that represents user interests. However, it is difficult for such methods to cover all individual interests and to capture changes in user profiles. This paper introduces a two-step method to solve the problem. The first step clusters the viewed web pages to extract individual interests, extending the VSM vector with sememes from HowNet. The second step maps each interest cluster to a reference taxonomy to model hierarchical user interests, based on similarity calculation over keywords and the extended vector. Experiments show that the method can model hierarchical user interests with promising results. When a new interest emerges, no adjustment such as collecting new training data or rebuilding the classifier is needed. The method can capture diversified user interests and map them to an interest taxonomy.

I. INTRODUCTION

It is well known that information on the Internet grows exponentially, and users often feel that they are drowning in a sea of information. How to provide useful information to each individual user has become a critical problem. User interests are very important to personalized search, collaborative filtering and recommendation systems, so modeling user interests is a key step in providing personalized services.

The common representations of user interests are keyword profiles and concept profiles [1]. Keyword profiles are sets of weighted words extracted from the web pages that a user has browsed. Each set of words represents one of the user's diversified interests. However, many sets of keywords are related to each other, and a flat structure such as a keyword profile cannot model user interests precisely and correctly.
Hierarchical concept profiles provide a more standard, structured way to model user interests. A node of the concept profile can be considered an abstract topic that the user may be interested in, rather than a set of words. The concept hierarchy describes user interests and their relationships in a standard way that is close to human conception.

There are two methods to model hierarchical user interests on the Internet: hierarchical classification and clustering. The classification method needs to train a classifier that represents all individual user interests. Hierarchical clustering methods can capture individual user interests, but it is difficult to provide a standard personalized service with them. Based on our prior research [2], our research aim is to capture individual interests and to model hierarchical user interests with concept profiles. Our interest profile construction includes two steps: clustering the web pages and mapping each cluster to a concept of the reference taxonomy.

Due to the complexity of natural languages, deep content processing is impossible without semantic information. Synonymy, polysemy and other language problems pose challenges for information processing. Therefore, a semantic knowledge platform such as the Knowledge Grid [3] plays an important role in semantic processing. In our solution, we use HowNet [4], a Chinese knowledge base, to capture semantic information in the traditional vector space model in order to acquire user interests. By similarity calculation of keywords, the generated user interests can be mapped to the reference user interest ontology.

The rest of the paper is organized as follows. Section II discusses cluster construction using a clustering algorithm based on HowNet. Section III describes our concept mapping method. Section IV presents our experiment. Section V covers related work, and Section VI concludes.

II. CLUSTER CONSTRUCTION

The content of a browsed web page is a good indicator of user interests. Individual user interests can be found from the result of clustering web pages.
The cluster construction consists of three steps. The first step extends the vector space model with HowNet to represent the semantic content of web pages. The second step is page clustering. The third step is cluster refinement.

A. Web Page Representation based on HowNet

The content of a web page can be represented as a vector in the Vector Space Model (VSM). To alleviate the sparseness of the VSM vector, HowNet sememes are used to extend the vector. Each lexical item of HowNet is expressed with a set of sememes, which are the atomic semantic elements. For example, the Chinese word 农田 (farmland) is described in HowNet as:

DEF = land|陆地, @planting|栽植, #crop|庄稼, agricultural|农

There are four sememes related to the word 农田: land|陆地, planting|栽植, crop|庄稼 and agricultural|农. The symbol | separates the English word and the Chinese word of a sememe. The symbol @ means that farmland is the location for planting, and # means that farmland and crop are co-related. Synonyms usually share the same sememes in HowNet. The word 田地 also means farmland, and it has the same sememes as the word 农田. If two words are both defined as containing the same sememe X, they are called related words of X. For example, 农田 and 田地 are related words of the sememe land|陆地.

Each sememe in HowNet belongs to a category. The most important categories are the event category, the entity category and the secondary feature category. Sememes also carry depth information in the up-down relationship (see Fig. 1). General sememes are close to the root, and special sememes are deep down in the tree. For example, the depth of the sememe crop|庄稼 is 6, while the depth of the sememe animate|生物 is 4.

Fig. 1. The tree of the HowNet up-down relationship

Since sememes carry so much useful information, the important sememes (those whose categories are entity, event and secondary feature) are extracted from each word and added to the document vector as new dimensions. Let $D = \{d_1, d_2, \ldots, d_n\}$ denote the set of documents that a user has browsed. The set $T = \{t_1, t_2, \ldots, t_m\}$ denotes the words of all documents. Stopwords and words with a DF (document frequency) of less than 2 are filtered out and are not included in T. Let $S = \{s_1, s_2, \ldots, s_p\}$ denote the set of all useful sememes related to any word in T. The HowNet extended document vector is defined as follows.

Definition 1 (Extended Document Vector) The extended document vector of the i-th document $d_i$ is defined as

$$v_i = [\,tfidf(t_1), tfidf(t_2), \ldots, tfidf(t_m), w(s_1), w(s_2), \ldots, w(s_p)\,],$$

where $tfidf(t_j)$ is the TFIDF weight of word $t_j$ and $w(s_k)$ is the weight of sememe $s_k$. Each sememe s has its related words t in the document. The weight of the sememe s is defined as

$$w(s) = \lambda(s) \cdot w_{depth}(s) \cdot \sum_{t \in RelatedWords(s)} tfidf(t). \qquad (1)$$

As Formula 1 shows, three kinds of contributions of a sememe s are considered.

1) The category of the sememe: In HowNet, the three most important categories are entity, secondary feature and event. The entity category is about nouns, while the event category is about verbs. The secondary feature category describes the domain of a word. Only sememes in these categories are used to extend the document vector. Suppose category(s) denotes the category of sememe s; then the category contribution of sememe s is defined as

$$\lambda(s) = \begin{cases} 0.9, & category(s) = \text{entity}; \\ 0.8, & category(s) = \text{secondary feature}; \\ 0.7, & category(s) = \text{event}. \end{cases} \qquad (2)$$

2) The depth of the sememe: Special sememes contribute more than general sememes. Special sememes are deep down in the up-down relationship tree (see Fig. 1). The deeper a sememe is, the bigger its contribution. The depth contribution of a sememe s is defined as

$$w_{depth}(s) = \begin{cases} 0, & depth(s) < minDepth_k; \\ \dfrac{depth(s)}{maxDepth_k}, & minDepth_k \le depth(s) \le maxDepth_k; \\ 1, & depth(s) > maxDepth_k. \end{cases} \qquad (3)$$

Here depth(s) is the depth of sememe s. The parameters $minDepth_k$ and $maxDepth_k$ are for adjustment. Because words that are too general are useless, sememes with a depth less than $minDepth_k$ are ignored. Sememes with a depth greater than $maxDepth_k$ are assumed to contribute equally. Because sememes in different categories cannot be treated equally, different values are assigned to $minDepth_k$ (or $maxDepth_k$) for different sememe categories.

3) The TFIDF weights of related words: A sememe has some related words. If these words are mentioned many times in a document, the sememe should be important. Suppose the set of related words of sememe s is RelatedWords(s); we accumulate the TFIDF weight of each related word t in RelatedWords(s) as a contribution.

The first and third contributions are taken from Zhi Cai's document representation [5]. We add the depth of the sememe as another contribution.

B. Clustering

Based on the extended vector of each browsed web page, the clustering process is applied to the vectors. The clustering algorithm used is Spherical K-means [6], chosen for its speed and efficiency. The Kaufman approach (KA) [7] is chosen as the initialization method of Spherical K-means.
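The extended vectors that feed the clustering step can be sketched as follows. This is a minimal illustration of Formulas 1 to 3 and Definition 1, not the authors' implementation: the depth bounds in `DEPTH_BOUNDS`, the category assigned to the example sememe, and the TFIDF values are all hypothetical, since the paper does not list them, and a real system would read sememes, categories and depths from HowNet.

```python
from typing import Dict, List, Tuple

# Category contribution lambda(s), taken from Formula 2.
CATEGORY_WEIGHT: Dict[str, float] = {
    "entity": 0.9,
    "secondary_feature": 0.8,
    "event": 0.7,
}

# Hypothetical (minDepth_k, maxDepth_k) per category; the paper gives no values.
DEPTH_BOUNDS: Dict[str, Tuple[int, int]] = {
    "entity": (3, 8),
    "secondary_feature": (2, 6),
    "event": (3, 8),
}

# A sememe is described here as (name, category, depth, related words).
Sememe = Tuple[str, str, int, List[str]]


def depth_weight(depth: int, category: str) -> float:
    """Depth contribution w_depth(s) from Formula 3."""
    min_depth, max_depth = DEPTH_BOUNDS[category]
    if depth < min_depth:
        return 0.0  # too general: ignored
    if depth > max_depth:
        return 1.0  # very specific sememes all contribute equally
    return depth / max_depth


def sememe_weight(category: str, depth: int, related_words: List[str],
                  tfidf: Dict[str, float]) -> float:
    """Sememe weight w(s) from Formula 1."""
    if category not in CATEGORY_WEIGHT:
        return 0.0  # only the three important categories are used
    tfidf_sum = sum(tfidf.get(word, 0.0) for word in related_words)
    return CATEGORY_WEIGHT[category] * depth_weight(depth, category) * tfidf_sum


def extended_vector(tfidf: Dict[str, float], vocabulary: List[str],
                    sememes: List[Sememe]) -> List[float]:
    """Extended document vector of Definition 1: word weights, then sememe weights."""
    word_part = [tfidf.get(term, 0.0) for term in vocabulary]
    sememe_part = [sememe_weight(category, depth, related, tfidf)
                   for _name, category, depth, related in sememes]
    return word_part + sememe_part


if __name__ == "__main__":
    # Hypothetical TFIDF weights for one page mentioning both synonyms.
    page_tfidf = {"农田": 0.4, "田地": 0.2}
    # Depth 6 for crop|庄稼 is from the paper; its category here is assumed.
    crop: Sememe = ("crop|庄稼", "entity", 6, ["农田", "田地"])
    print(extended_vector(page_tfidf, ["农田", "田地"], [crop]))
```

Because both synonyms are related words of the same sememe, their TFIDF weights accumulate in one shared dimension, which is how the extension reduces the sparseness of the plain VSM vector.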

C. Cluster Refinement

After all the clusters have been generated, they are refined. First, the top 3 nouns or verbs with the highest weights in each cluster are checked. If two clusters have the same words among their top 3 words, we assume that the two clusters share the same topic and merge them into one cluster. The centroid is then reconstructed for the new cluster. The process is repeated until no two clusters need to be merged. The resulting clusters represent the diversified user interests.

III. MAPPING CLUSTER TO CONCEPT BASED ON HOWNET

A reference taxonomy T has been borrowed from the IAsk site of Sina.com. Suppose c is a cluster and o is a concept node of the reference taxonomy. The mapping process maps each cluster c to a concept node o of the taxonomy.

In our method, clusters and concept nodes are regarded as documents. Both can be represented by the extended document vector defined in Section II. The extended vector represents the semantic content of clusters and concept nodes, and the cosine function is used to calculate the similarity between the vectors.

Because web page contents vary widely, the extended vector is still sparse. For example, the word 图书 (book) has the sememes publications|书刊 and mass|众, while another word, 短篇小说 (short story), has the sememe readings|读物. These two words are similar, but they have no common sememes, so the cosine of the extended vectors cannot discover the similarity between them. To solve this problem, the HowNet similarity calculation of Qun Liu [8] is used to evaluate the similarity between keywords. In the above example, the up-down relation between sememes such as publications|书刊 and readings|读物 makes them semantically similar according to the calculation result. For this reason, keywords are extracted from each cluster and each concept. Each time, we take one keyword from the cluster and one from the concept, and the HowNet similarity is calculated between the two.
The HowNet similarity values can be considered as part of the similarity between cluster and concept. In the following, the representations of cluster and concept are defined first. Then the similarity calculation between cluster and concept is defined. Finally, an algorithm that maps each cluster to a concept node of a taxonomy T is discussed.

A. Cluster Representation

In the mapping process, each cluster c is represented as an extended vector CV (see Definition 2) and a keyword set $CW = \{cw_1, cw_2, \ldots, cw_{nc}\}$. Each $cw_i$ is a keyword from the cluster.

Definition 2 (Extended Vector of a Cluster) The extended vector of a cluster c is defined as the centroid of the cluster:

$$CV = \frac{1}{|c|} \sum_{d_i \in c} v_i \qquad (4)$$

where |c| is the number of pages in c, $d_i$ is a document in cluster c, and $v_i$ is the extended document vector of $d_i$ (see Definition 1).

The extended vector considers the words and sememes of the whole content of each document. It provides a global view of the cluster.

Keywords (CW) of the cluster c are also extracted for the HowNet similarity calculation. They are the top 30 words (either nouns or verbs) extracted from the extended vector of the cluster. These words are compared with the keywords of a concept according to the HowNet similarity. The keyword sets are used to evaluate the semantic closeness of a cluster and a concept.

B. Concept Representation

Each concept o of a reference taxonomy T is regarded as a document. The labels in the subtree of the concept o are regarded as the content of the document o. Like a cluster, the concept o is represented as an extended vector OV and a keyword set $OW = \{ow_1, ow_2, \ldots, ow_{no}\}$.

Definition 3 (Extended Vector of a Concept) The extended vector of a concept o is defined as

$$OV = [\,tfidf(t_1), tfidf(t_2), \ldots, tfidf(t_m), w(s_1), w(s_2), \ldots, w(s_p)\,],$$

where each word $t_i$ belongs to the word set T, each $s_j$ belongs to the sememe set S (see Section II-A), $tfidf(t_i)$ is the TFIDF weight of word $t_i$ in the document o, and $w(s_j)$ is the weight of sememe $s_j$ (see Formula 1).

Suppose the taxonomy T is as in Fig. 2. Then the document of the concept Basketball is "Basketball NBA CBA", and the document of the concept Sports is "Sports Football Basketball Ping-Pong NBA CBA ...".

Fig. 2 A snippet of a reference taxonomy T

To construct the keyword set of a concept, the labels that are nouns or verbs are selected from the subtree of the node. Considering the complexity of the HowNet similarity computation, only labels from the top 2 levels of the subtree are chosen. For example, in Fig. 2 the keyword set of the concept Sports is {Sports, Football, Basketball, Ping-Pong}.
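The construction of a concept's document and keyword set can be sketched as below. This is an illustrative sketch only: the `TAXONOMY` dictionary is a hypothetical stand-in for the part of Fig. 2 named in the text, the function names are invented for the example, and the noun/verb filter is omitted because it would need a part-of-speech tagger.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical snippet of the reference taxonomy: concept label -> child labels.
TAXONOMY: Dict[str, List[str]] = {
    "Sports": ["Football", "Basketball", "Ping-Pong"],
    "Basketball": ["NBA", "CBA"],
}


def subtree_labels(taxonomy: Dict[str, List[str]], concept: str,
                   max_levels: Optional[int] = None) -> List[str]:
    """Labels of the subtree rooted at `concept`, in breadth-first order.

    Level 1 is the concept itself; `max_levels` limits how deep to go.
    """
    labels: List[str] = []
    queue = deque([(concept, 1)])
    while queue:
        label, level = queue.popleft()
        if max_levels is not None and level > max_levels:
            continue
        labels.append(label)
        for child in taxonomy.get(label, []):
            queue.append((child, level + 1))
    return labels


def concept_document(taxonomy: Dict[str, List[str]], concept: str) -> str:
    """The 'document' of a concept: all labels in its subtree."""
    return " ".join(subtree_labels(taxonomy, concept))


def concept_keywords(taxonomy: Dict[str, List[str]], concept: str) -> List[str]:
    """Keyword set OW: labels from the top 2 levels of the subtree only."""
    return subtree_labels(taxonomy, concept, max_levels=2)


if __name__ == "__main__":
    print(concept_document(TAXONOMY, "Basketball"))
    print(concept_keywords(TAXONOMY, "Sports"))
```

Limiting the keyword set to two levels keeps the number of keyword pairs small, which matters because every pair needs a separate HowNet similarity computation.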

C. Similarity Calculation between Concept and Cluster

After the above two processes, each cluster c is represented as an extended vector CV and a keyword set $CW = \{cw_1, cw_2, \ldots, cw_{nc}\}$, and each concept o is represented as an extended vector OV and a keyword set $OW = \{ow_1, ow_2, \ldots, ow_{no}\}$. The similarity between concept and cluster can be calculated from these extended vectors and keywords.

The extended vector similarity between concept o and cluster c is calculated as in Definition 4.

Definition 4 (Extended Vector Similarity) The extended vector similarity between concept o and cluster c is defined as

$$ExtendedVectorSim(o, c) = \cos(OV, CV) = \frac{OV \cdot CV}{|OV|\,|CV|}, \qquad (5)$$

where OV is the extended vector of concept o and CV is the extended vector of cluster c. Here the cosine function is used to evaluate the similarity between the extended vectors OV and CV.

The keywords similarity between concept o and cluster c is calculated as in Definition 5.

Definition 5 (Keywords Similarity) The keywords similarity between concept o and cluster c is defined as

$$KeywordsSim(o, c) = \beta \cdot MaxKeywordsSim(o, c) + (1 - \beta) \cdot AverageKeywordsSim(o, c), \qquad (6)$$

where AverageKeywordsSim(o, c) is the average value of the HowNet similarities [8] between each pair of keywords, MaxKeywordsSim(o, c) is the maximum value of the HowNet similarities between each pair of keywords, and β is a weight parameter for adjustment, between 0 and 1.

The HowNet similarity is computed between each pair of keywords from cluster and concept. The maximum value and the average value of these HowNet similarity values are then combined to form the keywords similarity.

The maximum value of the HowNet similarities between each pair of keywords is calculated as

$$MaxKeywordsSim(o, c) = \max_{i,j}\big(HownetSim(ow_i, cw_j)\big), \qquad (7)$$

where $ow_i$ is a word in the keyword set OW of the concept o, $cw_j$ is a word in the keyword set CW of the cluster c, and $HownetSim(ow_i, cw_j)$ is the HowNet similarity between the words $ow_i$ and $cw_j$.
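Definitions 4 and 5 can be sketched as follows. This is a minimal sketch under stated assumptions: `hownet_sim` is a hypothetical callable standing in for the word similarity of [8], the default `beta=0.5` is arbitrary because the paper does not report its value, and the average is weighted by the TFIDF weight of each cluster keyword and normalized, which is how the paper specifies AverageKeywordsSim.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Extended vector similarity (Definition 4): cosine of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


def keywords_sim(concept_words: List[str], cluster_words: List[str],
                 cluster_tfidf: Dict[str, float],
                 hownet_sim: Callable[[str, str], float],
                 beta: float = 0.5) -> float:
    """Keywords similarity (Definition 5): beta * max + (1 - beta) * average."""
    pairs = [(ow, cw) for ow in concept_words for cw in cluster_words]
    if not pairs:
        return 0.0
    max_sim = max(hownet_sim(ow, cw) for ow, cw in pairs)
    # Normalization factor Y: the TFIDF weights summed over all keyword pairs.
    y = sum(cluster_tfidf.get(cw, 0.0) for _ow, cw in pairs)
    weighted = sum(cluster_tfidf.get(cw, 0.0) * hownet_sim(ow, cw)
                   for ow, cw in pairs)
    average_sim = weighted / y if y > 0.0 else 0.0
    return beta * max_sim + (1.0 - beta) * average_sim


if __name__ == "__main__":
    # Toy similarity table; real values would come from HowNet.
    table = {("a", "a"): 1.0, ("b", "a"): 0.5}
    print(keywords_sim(["a", "b"], ["a"], {"a": 2.0},
                       lambda ow, cw: table[(ow, cw)]))
```

Taking the maximum rewards a single strong keyword match, while the weighted average tempers it with the overall agreement of the two keyword sets.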
The average value of the HowNet similarities between each pair of keywords is calculated as

$$AverageKeywordsSim(o, c) = \frac{1}{Y} \sum_{i=1}^{|OW|} \sum_{j=1}^{|CW|} tfidf(cw_j) \cdot HownetSim(ow_i, cw_j), \qquad (8)$$

where |OW| is the number of keywords in concept o, |CW| is the number of keywords in cluster c, $tfidf(cw_j)$ is the TFIDF weight of word $cw_j$ in the extended vector of cluster c, and

$$Y = \sum_{i=1}^{|OW|} \sum_{j=1}^{|CW|} tfidf(cw_j)$$

is a normalization factor. Because the keywords in the cluster differ in importance, the TFIDF weight of each keyword is applied when computing the average value of the HowNet similarity.

The similarity between concept and cluster is based on the extended vector similarity (see Definition 4) and the keywords similarity (see Definition 5), as shown in Formula 9:

$$Similarity(o, c) = \alpha \cdot ExtendedVectorSim(o, c) + (1 - \alpha) \cdot KeywordsSim(o, c). \qquad (9)$$

Here α is a weight parameter for adjustment. We set it to 0.9, 0.1 and 0.1 for the first, second and third levels of the concepts, respectively.

D. Mapping Algorithm

Algorithm 1 ConceptMapping(c, r)
Input: c: a cluster; r: the root concept in the interest taxonomy T for mapping.
Output: o: a concept node in taxonomy T which is mapped to cluster c.
/* This function maps cluster c to a concept o in the subtree of r. */
o ← r; /* Initialization */
if (hasChild(r)) /* node r has a child in taxonomy T */
    topKConcepts ← top K children of concept r which have the highest similarity with cluster c;
    /* Return the leaf node o which is most similar to cluster c. */
    for each o_i in topKConcepts
        o_temp ← ConceptMapping(c, o_i);
        if Similarity(o_temp, c) > Similarity(o, c)
            o ← o_temp;
        end if
    end for
end if
return o;

Fig. 3 The algorithm of concept mapping between cluster and taxonomy

The mapping algorithm is shown in Fig. 3. Given a cluster c and a root concept node r, the algorithm searches in the subtree of concept r and returns a concept o mapped to the cluster c. The mapping process is a top-down method. First it gets the top K children of concept r with the highest similarity to cluster c. These children form the set topKConcepts. Then it runs the same mapping process in the subtree of each node in topKConcepts. The process returns a leaf node for each subtree, and these K leaves are regarded as candidates. Finally it chooses the candidate o which is most similar to the cluster c. The Similarity function evaluates the similarity between a concept and a cluster. Choosing K candidate concepts guards against an early wrong mapping at the ancestor nodes. The parameter K is 2 in our system and in the experiment.

IV. EXPERIMENT

To evaluate the performance of our method for constructing hierarchical user interests, an experiment was conducted. Using our IE plug-in, we first collected the browsing histories of 13 volunteers over about 2 weeks. The total number of visited web pages is about 3350, covering different topics including politics, culture, economy, science, entertainment and so on. For concept mapping, a topic hierarchy was extracted from the IAsk site of Sina.com as our reference taxonomy. The taxonomy contains 3 levels of concepts. After filtering out some useless concepts, we obtained about 3662 concepts in the taxonomy.

Our clustering process was then performed on each volunteer's browsing history. The system generated 5 to 12 clusters for each volunteer. Three mapping methods (including our mapping method) were applied to these clusters:

1) The Baseline method: This method uses only the classic Vector Space Model (VSM) vector to represent a cluster and a concept. It uses the cosine value of the vectors to compute the similarity between cluster and concept.

2) The ExtendedVector method: This method uses our extended vector (see Definition 2 and Definition 3) to represent a cluster and a concept. It uses the Extended Vector Similarity (see Definition 4) to compute the similarity between cluster and concept.

3) The ExtendedVector+Keywords method: This is the method our system uses. It uses both the extended vector and the keyword set to represent a cluster and a concept. Formula 9 is used to compute the similarity between cluster and concept. The parameter α of the similarity is set to 0.9, 0.1 and 0.1 for the first, second and third levels of the concepts, respectively.

All three methods use Algorithm 1 (see Fig. 3) for concept mapping.

Fig. 4 Part of concept mapping about user U1
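Algorithm 1, which all three methods share, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the taxonomy is assumed to be a dictionary from a concept to its children, and `similarity` is a hypothetical callable that hides whichever measure is plugged in (cosine on VSM vectors, Definition 4, or Formula 9 with its level-dependent α). The concept names and scores in the example are invented.

```python
from typing import Callable, Dict, Hashable, List

Concept = str


def concept_mapping(cluster: Hashable, root: Concept,
                    children: Dict[Concept, List[Concept]],
                    similarity: Callable[[Concept, Hashable], float],
                    k: int = 2) -> Concept:
    """Top-down mapping of a cluster into the subtree of `root` (Algorithm 1)."""
    best = root  # initialization: o <- r
    kids = children.get(root, [])
    if kids:
        # Keep the K children most similar to the cluster.
        top_k = sorted(kids, key=lambda o: similarity(o, cluster),
                       reverse=True)[:k]
        for child in top_k:
            candidate = concept_mapping(cluster, child, children, similarity, k)
            # As in the pseudocode, a candidate replaces the current node only
            # if it is strictly more similar, so an inner node can be returned.
            if similarity(candidate, cluster) > similarity(best, cluster):
                best = candidate
    return best


if __name__ == "__main__":
    # Hypothetical taxonomy fragment and similarity scores for one cluster.
    tree = {
        "Root": ["Sports", "News"],
        "Sports": ["Football", "Basketball"],
        "Football": ["Ligue1"],
    }
    scores = {"Root": 0.0, "Sports": 0.6, "News": 0.2,
              "Football": 0.7, "Basketball": 0.3, "Ligue1": 0.5}
    print(concept_mapping("c", "Root", tree, lambda o, c: scores[o]))
```

With K greater than 1, a cluster whose best first-level match is wrong can still reach the right concept through the second-best branch, at the cost of evaluating more similarity values.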

TABLE I
THE PRECISION OF MAPPING (EXTENDEDVECTOR+KEYWORDS METHOD)
Columns: User ID (U1 to U13). Rows: Clusters, P(Level 1), P(Level 2), P(Level 3).

TABLE II
MAPPING RESULT OF THE THREE METHODS
Columns: Method, P(Level 1), P(Level 2), P(Level 3). Rows: ExtendedVector+Keywords, ExtendedVector, Baseline.

We evaluate the mapping precision of the three methods on the volunteers' clusters. When a cluster has been mapped to a concept, we say that the cluster belongs to the concept and to its ancestors. The mapping precision on the i-th level of taxonomy T is defined as

$$Precision(Level_i) = \frac{\sum_{c \in C} rightMapping(o_i, c)}{|C|}, \qquad (10)$$

where $Level_i$ is the i-th level of taxonomy T, C is the set of evaluation clusters, |C| is the number of clusters in C, c is a cluster in the set C, and $o_i$ is the concept on the i-th level which cluster c belongs to. $rightMapping(o_i, c)$ returns 1 when the cluster c really belongs to the concept node $o_i$ on the i-th level. Otherwise, it returns 0.

Fig. 4 shows part of the interest profile of user U1. The cluster c has been mapped to the concept Ligue1, although the cluster is in fact about the Football club. On the 3rd level of the taxonomy T, the concept $o_3$ which the cluster c belongs to is Ligue1. This is a wrong mapping on this level, so $rightMapping(o_3, c)$ is 0. On the 2nd level of the taxonomy, the concept $o_2$ is Football, and cluster c really belongs to it, so $rightMapping(o_2, c)$ is 1.

Table I shows the mapping result for each user with the ExtendedVector+Keywords method. The second row shows the number of clusters generated for each user. The next three rows show the precision of concept mapping on the 1st, 2nd and 3rd levels of the reference taxonomy, indicated by P(Level 1), P(Level 2) and P(Level 3). For each user, the precision on the 1st level, P(Level 1), is the best of the three.

Table II shows the mapping results of all three methods. Each row gives the precisions of one method, while each column gives the precision evaluated on one level. The precisions on the 1st level do not differ much among these methods.
The ExtendedVector+Keywords method outperforms the others only slightly there, with a precision of 0.91 on the 1st level of mapping. The reason is that each concept on the 1st level has a big subtree, so there are enough words for the VSM vector representation (Baseline) or the extended vector representation on the 1st level. On the 2nd and 3rd levels, the ExtendedVector+Keywords method is much better than the other two. The precision of our method is 0.78 on the 2nd level, higher than that of the ExtendedVector method, while the Baseline method reaches only 0.57. The results on the 2nd and 3rd levels show that the HowNet similarity of keywords plays an important role in concept mapping when there are not many labels to describe the concepts.

V. COMPARISON WITH OTHER METHODS

User interests can be represented as a user profile. Many machine learning methods have been employed to construct hierarchical user interests, such as hierarchical clustering algorithms and classification techniques. This research indicates that specific interests are very difficult to identify.

Hierarchical clustering algorithms can construct an interest hierarchy without any reference taxonomy. Kim [9] provides a hierarchical clustering algorithm called DHC to construct user interests. Godoy [10] provides another hierarchical clustering algorithm called WebDCC to construct user interests. These unsupervised methods do not need any training web pages. Keywords are extracted to represent an interest node in these methods. However, the keywords of a node are diversified, and it is difficult to build a standard profile for different users.

The classification technique can construct a standard profile based on a reference taxonomy. Ni [11] constructed Chinese webloggers' interests based on text classification. They use a combination of classifiers: the Naive Bayes Classifier (NB) [12], the Support Vector Machine (SVM) [13] and the Rocchio Classifier [14]. The system uses the data of SOHU's Directory [15] to train the classifiers, and the top 2 levels of the hierarchical category space are used as the reference taxonomy. The PVA System [16] builds a concept-hierarchy-based user profile by classifying the browsed web pages using the vector space model. It uses a three-level concept hierarchy containing 55 concepts from the Yam search web site [17] as the reference taxonomy. Trajkova's system [18] uses the top 3 levels of the ODP (Open Directory Project) [19] as the reference taxonomy. The web pages related to the ODP categories are collected as training data, and a centroid-based document classifier is built to construct user interests. Liu [20] builds a similar system based on the top 2 levels of the ODP taxonomy. The Misearch project [21] also uses the ODP taxonomy. However, it builds the profiles by collecting and classifying the user's search histories rather than the user's browsing history. User interest profiles are built according to a text classifier. All these systems need training web pages to represent each interest. When a taxonomy changes, the training data related to the taxonomy must be changed too.

Our system provides a method to construct a standard hierarchical profile without any training web pages.

Compared with the hierarchical clustering algorithms, our method can generate a more standard and precise profile that captures diversified individual interests. The unsupervised hierarchical clustering algorithms can generate different user interests for each user. However, these differing hierarchical user interests are difficult to use for personalized services. In addition, keywords extracted from web pages cannot be guaranteed to represent a standard user interest, and some words are confusing. Using the reference taxonomy, our method can provide a standard view of a user's interests.

Compared with the classification technique, our method adapts more readily to individual user interests. The classification technique needs training data as its knowledge source. It collects web pages for each interest node of the taxonomy.
The training data (web pages) depend strongly on the taxonomy. As the taxonomy changes, the large training data set must be changed too. Our method does not need to collect training sets or change the classifier when user interests change. Only the reference taxonomy and HowNet are used as knowledge sources in our method. When the reference taxonomy changes, our system remains unchanged.

The semantic information of HowNet has been applied in many applications, such as document representation and word similarity calculation. Based on these works, our system modifies the document representation method for clustering web pages and uses the keywords similarity definition in the concept mapping process.

VI. CONCLUSIONS

In this paper, we propose a method to construct hierarchical user interests. It does not need any training data for the concept nodes in the taxonomy. The interest profile construction consists of two steps: clustering the browsed web pages and mapping each cluster to a concept of the reference taxonomy. The HowNet information is used in both the clustering process and the concept mapping process. In the clustering process, each document is represented with a HowNet extended vector. In the mapping process, both the extended vector and the keyword information are used to compare the similarity between a cluster and a concept. The experiment shows that our method can construct a fine hierarchical user interest profile. It also shows that, with the help of HowNet, the keywords similarity plays an important role in the mapping process.

In the future, the quality of the extracted keywords will be improved by identifying the relationships among words, clusters and documents. Since the mapping process needs the keywords of each cluster, it will work more precisely if better keywords are found to represent a cluster. We have only used the up-down relationship between words, but there are many other kinds of word relationships in HowNet. These relationships can be taken into account in the word similarity computation.
Based on our method, different reference taxonomies will be tested. Personalized services and recommendation systems will be explored in the near future.

REFERENCES

[1] S. Gauch, M. Speretta, A. Chandramouli, and A. Micarelli, "User profiles for personalized information access," The Adaptive Web.
[2] F. Li, Y. Li, Y. Wu, K. Zhou, F. Li, X. Wang and B. Lu, "Combining browsing behavior and page contents for finding user interests," will be published in Autonomous Systems -- Self-Organisation, Management, and Control, edited by Mahr & Sheng, published by Springer Verlag.
[3] H. Zhuge, "China's E-Science Knowledge Grid Environment," IEEE Intelligent Systems, vol. 19, no. 1.
[4] Z. Dong and Q. Dong. (2000) HowNet. [Online].
[5] Z. Cai, H. T. Geng, X. Zha and Q. S. Cai, "An algorithm for conceptual clustering of Chinese text," in Proc. the 3rd International Conference on Machine Learning and Cybernetics, Shanghai, 2004.
[6] I. S. Dhillon and D. S. Modha, "Concept decompositions for large sparse text data using clustering," Machine Learning, vol. 42, no. 1.
[7] J. A. Lozano, J. M. Pena, and P. Larranaga, "An empirical comparison of four initialization methods for the k-means algorithm," Pattern Recognition Letters, vol. 20, no. 10.
[8] Q. Liu and S. J. Li, "Word similarity computing based on How-net," in Computational Linguistics and Chinese Language Processing, vol. 7, no. 2, Shanghai, 2002.
[9] H. R. Kim and P. K. Chan, "Learning implicit user interest hierarchy for context in personalization," Applied Intelligence.
[10] D. Godoy and A. Amandi, "Modeling user interests by conceptual clustering," Information Systems, vol. 31, no. 4.
[11] X. C. Ni, X. Y. Wu, and Y. Yu, "Automatic identification of Chinese weblogger's interests based on text classification," in Proc. the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. Hong Kong, China: IEEE, 2006.
[12] A. McCallum and K. Nigam, "A comparison of event models for Naive Bayes text classification," in Proc. the AAAI-98 Workshop on Learning for Text Categorization, 1998.
[13] T. Joachims, "Text categorization with support vector machines: learning with many relevant features," in Proc. of ECML-98, 1998.
[14] T. Joachims, "A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization," in Proc. of ICML-97, 1997.
[15] SOHU site.
[16] C. C. Chen, M. C. Chen, and Y. Sun, "PVA: A self-adaptive personal view agent system," Journal of Intelligent Information Systems, vol. 18, no. 2.
[17] Yam search engine.
[18] J. Trajkova and S. Gauch, "Improving ontology-based user profiles," in Proc. of the RIAO conference, Vaucluse, France, 2004.
[19] Open Directory Project.
[20] F. Liu, C. Yu, and W. Meng, "Personalized web search by mapping user queries to categories," in Proc. the 11th International Conference on Information and Knowledge Management, McLean, Virginia, 2002.
[21] A. Sieg, B. Mobasher, and R. Burke, "Inferring users' information context: Integrating user profiles and concept hierarchies," in 2004 Meeting of the International Federation of Classification Societies, Chicago, 2004.


More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Ontology Generator from Relational Database Based on Jena

Ontology Generator from Relational Database Based on Jena Computer and Informaton Scence Vol. 3, No. 2; May 2010 Ontology Generator from Relatonal Database Based on Jena Shufeng Zhou (Correspondng author) College of Mathematcs Scence, Laocheng Unversty No.34

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies

Deep Classifier: Automatically Categorizing Search Results into Large-Scale Hierarchies Deep Classfer: Automatcally Categorzng Search Results nto Large-Scale Herarches Dkan Xng 1, Gu-Rong Xue 1, Qang Yang 2, Yong Yu 1 1 Shangha Jao Tong Unversty, Shangha, Chna {xaobao,grxue,yyu}@apex.sjtu.edu.cn

More information

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Experiments in Text Categorization Using Term Selection by Distance to Transition Point Experments n Text Categorzaton Usng Term Selecton by Dstance to Transton Pont Edgar Moyotl-Hernández, Héctor Jménez-Salazar Facultad de Cencas de la Computacón, B. Unversdad Autónoma de Puebla, 14 Sur

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Personalized Concept-Based Clustering of Search Engine Queries

Personalized Concept-Based Clustering of Search Engine Queries IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1 Personalzed Concept-Based Clusterng of Search Engne Queres Kenneth Wa-Tng Leung, Wlfred Ng, and Dk Lun Lee Abstract The exponental growth of nformaton

More information

Document Representation and Clustering with WordNet Based Similarity Rough Set Model

Document Representation and Clustering with WordNet Based Similarity Rough Set Model IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, No 3, September 20 ISSN (Onlne): 694-084 www.ijcsi.org Document Representaton and Clusterng wth WordNet Based Smlarty Rough Set Model

More information

A Feature-Weighted Instance-Based Learner for Deep Web Search Interface Identification

A Feature-Weighted Instance-Based Learner for Deep Web Search Interface Identification Research Journal of Appled Scences, Engneerng and Technology 5(4): 1278-1283, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scentfc Organzaton, 2013 Submtted: June 28, 2012 Accepted: August 08, 2012

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Online Text Mining System based on M2VSM

Online Text Mining System based on M2VSM FR-E2-1 SCIS & ISIS 2008 Onlne Text Mnng System based on M2VSM Yasufum Takama 1, Takash Okada 1, Toru Ishbash 2 1. Tokyo Metropoltan Unversty, 2. Tokyo Metropoltan Insttute of Technology 6-6 Asahgaoka,

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity ISSN(Onlne): 2320-9801 ISSN (Prnt): 2320-9798 Internatonal Journal of Innovatve Research n Computer and Communcaton Engneerng (An ISO 3297: 2007 Certfed Organzaton) Vol.2, Specal Issue 1, March 2014 Proceedngs

More information

Utilizing Content to Enhance a Usage-Based Method for Web Recommendation based on Q-Learning

Utilizing Content to Enhance a Usage-Based Method for Web Recommendation based on Q-Learning Proceedngs of the Twenty-Frst Internatonal FLAIS Conference (2008) Utlzng Content to Enhance a Usage-Based Method for Web ecommendaton based on Q-Learnng Nma Taghpour Department of Computer Engneerng Amrkabr

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

CUM: An Efficient Framework for Mining Concept Units

CUM: An Efficient Framework for Mining Concept Units CUM: An Effcent Framework for Mnng Concept Unts P.Santh Thlagam Ananthanarayana V.S Department of Informaton Technology Natonal Insttute of Technology Karnataka - Surathkal Inda 575025 santh_soc@yahoo.co.n,

More information

Recommendations of Personal Web Pages Based on User Navigational Patterns

Recommendations of Personal Web Pages Based on User Navigational Patterns nternatonal Journal of Machne Learnng and Computng, Vol. 4, No. 4, August 2014 Recommendatons of Personal Web Pages Based on User Navgatonal Patterns Yn-Fu Huang and Ja-ang Jhang Abstract n ths paper,

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

The Effect of Similarity Measures on The Quality of Query Clusters

The Effect of Similarity Measures on The Quality of Query Clusters The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

A Web Site Classification Approach Based On Its Topological Structure

A Web Site Classification Approach Based On Its Topological Structure Internatonal Journal on Asan Language Processng 20 (2):75-86 75 A Web Ste Classfcaton Approach Based On Its Topologcal Structure J-bn Zhang,Zh-mng Xu,Kun-l Xu,Q-shu Pan School of Computer scence and Technology,Harbn

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Face Recognition Based on SVM and 2DPCA

Face Recognition Based on SVM and 2DPCA Vol. 4, o. 3, September, 2011 Face Recognton Based on SVM and 2DPCA Tha Hoang Le, Len Bu Faculty of Informaton Technology, HCMC Unversty of Scence Faculty of Informaton Scences and Engneerng, Unversty

More information

An Intelligent Context Interpreter based on XML Schema Mapping

An Intelligent Context Interpreter based on XML Schema Mapping An Intellgent Context Interpreter based on XML Schema Mappng Been-Chan Chen Dept. of Computer Scence and Informaton Engneerng Natonal Unversty of Tanan, Tanan, Tawan, R. O. C. e-mal: bcchen@mal.nutn.edu.tw

More information

Issues and Empirical Results for Improving Text Classification

Issues and Empirical Results for Improving Text Classification Issues and Emprcal Results for Improvng Text Classfcaton Youngoong Ko 1 and Jungyun Seo 2 1 Dept. of Computer Engneerng, Dong-A Unversty, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, Korea yko@dau.ac.kr

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals nkselector: A Web Mnng Approach to Hyperlnk Selecton for Web Portals Xao Fang and Olva R. u Sheng Department of Management Informaton Systems Unversty of Arzona, AZ 8572 {xfang,sheng}@bpa.arzona.edu Submtted

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Cross-lingual Pseudo Relevance Feedback Based on Weak Relevant Topic Alignment

Cross-lingual Pseudo Relevance Feedback Based on Weak Relevant Topic Alignment Cross-lngual Pseudo Relevance Feedback Based on Weak Relevant opc Algnment WANG Xu-wen Insttute of Medcal Informaton & Lbrary, Chnese Academy of Medcal Scences, Beng 100020 wang.xuwen@mcams.ac.cn ZHANG

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval LRD: Latent Relaton Dscovery for Vector Space Expanson and Informaton Retreval Techncal Report KMI-06-09 March, 006 Alexandre Gonçalves, Janhan Zhu, Dawe Song, Vctora Uren, Roberto Pacheco In Proc. of

More information

A Hybrid Re-ranking Method for Entity Recognition and Linking in Search Queries

A Hybrid Re-ranking Method for Entity Recognition and Linking in Search Queries A Hybrd Re-rankng Method for Entty Recognton and Lnkng n Search Queres Gongbo Tang 1,2, Yutng Guo 2, Dong Yu 1,2(), and Endong Xun 1,2 1 Insttute of Bg Data and Language Educaton, Bejng Language and Culture

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

An Iterative Implicit Feedback Approach to Personalized Search

An Iterative Implicit Feedback Approach to Personalized Search An Iteratve Implct Feedback Approach to Personalzed Search Yuanhua Lv 1, Le Sun 2, Junln Zhang 2, Jan-Yun Ne 3, Wan Chen 4, and We Zhang 2 1, 2 Insttute of Software, Chnese Academy of Scences, Beng, 100080,

More information

Application of k-nn Classifier to Categorizing French Financial News

Application of k-nn Classifier to Categorizing French Financial News Applcaton of k-nn Classfer to Categorzng French Fnancal News Huazhong KOU, Georges GARDARIN 2, Alan D'heygère 2, Karne Zetoun PRSM Laboratory, Unversty of Versalles Sant-Quentn 45 Etats-Uns Road, 78035

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Design of Structure Optimization with APDL

Design of Structure Optimization with APDL Desgn of Structure Optmzaton wth APDL Yanyun School of Cvl Engneerng and Archtecture, East Chna Jaotong Unversty Nanchang 330013 Chna Abstract In ths paper, the desgn process of structure optmzaton wth

More information

I. INTRODUCTION II. RELATED STUDY

I. INTRODUCTION II. RELATED STUDY A Context-Aware based Dynamc User Preference Profle Constructon Method JeeHyun Km Qan Gao Young Im Cho Department of Computer Software School of nformaton College of Informaton Technology Seol Unversty

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Feature-Based Matrix Factorization

Feature-Based Matrix Factorization Feature-Based Matrx Factorzaton arxv:1109.2271v3 [cs.ai] 29 Dec 2011 Tanq Chen, Zhao Zheng, Quxa Lu, Wenan Zhang, Yong Yu {tqchen,zhengzhao,luquxa,wnzhang,yyu}@apex.stu.edu.cn Apex Data & Knowledge Management

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization A Novel Term_Class Relevance Measure for Text Categorzaton D S Guru, Mahamad Suhl Department of Studes n Computer Scence, Unversty of Mysore, Mysore, Inda Abstract: In ths paper, we ntroduce a new measure

More information

Audio Content Classification Method Research Based on Two-step Strategy

Audio Content Classification Method Research Based on Two-step Strategy (IJACSA) Internatonal Journal of Advanced Computer Scence and Applcatons, Audo Content Classfcaton Method Research Based on Two-step Strategy Sume Lang Department of Computer Scence and Technology Chongqng

More information

Recommender System based on Higherorder Logic Data Representation

Recommender System based on Higherorder Logic Data Representation Recommender System based on Hgherorder Logc Data Representaton Lnna L Bngru Yang Zhuo Chen School of Informaton Engneerng, Unversty Of Scence and Technology Bejng Abstract Collaboratve Flterng help users

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps Vsual Thesaurus for Color Image Retreval usng Self-Organzng Maps Chrstopher C. Yang and Mlo K. Yp Department of System Engneerng and Engneerng Management The Chnese Unversty of Hong Kong, Hong Kong ABSTRACT

More information

Syntactic Tree-based Relation Extraction Using a Generalization of Collins and Duffy Convolution Tree Kernel

Syntactic Tree-based Relation Extraction Using a Generalization of Collins and Duffy Convolution Tree Kernel Syntactc Tree-based Relaton Extracton Usng a Generalzaton of Collns and Duffy Convoluton Tree Kernel Mahdy Khayyaman Seyed Abolghasem Hassan Abolhassan Mrroshandel Sharf Unversty of Technology Sharf Unversty

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

A Method of Query Expansion Based on Event Ontology
