Data Mining Approaches to User Modeling for Adaptive Hypermedia: Survey and Future Directions

Size: px
Start display at page:

Download "Data Mining Approaches to User Modeling for Adaptive Hypermedia: Survey and Future Directions"

Transcription

1 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < Data Mnng Approaches to User Modelng for Adaptve Hypermeda: Survey and Future Drectons Enrque Fras-Martnez, Sherry Y. Chen and Xaohu Lu Abstract The ablty of an adaptve hypermeda system to create talored envronments depends manly on the amount and accuracy of nformaton stored n each user model. Some of the dffcultes that user modelng faces are the amount of data avalable to create user models, the adequacy of the data, the nose wthn that data and the necessty of capturng the mprecse nature of human behavor. Data mnng and machne learnng technques have the ablty to handle large amounts of data and to process uncertanty. These characterstcs make these technques sutable for automatc generaton of user models that smulate human decson-makng. Ths paper surveys dfferent data mnng technques that can be used to effcently and accurately capture user behavor. The paper also presents gudelnes that show whch technques may be used more effcently accordng to the task mplemented by the applcaton. Index Terms Adaptve Hypermeda, Data Mnng, Machne Learnng, User Modelng. I. INTRODUCTION Adaptve hypermeda can be defned as the technology that allows to personalze for each ndvdual user of a hypermeda applcaton the content and presentaton of the applcaton accordng to user preferences and characterstcs (Perkowtz et al., 998) (Perkowtz et al., 999) (Perkowtz et al., 2000). The process of personalzaton s defned as the ways n whch nformaton and servces can be talored to match the unque and specfc needs of an ndvdual or a communty (Callan et al., 200). Personalzaton s about buldng customer loyalty by developng a meanngful one-to-one relatonshp; by understandng the needs of each ndvdual and helpng satsfy a goal that effcently and knowledgeably addresses each ndvdual s need n a gven context (Recken, 2000). The more nformaton a user model has, the better the content and presentaton wll be personalzed. A user model s created trough a user modelng process n whch unobservable nformaton about a user s nferred from observable nformaton from that user; for example, usng the nteractons wth the system (Zukerman, Albrecht & Ncholson, 999). Ths work was supported n part by the UK Arts and Humantes Research Board (AHRB) under Grant MRG/AN983/APN6300. E. Fras-Martnez, S. Chen and X. Lu are wth Department of Informaton Systems & Computng, Brunel Unversty, Uxbrdge, Mddlesex UB8 3PH, Unted Kngdom ; phone: +44 (0) ; fax: +44 (0) ; e- mal: {enrque.fras-martnez, sherry.chen, xaohu.lu}@brunel.ac.uk. User models can be created usng a user-guded approach, n whch the models are drectly created usng the nformaton provded by each user, or an automatc approach, n whch the process of creatng a user model s controlled by the system and s hdden from the user. The user-guded approach produces adaptable servces and adaptable user-models (Fnk et al., 997), whle the automatc approach produces adaptve servces and adaptve user models (Fnk et al., 997) (Bruslovsky et al., 997). In general, a user model wll contan some adaptve and some adaptable elements. Ideally the set of adaptable elements should be reduced to the mnmum possble (elements lke age, gender, favorte background color, etc.), whle other elements (favorte topcs, patterns of behavor, etc.) should be created by a learnng process. These concepts have also been presented n the lterature as mplct and explct user modelng acquston (Quroga et al., 999). The problem of user modelng can be mplemented usng an automatc approach because a typcal user exhbts patterns when accessng a hypermeda system and the set of nteractons contanng those patterns can be stored n a log database n the server. In ths context, machne learnng and data mnng technques can be appled to recognze regulartes n user trals and to ntegrate them as part of the user model. Machne learnng encompasses technques where a machne acqures/learns knowledge from ts prevous experence (Wtten & Frank, 999). The output of a machne learnng technque s a structural descrpton of what has been learned that can be used to explan the orgnal data and to make predctons. From ths perspectve, data mnng and other machne learnng technques make t possble to automatcally create user models for the mplementaton of adaptve hypermeda servces. These technques make t possble to create user models n an envronment, such as a hypermeda applcaton, n whch, usually, users are not wllng to gve feedback of ther actons. Ths fact leads researchers to consder the dfferent technques that can be used to capture and model user behavor and whch elements of the behavor of a user each one of those technques can capture. In general the adaptve servce that s gong to be mplemented determnes the nformaton that a user model should contan. Once that s known, and takng nto account the characterstcs of the avalable data, dfferent data mnng/machne learnng

2 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 2 technques can be chosen to capture the necessary patterns. Ths paper has surveyed the development of user models usng data mnng and machne learnng technques from 999 to 2004, focusng on the man ournals and conferences for user modelng, manly: User Modelng and User-Adapted Interacton, Internatonal Conference on User Modelng, IEEE Transactons on Neural Networks, Workshop of Intellgent Technques for Web Personalzaton (part of IJCAI Internatonal Jont Conference of Artfcal Intellgence) and Internatonal Workshop on Knowledge Dscovery on the WEB (WEBKDD, part of the ACM SIGKDD Internatonal Conference on Knowledge Dscovery and Data Mnng). The paper s ntentons are () to gve an up-to-date vew of data mnng technques appled to User Modelng and hghlght ther potental advantages and lmtatons, and (2) to gve basc gudelnes about whch technques can be useful for a gven adaptve applcaton. The paper complements the results of (Zuckerman et al., 200) that revewed the use of predctve statstcal models for user modelng. The organzaton of the paper s as follows. The paper frst defnes the concept of user model, ts relevance for adaptve hypermeda and the basc steps for the automatc creaton of user models. After that a set of unsupervsed and supervsed technques are presented. For each technque we present ts theoretcal background, ts pros and cons and ts applcatons n the feld of user modelng. Next we develop a gudelne on how to create a user model accordng to the needs of the adaptve hypermeda applcaton that s gong to be mplemented. The conclusons secton closes the paper. II. USER MODELING (UM) A user model should capture the behavor (patterns, goals, nterestng topcs, etc.) a user shows when nteractng wth the Web. A user model s defned as a set of nformaton structures desgned to represent one or more of the followng elements (Kobsa, 200): () representaton of goals, plans, preferences, tasks and/or abltes about one or more types of users; (2) representaton of relevant common characterstcs of users pertanng to specfc user subgroups or stereotypes; (3) the classfcaton of a user n one or more of these subgroups or stereotypes; (4) the recordng of user behavor; (5) the formaton of assumptons about the user based on the nteracton hstory and/or (6) the generalzaton of the nteracton hstores of many users nto groups. A. User Modelng and Adaptve Hypermeda Adaptve hypermeda allows to personalze to each ndvdual the content and presentaton of a web ste. The archtecture of an adaptve hypermeda system s usually dvded n two parts: the server sde and the clent sde. The server sde generates the user models from a data base contanng the nteractons of the users wth the system and the personal data/preferences that each user has gven to the system. Also, user models can contan knowledge ntroduced by the desgner (n the form of rules for example). These user models, n combnaton wth a hypermeda database, are used by the Decson Makng and Personalzaton Engne module Fg.. Generc Archtecture of an Adaptve Hypermeda Applcaton. to dentfy user needs, decde on the types of adaptaton to be performed and communcate them to an adaptve nterface. Fg. presents the archtecture of a generc AH system. A personalzed hypermeda system, by ts very nature, should respond n real tme to user nputs. To do so, the archtecture of the system should provde a quck access to the rght nformaton at the rght tme. Adaptve hypermeda uses the knowledge gven by user models to mplement one or more of the two basc types of adaptve tasks: Recommendaton (R). Recommendaton s the capablty of suggestng nterestng elements to a user based on some nformaton; for example from the tems to be recommended or from the behavor of other users. Recommendaton s also known n the lterature as collaboratve flterng. Classfcaton (C). Classfcaton bulds a model that maps or classfes data tems nto one of several predefned classes. Classfcaton s done usng only data related to that partcular tem. Ths knowledge can be used to talor the servces of each user stereotype (class). In the lterature, classfcaton has also been presented as content-based flterng. Both recommendaton and classfcaton are types of flterng applcatons. B. Automatc Generaton of User Models One of the processes presented n Fg. s the automatc generaton of user models from the nteracton data between the users and the system (done by the UM Generaton module). Fg. 2 presents the basc steps of ths module: () Data Collecton, (2) Preprocessng, (3) Pattern Dscovery and (4) Valdaton and Interpretaton: Data Collecton. In ths stage user data s gathered. For automatc user modellng the data collected ncludes: data regardng the nteracton between the user and the Web, data regardng the envronment of the user when nteractng wth the Web, drect feedback gven by the user, etc. Data Preprocessng/Informaton Extracton. The nformaton obtaned n the prevous stage cannot be drectly processed. It needs to be cleaned from nose and nconsstences n order to be used as the nput of the next

3 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 3 phase. For user modellng, ths nvolves manly user dentfcaton and sesson reconstructon. Ths stage s amed at obtanng, from the data avalable, the semantc content about the user nteracton wth the system. Also n ths phase the data extracted should be adapted to the data structure used by standard pattern dscovery algorthms used n the next step. Pattern Dscovery. In ths phase, machne learnng and data mnng technques are appled to the data obtaned n the prevous stage n order to capture user behavour. The output of ths stage s a set of structural descrptons of what have been learned about user behavour and user nterests. These descrptons consttute the base of a user model. Dfferent technques wll capture dfferent user propertes and wll express them n dfferent ways. The knowledge needed to mplement an adaptve servce wll determne, among other factors, whch technques to apply n ths phase. Valdaton and Interpretaton. In ths phase the structures obtaned n the pattern dscovery stage are analyzed and nterpreted. The patterns dscovered can be nterpreted and valdated, usng doman knowledge and vsualzaton tools, n order to test the mportance and usablty of the knowledge obtaned. In general ths process s done wth the help of a user modellng desgner. User model heurstcs are used n each step of the process to extract, adapt and present the knowledge n a relevant way. The process of generaton of user models usng data mnng/machne learnng technques can be seen as a standard process of extractng knowledge from data where user modellng s used as a wrapper for the entre process. C. Data Mnng and ts Relevance to User Modelng As t has been presented n Fg. 2, the phase of Pattern Dscovery fnds out relevant nformaton about the behavor of a user (or set of users) when nteractng wth the Web. Data mnng and machne learnng technques are deal for that process because they are desgned to represent what have been learned from the nput data wth a structural representaton. Ths representaton stores the knowledge needed to mplement the two types of tasks prevously descrbed. Although the applcaton of machne learnng and data mnng technques work really well for modelng user behavor, t also faces some problems. As stated n (Webb et al., 200), among those challenges s the problem of needng large data sets, the problem of labelng data for supervsed machne leanng technques and the problem of computatonal complexty. Each data mnng/machne learnng technque wll capture dfferent relatonshps among the data avalable and wll express the results usng dfferent data structures. The key queston s to fnd out whch patterns need to be captured n order to mplement an adaptve servce. It s mportant, n order to choose a sutable learnng method, to know what knowledge s captured by each technque and how that knowledge can be used to mplement the two basc tasks. Also, the choce of learnng method depends largely on the type of tranng data avalable. The man dstncton n Fg. 2. Steps for Automatc Generaton of User Models. machne learnng research s between supervsed and unsupervsed learnng. Supervsed learnng requres the tranng data to be preclassfed. Ths means that each tranng tem s assgned a unque label, sgnfyng the class to whch the tem belongs. Gven these data, the learnng algorthm bulds a characterstc descrpton for each class, coverng the examples of ths class. The mportant feature of ths approach s that the class descrptons are bult condtonal to the preclassfcaton of the examples n the tranng set. In contrast, unsupervsed learnng methods do not requre preclassfcaton of the tranng examples. These methods form clusters of examples, whch share common characterstcs. The man dfference to supervsed learnng s that categores are not known n advance, but constructed by the learner tself. When the coheson of a cluster s hgh,.e., the examples n t are smlar, t defnes a new class. The rest of the paper presents how data mnng and machne learnng technques have been used for user modelng: whch knowledge can be captured wth each technque, examples of applcatons and ts lmts and strengths. The technques presented are dvded nto two groups: unsupervsed, whch ncludes clusterng (herarchcal, non-herarchcal and fuzzy clusterng) and assocaton rules, and supervsed, whch ncludes Decson trees/classfcaton rules, k-nearest neghbor (k-nn), Neural Networks and Support Vector Machnes (SVM). III. UNSUPERVISED APPROACHES TO USER MODELING The man unsupervsed technques are clusterng and assocaton rules. Clusterng comprses a wde varety of dfferent technques based n the same concept. A collecton of dfferent clusterng technques and ts varatons can be found n (Jan et al., 988). A. Clusterng for User Modelng The task of clusterng s to structure a gven set of unclassfed nstances (data vectors) by creatng concepts, based on smlartes found on the tranng data. A clusterng algorthm fnds the set of concepts that cover all examples verfyng that: () the smlarty between examples of the same concepts s maxmzed, and (2) the smlarty between examples of dfferent concepts s

4 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 4 mnmzed. In a cluster algorthm the key element s how to obtan the smlarty between two tems of the tranng set. Clusterng technques can be classfed n hard clusterng and fuzzy clusterng. In non-fuzzy or hard clusterng, data s dvded nto crsp clusters, where each data pont belongs to exactly one cluster. In fuzzy clusterng, the data ponts can belong to more than one cluster, and assocated wth each of the nstances are membershp grades whch ndcate the degree to whch they belong to the dfferent clusters. Hard clusterng technques may be grouped nto two categores: herarchcal and non-herarchcal (Jan, 999). A herarchcal clusterng procedure nvolves the constructon of a herarchy or tree-lke structure, whch s bascally a nested sequence of parttons, whle non-herarchcal or parttonal procedures end up wth a partcular number of clusters at a sngle step. ) Basc Algorthms: Non-herarchcal Technques The man non-herarchcal clusterng technques are: () k- means clusterng and (2) Self-Organzng Maps (SOM). a) K-means Clusterng The k-means clusterng technque (MacQueen, 967) s gven as an nput the number of clusters k. The algorthm then pcks k tems, called seeds, from the tranng set n an arbtrary way. Then, n each teraton, each nput tem s assgned to the most smlar seed, and the seed of each cluster s recalculated to be the centrod of all tems assgned to that seed. Ths process s repeated untl the seed coordnates stablze. Ths algorthm ams at mnmzng an obectve functon, J, typcally a squared error functon: k n J = d = = k n = x = = ( ) c where d s the dstance measure between a data pont x and the cluster centre c. J s an ndcator of the dstance of the n data ponts from ther respectve cluster centres and t represents the compactness of the clusters created. Although t can be proved that the procedure wll always termnate, the k-means algorthm does not necessarly fnd the most optmal confguraton, correspondng to the global obectve functon mnmum. The K-means algorthm s popular because t s easy to understand and easy to mplement. Its man drawback s that t can take a consderable tme f the number of patterns nvolved s very large and the number of clusters s substantal. Another problem s that t s senstve to the ntal partton the selecton of the ntal patterns, and may converge to a local mnmum of the crteron functon value f the ntal partton s not properly chosen. A possble remedy s to run the algorthm wth a number of dfferent ntal parttons. If they all lead to the same fnal partton, ths mples that the global mnmum of the square error has been acheved. However, ths can be tme-consumng, and may not always work. 2 () b) Self Organzng Maps (SOM) The self-organsng map (SOM) algorthm was proposed by Teuvo Kohonen n 98 (Kohonen, 997). Apart from beng used n a wde varety of felds, the SOM offers an nterestng tool for exploratory data analyss, partcularly for parttonal clusterng and vsualzaton. It s capable of representng hghdmensonal data n a low dmensonal space (often a two or one dmensonal array) that preserves the structure of the orgnal data. A self-organsng network conssts of a set of nput nodes V = {v, v 2,, v N }, a set of output nodes C = {c, c 2,, c M }, a set of weght parameters W = {w, w 2,, w,, w NM } ( N, M, 0 w ), and a map topology that defnes the dstances between any gven two output nodes. The output nodes n SOM are usually arranged n a 2-dmensonal array to form a map. Each nput node s fully connected to every output node va a varable connecton. A weght parameter s assocated wth each of these connectons, and the weghts between the nput nodes and output nodes are teratvely changed durng the learnng phase untl a termnaton crteron s satsfed. For each nput vector v, there s one assocated wnner node on the output map. A wnner node s an output node that has mnmum dstance to the nput vector. The SOM algorthm starts by ntalzng the topology and sze of the output map (M), the connecton weghts to random values over the nterval [0, ], the gan value η (learnng rate) and the neghbourhood sze r, and normalzes both the nput vectors and the connected weght vectors. After that, the algorthm calculates the Eucldean dstance between the new nput vector v and each node on the output map, and desgnates the node wth the mnmum dstance as the wnner node c: N vk wk, =,2,,M (2) k = c = mn ( ) 2 Once c s obtaned, the algorthm updates the weghts W, the learnng rate η and N c, the neghbourhood surroundng the wnner node c, n such way that the vectors represented by output nodes are smlar f they are located n a small neghborhood. For each node N c, the SOM algorthm performs the followng operaton: w ( old ) ( ) = w + η [ v w ] (3) ( new) old After that the neghborhood sze and the learnng rate are decreased. Ths process s repeated untl t converges. The neghborhood set N c s a set of nodes that surround the wnner node c. These nodes n N c are affected wth weght modfcatons, apart from those changes made to the wnner node, as defned n the algorthm. These weght changes are made to ncrease the matchng between the nodes n N c and the correspondng nput vectors. As the update of weght parameters proceeds, the sze of the neghborhood set s slowly decreased to a predefned lmt, for nstance, a sngle node. Ths process leads to one of the most mportant

5 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 5 propertes of SOM that smlar nput vectors are mapped to geometrcally close wnner nodes on the output map. Ths s called neghborhood preservatons, whch has turned out to be very useful for clusterng smlar data patterns. It s not always straghtforward to vsually nspect the proected data on the two-dmensonal output map n order to decde the number and sze of natural clusters. Therefore, careful analyss or post-processng of output maps s crucal to the partton of the orgnal data set. Lke the K-means algorthm, the SOM produces a sub-optmal partton f the ntal weghts are not chosen properly. Moreover, ts convergence s controlled by varous parameters such as the learnng rate, the sze and shape of the neghbourhood n whch learnng takes place. Consequently, the algorthm may not be very stable n that a partcular nput pattern may produce dfferent wnner nodes on the output map at dfferent teratons. 2) Basc Algorthms: Herarchcal Technques The man problem of non-herarchcal approaches s that when workng wth hgh dmensonal problems, n general, there wll not be enough tems to populate the vector space, whch wll mply that most dmensons wll be unrelable for smlarty computatons. In order to solve ths problem herarchcal clusterng technques were developed. There are two types of herarchcal clusterng: agglomeratve and dvsve. Both share a common characterstc: they create a herarchy of clusters. Whle the agglomeratve approach creates a bottom-up herarchy the dvsve approach creates a top-down one. Generally speakng, dvsve algorthms are computatonally less effcent. A typcal herarchcal agglomeratve clusterng algorthm s outlned below: ) Place each pattern n a separate cluster 2) Compute the proxmty matrx of all the nter-pattern dstances for all dstnct pars of patterns 3) Fnd the most smlar par of clusters usng the matrx. Merge these two clusters nto one, decrement number of clusters by one and update the proxmty matrx to reflect ths merge operaton. 4) If all patterns are n one cluster, stop. Otherwse, go to the above step 2. The output of such algorthm s a nested herarchy of trees that can be cut at a desred dssmlarty level formng a partton. Herarchcal agglomeratve clusterng algorthms dffer prmarly n the way they measure the dstance or smlarty of two clusters where a cluster may consst of only a sngle obect at a tme. The most commonly used nter-cluster measures are: d AB = mn( d ), A B (4) d AB d AB = max( d ), A B = n n A B A B d (5), (6) where d AB s the dssmlarty between two clusters A and B, d s the dssmlarty between two ndvdual patterns and, n A and n B are the number of ndvduals n clusters A and B respectvely. These three nter-cluster dssmlarty measures are the bass of the three of the most popular herarchcal clusterng algorthms. The sngle-lnkage algorthm uses (4), the mnmum of the dstances between all pars of patterns drawn from the two clusters (one pattern from each cluster). The complete-lnkage algorthm uses Equaton (5), the maxmum of all parwse dstances between patterns n the two clusters. The group-average algorthm uses Equaton (6), the average of the dstances between all pars of ndvduals that are made up of one ndvdual from each cluster. A challengng ssue wth herarchcal clusterng s how to decde the optmal partton from the herarchy. One approach s to select a partton that best fts the data n some sense, and there are many methods that have been suggested n the lterature (Evertt, 993). It has also been found that the sngle-lnkage algorthm tends to exhbt the so-called channg effect: t has a tendency to cluster together at a relatvely low level obects lnked by chans of ntermedates. As such, the method s approprate f one s lookng for optmally connected clusters rather than for homogeneous sphercal clusters. The complete-lnkage algorthm, on the other hand, tends to produce clusters that tghtly bound or compact, and has been found to produce more useful herarches n many applcatons than the sngle-lnk algorthm (Jan, 999). The group-average algorthm s also wdelyused. Detaled dscusson and practcal examples of how these algorthms work can be found n (Jan, 999) and (Webb, 999). 3) Basc Algorthms: Fuzzy Clusterng One of the most wdely used fuzzy clusterng algorthms s the Fuzzy C-Means (FCM) Algorthm (Bezdek, 98). The FCM algorthm attempts to partton a fnte collecton of elements X={x,,x n } nto a collecton of c fuzzy clusters wth respect to some gven crteron. Gven a fnte set of data, the algorthm returns a lst of c cluster centres C={c,,c c }and a partton matrx U=u, є [0,],=, n, =,,c, where each element tells the degree to whch element x belongs to cluster c. Lke the k-means algorthm, the fuzzy c-means ams to mnmse an obectve functon. The standard functon s: J = c n m ( ) ( u, ) x c = = whch dffers from the k-means obectve functon by the addton of the membershp values u and the fuzzfer m. The 2 (7)

6 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 6 fuzzfer m determnes the level of cluster fuzzness. A large m results n smaller membershps u and hence, fuzzer clusters. In the lmt m=, the membershps u converge to 0 or, whch mples a crsp parttonng. In the absence of expermentaton or doman knowledge, m s commonly set to 2. The basc Fuzzy C-Means Algorthm, gven n data ponts (x,,x n ) to be clustered, a number of c clusters wth (c,,c c ) the center of the clusters, and m the level of cluster fuzznes wth, m >, (8) frst ntalzes the membershp matrx U to random values, verfyng that: c u [0,], u = (9) = After the ntalzaton, the algorthm obtans the center of the clusters c, =,,c: n m ( u ) x = c = n (0) m ( u ) = And obtans the dstance between all ponts =,,n and all cluster centers =,,c d ( = x c () ) Updatng matrx U accordng to the new dstances, d = 0 u = c d u = = k dk 2 m (2) Ths process s repeated untl the set of cluster centers s stablzed. There are other algorthms, whch are optmzatons of the orgnal FCM, lke Fuzzy c-medod Algorthm (FCMdd) or the Fuzzy c-trmered Medods Algorthm (FCTMdd) (Krshnapuram et al., 200) 4) Applcatons for User Modelng For UM there are two knds of nterestng clusters to be dscovered: usage clusters and page clusters. Clusterng of users tends to establsh groups of users exhbtng smlar browsng patterns, whch are usually called stereotypes. Such knowledge s especally useful for nferrng user demographcs n order to perform market segmentaton n e- commerce-applcatons or provde personalzed Web content to the users. On the other hand, clusterng of pages wll dscover groups of pages havng related content. Ths nformaton s useful for Internet search engnes and Web assstance provders. In the context of UM, clusterng has a dstngushable characterstc: t s usually done wth non-numercal data. Ths mples that, usually, the clusterng technques appled are relatonal, where numercal values represent the degrees to whch two obects of the data set are related. Clusterng appled to user modelng has to use technques that can handle relatonal data because the nformaton used to create clusters (pages vsted, characterstcs of the user, etc.) cannot usually be represented by numercal vectors. In case they are represented usng vectors, part of the semantc of the orgnal data s lost. In these systems the defnton of dstance s done usng vectoral representatons of user nteractons wth the adaptve hypermeda system. Some examples of relatonal TABLE I EXAMPLES OF SOME CLUSTERING-BASED USER MODELS (Mobasher et al, 2000) (Palouras et al. 999) (Doux et al., 997) (Fu et al., 999) (Hay et al., 200) (Mobasher et al., 200) Applcaton Input Data Results Capture of web-users nterests usng k-means User logs from the Unv. of Mnnesota Example of the mplementaton of the Clusterng. Desgn of a cluster-based Comp. Scence web server collected recommendaton system n a commercal recommendaton system durng a month. ste. News- flterng system based on communtes of users. Clusterng allows to recommend nterestng news to a user. K-means clusterng algorthm for user proflng n order to derve prototypcal behavor from each user. Groupng of users wth a common behavor n a web server takng nto account access patterns. Clusterng methods that capture the nherent sequentalty of web vsts. A metrc, Sequence Algnment Method, s ntroduced to be used nstead of Eucldean dstance for clusterng purposes. Clusterng for collaboratve flterng When regsterng a user specfes hs/her nterests. Data collected from a dedcated set of experments where users are asked about ther preferences. Data collected from UMR web server log ( contanng 2.5 mllon records. Log fles of a Belgan telecom provder collected over a one-week perod. 2,000 sessons collected from the Assocaton for Consumer Research web ste. Establshed machne learnng technques are very useful for the acquston of communtes of users The technques proposed handle qualtatve data for clusterng users effcently. The clusters obtaned can be used for personalzaton purposes. The results are as good as the ones obtaned wth Eucldean dstance, whle keepng the concept of order. Wth the proper data preprocessng the clusterng approach outperforms more tradtonal approaches to ths problem (lke Nearest Neghbor)

7 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 7 TABLE II EXAMPLES OF SOME FUZZY CLUSTERING-BASED USER MODELS Lampnen and Kovsto (2002) Nasraou et al. (999) Josh et al. (2000) Krshnapuram et al. (200) Applcaton Input Data Outcome samples of dfferent applcatons from an edge router of a LAN network. ntroduced. Obtan applcaton profles from network traffc data to manage network resources. A new algorthm (CARD) to mne user profles from access logs s proposed. Two algorthms to mne user profles: FCMdd and FCTMdd. Web access log analyss for user proflng usng RFCMdd (Robust Fuzzy c-medods). 2 day log data of the Dep. of Comp. Eng.. at Unv. of Mssour. CSEE logs of Unv. of Maryland Fve days of CSEE web server actvty of Unv. of Maryland. FCM produced better results than SOM. A method for the comparson of both solutons s also CARD s very effectve for clusterng many dfferent profles n user sessons. Both algorthms extract nterestng user profles. FCMdd s not able to handle nose as effectvely as FCTMdd. RFCMdd s very effectve for clusterng of relatonal data. clusterng appled to web mnng are (Josh et al., 2000) and (Mobasher et al., 2000). Table I summarzes some studes and applcatons of hard clusterng for UM. Some examples of recommendaton mplemented wth clusterng algorthms are (Mobasher et al, 2000), (Doux et al., 999), (Palouras et al., 999), whch uses SOM, (Fu et al., 200) and (Mobasher et al, 200). Examples of classfcaton tasks mplemented usng clusterng are (Doux et al. 997) and (Hay et al., 200). (Goren-Bar et al., 200) uses SOM to classfy documents based on a subectvely predefned set of clusters n a specfc doman. When usng Fuzzy Clusterng, the man dfference wth hard clusterng technques s that, a user can be at the same tme n more than one cluster wth dfferent degrees of truth. Ths allows to better capture the nherent uncertanty that the problem of modelng user behavor has. Examples of applcatons that mplement a recommendaton task usng FC nclude (Lampnen and Kovsto, 2002), (Nasraou et al., 999) and (Nasraou and Krshnapuram, 2000). Examples of classfcaton tasks are (Josh et al., 2000) and (Krshnapuram et al., 200). Table II summarzes some studes and applcatons of FC for UM. 5) Lmtatons The man problem that clusterng technques face s how to defne the concept of dstance that s gong to be used. In general some knowledge of the problem s needed to defne an optmum concept of dstance. When appled to user modellng ths problem s even harder due to the nature of the data avalable: nteractons, user preferences, pages vsted, etc., whch are not expressed n a numercal way. Dfferent technques to characterze web user behavour usng numercal vectors have been proposed (Josh et al., 2000) (Mobasher et al., 2000), but n one way or another, the representatons loose part of the semantcs that the orgnal data had. More research s needed on how to express the data avalable n a numercal way and how to defne a meanngful concept of dstance for creatng effcent user models usng clusterng technques whle at the same tme keepng the semantcs of the orgnal data. Non-herarchcal clusterng technques assume that the number of clusters k s known a pror. For user modelng ths s not usually the case. Ths mples that some heurstcs need to be used to determne the number of clusters. Also user models developed so far usng fuzzy clusterng do not fully use the fuzzy nature of the technque n order to create more flexble and adaptve systems. More studes are needed to create meanngful ways of mxng the dfferent personalzatons assocated wth each one of the clusters n whch a user can be ncluded when usng fuzzy clusterng. B. Assocaton Rules for User Modelng Assocaton rule dscovery ams at dscoverng all frequent patterns among transactons. The problem was orgnally ntroduced by (Agrawal et al., 993) and was based on detectng frequent tems n a market basket. The Apror algorthm, ncludng ts varatons and extensons, s wdely accepted to solve ths problem. ) Basc Algorthms Beng I={, 2,..., m } a set of tems, S a set of transactons, where each transacton T s a set of tems, and X a set of tems n I, a transacton T s sad to contan X, f : X T (3) An Assocaton Rule s defned as an mplcaton of the form: X Y, X I, Y I, X Y = (4) Where X s usually called antecedent and Y consequent. The order of the tems of both the antecedent and consequent s not relevant. Each Assocaton Rule has a confdence and a support assocated. The support of a rule s defned as the fracton of strngs n the set of sessons of S where the rule successfully apples: S S / XY S S θ ( XY ) = (5) The confdence of a rule s defned as the fracton of tmes for whch f the antecedent X s satsfed, the consequent Y s also true: θ ( XY) σ ( XY) = θ ( X ) (6)

8 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 8 The values of support and confdence are used to flter the set of assocaton rules obtaned from S. Gven a set of transactons S, the task of mnng assocaton rules can be formulated as fndng all assocaton rules wth at least mnmum support and mnmum confdence, where the thresholds for support and confdence are user-specfed values. The task of dscoverng assocaton rules can be done n three steps: () fnd all sets of tems that have transacton support above mnmum support, (2) for each tem obtaned, fnd all non-empty subsets and (3) for each such subset X of tem I, f the confdence s bgger than the gven threshold, produce the rule: X ( I X ) (7) 2) Applcatons for User Modelng In the context of user modelng, assocaton rules capture sets of actons that have a causal relaton among them. A typcal applcaton of assocaton rules for user modelng s capturng pages that are accessed together. Typcally, assocaton rules are used to mplement recommendaton tasks. Some examples of recommendaton tasks mplemented usng assocaton rules are (Chen et al., 2002), (Ln et al., 200) and (Nanopoulos et al., 200), (Mobasher et al., 200), and (Geyer-Schulz et al., 2002). Table III summarzes these examples. 3) Lmtatons Assocaton rules have two mportant drawbacks when used for user modelng: they do not capture the sequentalty of both the antecedent and the consequent and no temporalty nformaton (for example when X happens when s Y gong to happen) s captured. Both temporalty and sequentalty are very mportant characterstcs for user modelng. Temporalty allows us to predct not only what actons a user s gong to take, but also when those actons are gong to be taken. Sequentalty captures the order n whch a user makes a set of actons whch s relevant to the consequent that such antecedent wll fre. Because temporalty and sequentalty are two key elements that need to be captured to create effcent user models, more research s needed to capture these characterstcs when creatng assocaton rules. IV. SUPERVISED APPROACHES TO USER MODELING Ths secton gves a revew of how supervsed learnng technques (decson trees/classfcaton rules, Neural Networks, k-nearest neghbor and Support Vector Machnes) can be used to model user behavor. A. Decson Trees/Classfcaton Rules for User Modelng Decson tree learnng (Mtchell, 997)(Wnston, 992) s a method for approxmatng dscrete-valued functons wth dsunctve expressons. Decson tree learnng s generally best suted to problems where nstances are represented by attrbute-value pars and the target functon has dscrete output values. Classfcaton rules are an alternatve representaton of the knowledge obtaned from classfcaton trees. They construct a profle of tems belongng to a partcular group accordng to ther common attrbutes. ) Basc Algorthms The tranng process that creates a decson tree s called nducton. A standard decson tree algorthm has two phases: () tree growng and (2) prunng. The growng phase can be done usng two methods: () Top-Down nducton and (2) Incremental nducton. Top-down nducton s an teratve process whch nvolves splttng the data nto progressvely smaller subsets. Each teraton consders the data n only one node. The frst teraton consders the root node that contans all the data. Subsequent teratons work on dervatve nodes that wll contan subsets of the data. The algorthm begns by analyzng the data to fnd the ndependent varable that, when used as a splttng rule wll result n nodes that are most dfferent from each other wth respect to the dependent varable. The qualty of a test s measured by the mpurty/varance of example sets. The most common measure s the nformaton gan. Typcally, TABLE III EXAMPLES OF SOME ASSOCIATION RULES-BASED USER MODELS (Chen et al., 2002) (Ln et al., 200) (Mobasher et al., 200) (Geyer-Schulz, 2002) (Nanopoulos et al, 200) Applcaton Input Data Results Predcton of user ntenton n machne-human Interacton of 5 users wth the system The approach acheves 84% average nteracton. The model s used to predct human actons durng a month. accuracy n predctng user s ntenton. for an Internet browser. Learnng user preference models of multmeda Internet fles. The model s used to predct when a user wll need a multmeda fle and to help n the search of the fle needed. Effcent personalzaton of web stes usng assocaton rules and non-labeled data contaned n server s log fle. Evaluaton of the recommendatons gven n a educatonal Internet nformaton broker usng Assocaton Rules and repeat-buyng theory. Assocaton Rule dscovery to construct a predcton system for effectve predcton of web user request. Implementaton of a prefetchng system. 200 records of user s emals. Access logs for the Web ste Assocaton for Consumer Research (ACR) Newsletter ( Data set obtaned from the Vrtual Unversty Informaton system of Venna Unversty durng 6 moths of usage. Clarknet trace log: The average correct acceptance and average correct refusal of each user preference modelng s hgher that 80%. Assocaton Rules acheve better recommendaton effectveness that tradtonal collaboratve flterng approaches. Assocaton Rules tend to work better and acheve hgher predcton accuracy. The proposed algorthm WM o outperforms the exstng algorthms and s very effcent n reducng the number of canddates.

9 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 9 the set of possble tests s lmted to splttng the examples accordng to the value of a certan attrbute. Once a node s splt, the same process s performed on the new nodes, each of whch contans a subset of the data n the parent node. Ths process s repeated untl only nodes where no splts should be made reman. Incremental nducton s a method for the task of concept learnng. When a new tranng example s entered t s classfed by the decson tree. If t s ncorrectly classfed then the tree s revsed. Restructurng the tree can be done by storng all tranng examples or by mantanng statstcs assocated wth nodes n the tree. Tree-buldng algorthms usually have several stoppng rules. These rules are usually based on several factors ncludng maxmum tree depth, mnmum number of elements n a node consdered for splttng, or the mnmum number of elements that must be n a new node. The second phase of the algorthm optmzes the resultng tree obtaned n the frst phase. Prunng s a technque used to make a tree more general. It removes splts and the subtrees created by them. There s a great varety of dfferent decson tree algorthms n the lterature. Some of the more common algorthms are: Classfcaton and Regresson Trees (CART) (Breman et al, 984) (Efron et al., 99), Ch-squared Automatc Interacton Detecton (CHAID) (Kass, 980), C4.5 (Qunlan, 993), J4.8 (Wtten and Frank, 999), C5.0 (whch mplements boostng) and ID3 (Qunlan, 986). Rules are, at ts smplest form, an equvalent form of expressng a classfcaton tree. In order to obtan the set of rules of a decson tree, each path s traced from root node to leaf node, recordng the test outcomes as antecedents and the leaf-node classfcaton as the consequent. The process of convertng a decson tree nto decson rules can be done after or before prunng. Convertng a decson tree to rules before prunng has some advantages: () t allows dstngushng among the dfferent contexts n whch a decson node s used, (2) the prunng decson regardng an attrbute test can be made dfferently for each path, and (3) t mproves readablty. Nevertheless, obtanng the rules before prunng can produce a hgh number of rules. Ths can be solved by applyng some rule-reducton technques. The basc technques are: () elmnate redundant and unnecessary antecedents, (2) use OR to combne rules n order to reduce the total number of rules to the number of dependent varables and (3) elmnate unnecessary rules. Rules can be also obtaned after the tree has been pruned, producng n ths case a smaller rule knowledge-base. Algorthms such as CART, C4.5 and C.5 nclude methods to generalze rules assocated wth a tree whch removes redundances. 2) Applcatons for User Modelng In the context of user modelng, decson trees can be used to classfy users and/or documents n order to use ths nformaton for personalzaton purposes. Decson trees can also handle nosy data and/or data wth mssng parameters, whch makes them very useful for creatng user models due to the nosy and mprecse nature of the data avalable. Table IV summarzes some studes and applcatons of Decson Trees for UM. Classfcaton trees are typcally used to mplement classfcaton tasks. In ths case the classfcaton trees are used to construct user models used to personalze the user experence (for example regardng hs/her level of expertse, hs/her cogntve style, etc.). Some examples of ths approach are (Tsukada et. al, 2002) and (Beck et al., 2003). Due to ts ablty to group users wth smlar characterstcs, classfcaton tress can be also used to mplement recommendaton tasks. Examples of these approaches are (Palouras et al, 999), (Zhu et al., 2003) and (Webb et al., 997). Classfcaton rules are wdely used to model user behavor because t provdes a straghtforward framework to represent knowledge. The readablty of the output knowledge s a great advantage of ths approach. Table V summarzes some studes and applcatons of Classfcaton Rules for UM. TABLE IV EXAMPLES OF SOME DECISSION TREE-BASED USER MODELS (Palouras et al., 999) (Tsukada et al., 200) (Webb et al., 997) (Zhu et al., 2003) (Beck et al., 2003) Applcaton Input Data Results Constructon of user stereotypes usng C4.5. Stereotypes Questons answered by 3 users It s very mportant to have good data n are used to antcpate nterests of a user wthn the regardng the ECRAN nformaton order to obtan good models. context of a news retreval system. system. Automatc classfcaton of web pages n a pre-specfed set of categores usng C4.5 and assocaton rules. Use of C4.5 to buld the Feature Based Modelng nstructon module. The results are appled to the Subtracton Modeler. Constructon of a recommender system to help users fnd relevant nformaton on the web usng C4.5 and Naïve Bayesan Classfer. Constructon of a User Model for an adaptve tutor wth J4.8 and Naïve Bayesan classfer. 4 top categores of Yahoo! JAPAN. From each category 200 pages were randomly download. Test admnstered to 73 nne to ten year old prmary school students. Each test contaned 40 three column subtracton problem randomly generated. Collected data from 29 partcpants, askng each partcpant to perform two search tasks. Data collected from the nteracton of 88 students wth the Readng Tutor. The expermental evaluaton demonstrates that ths method provdes acceptable accuracy wth the classfcaton of web-page nto top categores of Yahoo! JAPAN. C4.5 ncreases the number of predctons made wthout sgnfcantly alterng the accuracy of those predctons. C4.5 outperforms Naïve Bayesan Classfer. Naïve Bayesan Classfer outperforms J4.8 for ndvdual modelng and J4.8 outperforms Naïve Bayesan Classfer for Group modelng.

10 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 0 (Shah et al., 2002), (Zhang, 2003), (Joerdng, 999) and (Semeraro et al., 200) are examples of classfcaton tasks mplemented usng classfcaton rules. In these cases, the rules are used to characterze dfferent user stereotypes that are used to mplement some personalzed feature. (Cercone, 2002) and (Cercone et al., 2002b) are examples of applcaton of classfcaton rules for recommendaton. Classfcaton rules can not only be obtaned from a learnng algorthm but can also be drectly expressed by the desgner of the system. Some examples of ths approach are (Morales et al, 999) that constructs a model capturng the behavor of a user that s controllng a pole, (De Bra et al.2003) and (Romero et al., 2003) that use rules as part of the archtecture of a general purpose tool for adaptve webstes, and (Benak et al., 997) that uses rules to characterze user behavor n the context of nformaton retreval. 3) Lmtatons Decson trees/classfcaton rules produce results that are hghly dependent on the qualty of the data avalable. Ths s caused by the fact that subtrees are created usng the maxmum nformaton gan possble. In some cases f the nformaton avalable s not approprate, whch typcally happens when the nformaton used to create user models has been obtaned usng user feedback or n a nosy envronment, the models created wll not correctly capture user behavor. Also, decson trees have the problem that for hgh dmensonal problems, the response tme can be very hgh. Ths s an nconvenent when workng wth adaptve systems, because real-tme response s needed. Ths problem can be solved n some cases usng classfcaton rules. Specal nterest for user modelng has the combnaton of classfcaton rules wth soft computng technques (fuzzy logc and neural networks especally) n order to create more flexble user models. Fuzzy classfcaton rules are able to overlap user models and to mprove the nterpretablty of the results. So far user modelng wth fuzzy classfcaton rules has not been used at ts full capacty. B. K-Nearest Neghbor (k-nn) for User Modelng k-nearest Neghbor s a predctve technque sutable for classfcaton models (Fredman et al., 975). Unlke other learnng technques n whch the tranng data s processed to create the model, n k-nn the tranng data represents the model. ) Basc Algorthms The smplest verson s the Nearest Neghbor algorthm. In ths case the tranng porton of the algorthm smply stores the data ponts of the tranng data S, S={x,,x n }. To classfy a new tem, the nearest-neghbor classfer fnds the closest (accordng to some dstance metrc) tranng-pont x to and assgns t to the same class of x. The nearest-neghbor algorthm s tradtonally mplemented by computng the dstance from to every x of S. It can also be mplemented more effcently usng Vorono dagrams. The k-nearest-neghbor (k-nn) algorthm s a varaton of nearest-neghbor. In ths case, nstead of lookng at only the pont x that s closest to, the algorthm looks at the k ponts n S that are closest to. Snce the class of each of the k nearest ponts are known, the maorty category n ths subset s the result of the classfcaton. Nearest-neghbor s a specal case of k-nearest-neghbor, where k =. k-nn has two man parameters: () the parameter k (number of nearest cases to be used) and (2) the dstance metrc, d. The selecton of the metrc s very mportant, because dfferent metrcs, used on the same tranng data, can result n completely dfferent predctons. In general, the selecton of the metrc s applcaton dependent. A typcal metrc d s the Eucldean dstance, where: d(x, ) = sqrt((x - ) T (x - )) (8) The value of k should be determned by expermentaton. In general, hgher values of k produce nose-proof systems, but also systems wth hgher computatonal costs. Typcally k s n (Shah et al., 2002) (Zhang, 2003) (Joerdng, 999) TABLE V EXAMPLES OF SOME CLASSIFICATION RULES-BASED USER MODELS Applcaton Input Data Results Classfcaton rules model bd strateges n an on-lne aucton webste Classfcaton of the users of an nformaton retreval (IR) system accordng to dfferent user stereotypes. Dynamc user modelng of the nterests of a web shopper. The user model represents the present nterest of that partcular user. (Cercone, 2002) ELEM2, a novel rule nducton process. Not Descrbed (Semeraro et al., 200) Use of classfcaton rules to model the user nteracton n a dgtal lbrary servce archtecture. Comparson between the results gven by C4.5 and J4.8. Data collected from 2 auctons over dfferent perod of tme, makng a total of,537 bds collected. Ratngs gven by 64 people wth four dfferent backgrounds (stereotypes) regardng ther IR knowledge. Set of vsts of a user to an adaptve shoppng ste. Log fle contanng the nteractons of each user wth the dgtal lbrary system. The data s the preprocessed and labeled. The bdder model obtaned can be used to smulate onlne markets and/or to detect fraud n on-lne aucton webstes. The classfcaton can be effectvely used to personalze the nformaton retreval envronment. Classfcaton Rules effcently model dynamc human behavor. Applcatons to extract user profles that are used to construct recommender systems. Accurate nteracton models can be nferred automatcally by usng J4.8. J4.8 slghtly outperforms C4.5.

11 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < TABLE VI EXAMPLES OF SOME K-NN BASED USER MODELS (Bllsus et al, 999) (Schwab et al., 999) (Tsrga et al., 2002) (Shh et al., 200) Applcaton Input Data Results Constructon of an ntellgent agent desgned to comple a daly news program for ndvdual users usng a user model that captures user nterests usng a mult-strategy approach. ELFI (Electronc Fundng Informaton) a WWW system that provdes personalzed nformaton about research fundng. Intalzaton of the Student Model n a Algebra Tutor usng k- NN and stereotypes based on recognzed smlartes wth other students that have already nteracted wth the system. Development of an on-lne learnng system usng k-nn for provdng adaptve learnng materals on a course of ntroducton to computer networks. Tranng data of the agent by 0 users durng 4-8 days. K-NN s used to model Short Term nterest of a user. ELFI log fle of several months of system usage. The selecton of a program s regarded as nterestng for that user. Informaton s captured by the Algebra Tutor usng a set of questons to the new user of the system. Ths nformaton s used to obtan the stereotype. Not Descrbed A mult-strategy learnng approach (a short-term model and a long-term model) captures changes n user nterest very effcently. The optmum soluton s obtaned wth a combnaton of K-NN and Naïve Bayesan Classfers. The combnaton of K-NN and stereotypes produces very good results for Student Modelng. Applcaton of k-nn compared wth prevous researches can retreve more adequate materals based on prevous smlar cases. the range of fve to twenty rather than n the order of hundreds. Because the model created by k-nn s bascally the set of tranng ponts, ths means, especally for hgh dmensonal problems and/or for problems wth a lot of tranng data, that the amount of memory needed to store the model s very hgh, and that t ncreases when more data s added to the system. Agglomeratve Nearest Neghbor (A-NN) algorthms have been proposed to solve ths problem. A-NN s desgned to capture the same nformaton as k-nn but wthout the necessty of storng the complete set of tranng data. The key dea behnd A-NN s that t clusters tranng ponts that have the same class. Each class s then represented by a classrepresentatve, aganst whch the concept of dstance s measured when classfyng a new nstance. A-NN has been tradtonally used for collaboratve clusterng. 2) Applcatons for User Modellng k-nn algorthms can be used very effectvely for recommendaton. In ths case the algorthm s used to recommend new elements based on user s past behavor. A typcal applcaton s the recommendaton of nterestng documents. As t can be seen the concept of dstance (n ths case the dstance between two texts) s the key element to an effcent recommendaton system. Examples of k-nn used to mplement recommendaton tasks are (Bllsus et al, 999), (Schwab et al., 999) and (Shh et al., 200). Nevertheless, lke any classfcaton algorthm, t can also be used to mplement classfcaton tasks, for example to ntalze properly a new user when frst vstng a system gven a set of exstng user models. (Tsrga et al. 2002) s an example of k-nn used to mplement a classfcaton task. Table VI summarzes some studes and applcatons of k-nn for UM. 3) Lmtatons k-nn s a very smple technque that produces good and straghtforward results. Nevertheless the man drawbacks of ths approach s the hgh dependence of the results from the value of k and the metrc d chosen. In general, to choose both parameters, some expermentaton wth the system and some knowledge on how the system works wll be needed. Also, the value k should be bg enough to produce a classfcaton system nose-proof, otherwse a k-nn wth a small k value wll be extremely susceptble to nose. In the context of user modelng ths s very mportant because usually the data avalable s nosy and mprecse due to the nature of the problem. Nevertheless, an ncrease n the value of k mples an ncrease n the computatonal tme needed. The defnton of the dstance d, when used for user modelng, faces the same problems as clusterng technques: () the necessty of transformng user related nformaton nto a vectoral representaton wthout loosng semantc, and (2) the defnton of a concept of dstance among those vectors that capture the characterstcs of the problem (user nterest, user behavor, etc.). Another lmtaton of k-nn for user modelng s the response tme, whch s drectly affected by d and by the dmensonalty and number of examples of the tranng data. Adaptve hypermeda systems need real-tme response, and k- NN may n some cases not be sutable. C. Neural Networks for User Modelng An Artfcal Neural Network (ANN) s an nformaton processng paradgm that s nspred by the way bologcal nervous systems process nformaton. Two good ntroductons to Neural Networks can be found n (Fausett, 994) and (Haykn, 999). Although typcal ANNs are desgned to solve supervsed problems, there are also archtectures to solve unsupervsed problems. ) Basc Concepts The key element of ths paradgm s the structure of the nformaton processng system. It s composed of a large number of hghly nterconnected processng elements (neurons) workng n parallel. They consst of the followng elements: () Neurones, (2) Weghted nterconnectons, (3) An actvaton rule, to propagate sgnals through the network and (4) learnng algorthm, specfyng how weghts are adusted.

12 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 2 Fg. 3. Archtecture of an Artfcal Neuron The basc element of any ANN s an artfcal neuron (Fg. 3). A neuron has N weghted nput lnes and a sngle output. The neuron wll combne these weghted nputs by formng ther sum and, wth reference to a threshold value and actvaton functon, t wll determne ts output. Beng x,x 2,,x N the nput sgnals, w,,w N the synaptc weghts, u the actvaton potental, θ the threshold and y the output sgnal and f the actvaton functon: N u = Σ w x (9) = y = f ( u θ ) (20) Defnng w 0 =θ and x 0 =-, the output of the system can be reformulated as: y N f Σ w x 0 = = (2) The actvaton functon f defnes the output of the neuron n terms of the actvty level at ts nput. The most common form of actvaton functon used s the sgmod functon. There are very dfferent ways n whch a set of neurons can be connected among them. The tradtonal cluster of artfcal neurons s called neural network. Neural networks are bascally dvded n three layers: The Input Layer, The Hdden Layer, whch may contan one ore more layers, and the output layer. The layer of nput neurons receves the data ether from nput fles or drectly from electronc sensors n real-tme applcatons. The output layer sends nformaton drectly to the outsde world, to a secondary computer process, or to other devces such as a mechancal control system. Between these two layers can be many hdden layers. These nternal layers contan many of the neurons n varous nterconnected structures. The nputs and outputs of each of these hdden neurons smply go to other neurons. In most networks each neuron n a hdden layer receves the sgnals from all of the neurons n a layer above t, typcally an nput layer. After a neuron performs ts functon t passes ts output to all of the neurons n the layer below t, provdng a feed forward path to the output. Another type of connecton s feedback. Ths s where the output of one layer routes back to a prevous layer. Mult-Layer Perceptrons are the typcal archtecture of NNs. MLP are full-connected feed-forward nets wth one or more layers of nodes between the nput and the output nodes. Classfcaton and recognton capabltes of NNs stem from the non-lneartes used wthn the nodes. A snglelayered perceptron mplements a sngle hyperplane. A twolayer perceptron mplements arbtrary convex regons consstng of ntersecton of hyperplanes. A three-layer NN mplements decson surfaces of arbtrary complexty (Lpman, 987)(Looney, 987). That s the reason why a three layer NN s the most typcal archtecture. NNs learn through an teratve process of adustments. There are two tranng approaches: supervsed and unsupervsed. In supervsed tranng, both the nputs and the outputs are provded. The net s traned by ntally selectng small random weghts and nternal thresholds, and presentng all tranng data repeatedly. Weghts are adusted after every tral usng nformaton specfyng the correct class untl weghts converge and the cost functon s reduced to an acceptable value. The vast bulk of networks utlze supervsed tranng. The most common supervsed technque s the backpropagaton learnng algorthm. It uses a gradent search technque to mnmze a cost functon defned by the mean square error (MSE) between the desred and the actual net outputs, wth l the number of tranng ponts: MSE l l 2 = Σ( y y ) (22) = The generally good performance found for the backpropagaton algorthm s somewhat surprsng consderng that TABLE VII EXAMPLES OF SOME NNS BASED USER MODELS (Bdel et al., 2003) (Sas et al., 2003) (Shepperd, 2002) (Beck and Woolf., 998) Applcaton Input Data Results Classfcaton and trackng of user navgaton. Data generated from an on-lne A labeled approach to the problem encyclopeda. produces better accuracy. Predcton of user s next step n a vrtual envronment 30 users performed exploraton and Very accurate predctons of the next searchng wthn the envronment. step Adaptve flterng system for electronc news usng Very useful for readers wth specfc The Halfax Herald Ltd. stereotypes. nformaton needs. Constructon of a student model for an ntellgent tutorng NN-based recommendaton to each Data collected by the tutorng system system. ndvdual.

13 > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 3 t s a gradent descent technque that may fnd a local mnmum n the cost functon nstead of the desred global mnmum. 2) Applcatons for User Modelng NNs are able to derve meanng from complcated and/or mprecse data. Also, NN do not requre the defnton of any metrc (unlke k-nn or clusterng) whch make them completely applcaton ndependent. No ntal knowledge about the problem that s gong to be solved s needed. These characterstcs make NNs a powerful method to model human behavour and an deal technque to create user models for hypermeda applcatons. NNs have been used for classfcaton and recommendaton n order to group together users wth the same characterstcs and create profles and stereotypes. (Bdel et al., 2003) s an example of NNs used for classfcaton. (Sas et al., 2003), (Beck et al., 2003), (Sheperd et al., 2002) and (Beck and Woolf, 998) use NNs for recommendaton tasks. Table VII presents more detals of these applcatons. 3) Lmtatons NNs have been successfully used for UM manly because they do not need any heurstc to produce a model. Nevertheless t stll faces mportant lmtatons. The man ones are the tranng tme needed to produce a model (whch n cases can be measured n the order of many hours and even days) and the amount of nformaton needed. The tranng tme s an nconvenence for creatng dynamc models. Although there are technques able to retran NNs dynamcally, the technques used so far for UM retran the system from scratch n case more nformaton, e.g. a new user or a new document, s added. More research n the feld of ncremental learnng s needed. Another mportant lmtaton of NNs s ther black box behavor. Whle the prevous technques, to a dfferent extent, can be nterpreted and manually changed, NNs can not be nterpreted whch lmts ts applcatons n case a human understandable user model s needed. D. Support Vector Machnes (SVM) for User Modelng SVM s a classfer derved from Statstcal Learnng Theory (Vapnck, 995). The man advantages of SVM when used for classfcaton problems are () ablty to work wth hgh dmensonal data and (2) hgh generalzaton performance wthout the need to add a-pror knowledge, even when the dmenson of the nput space s very hgh. Excellent ntroductons to SVM can be found n (Crstann et al., 2000). The problem that SVM try to solve s to fnd an optmal hyperplane that correctly classfes data ponts and separates the ponts of two classes as much as possble. Fgure 4 s an example of the prevous stuaton. ) Basc Algorthms Gven two classes, the obectve s to dentfy the Fg. 4. SVM maxmzaton of the dstance between classes. hyperplane that maxmzes m, 2 m = w (23) whle at the same tme classfyng correctly all the examples gven. Beng w T the hyperplane that verfes the prevous condton, all ponts that were part of that class wll verfy w T x+b>0, where x s the pont that s beng valdated. If a pont s not part of that class, then w T x+b<0. Formally the problem can be presented as follows. Gven m the dmenson of the problem and N the number of tranng ponts, let: ( x, y ),...,( x, y ) R m Y, y Y, Y = {,} (24) N N be the set of labelled nputs, where - ndcates that the nput s not of that class and ndcates that the nput s of that class. The decson boundary should verfy: x w x w T T T T + b, y = + b, y = The problem can be presented as: 2 Maxmze m = w T Subect to y ( xw + b), =,..., n (25) (26) Ths formulaton can be expressed as a standard quadratc problem (QP) n the form of: N maxmze α = 2 Subect to α 0, α y N = N α α y y x x =, = = 0 T (27) In ths context a global maxmum α can always be found and w can be recovered as: N w = α y x (28) =

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals nkselector: A Web Mnng Approach to Hyperlnk Selecton for Web Portals Xao Fang and Olva R. u Sheng Department of Management Informaton Systems Unversty of Arzona, AZ 8572 {xfang,sheng}@bpa.arzona.edu Submtted

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

(1) The control processes are too complex to analyze by conventional quantitative techniques.

(1) The control processes are too complex to analyze by conventional quantitative techniques. Chapter 0 Fuzzy Control and Fuzzy Expert Systems The fuzzy logc controller (FLC) s ntroduced n ths chapter. After ntroducng the archtecture of the FLC, we study ts components step by step and suggest a

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Clustering. A. Bellaachia Page: 1

Clustering. A. Bellaachia Page: 1 Clusterng. Obectves.. Clusterng.... Defntons... General Applcatons.3. What s a good clusterng?. 3.4. Requrements 3 3. Data Structures 4 4. Smlarty Measures. 4 4.. Standardze data.. 5 4.. Bnary varables..

More information

Graph-based Clustering

Graph-based Clustering Graphbased Clusterng Transform the data nto a graph representaton ertces are the data ponts to be clustered Edges are eghted based on smlarty beteen data ponts Graph parttonng Þ Each connected component

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment A Webpage Smlarty Measure for Web Sessons Clusterng Usng Sequence Algnment Mozhgan Azmpour-Kv School of Engneerng and Scence Sharf Unversty of Technology, Internatonal Campus Ksh Island, Iran mogan_az@ksh.sharf.edu

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

An Improved Neural Network Algorithm for Classifying the Transmission Line Faults

An Improved Neural Network Algorithm for Classifying the Transmission Line Faults 1 An Improved Neural Network Algorthm for Classfyng the Transmsson Lne Faults S. Vaslc, Student Member, IEEE, M. Kezunovc, Fellow, IEEE Abstract--Ths study ntroduces a new concept of artfcal ntellgence

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Fitting: Deformable contours April 26 th, 2018

Fitting: Deformable contours April 26 th, 2018 4/6/08 Fttng: Deformable contours Aprl 6 th, 08 Yong Jae Lee UC Davs Recap so far: Groupng and Fttng Goal: move from array of pxel values (or flter outputs) to a collecton of regons, objects, and shapes.

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval Fuzzy -Means Intalzed by Fxed Threshold lusterng for Improvng Image Retreval NAWARA HANSIRI, SIRIPORN SUPRATID,HOM KIMPAN 3 Faculty of Informaton Technology Rangst Unversty Muang-Ake, Paholyotn Road, Patumtan,

More information

1. Introduction. Abstract

1. Introduction. Abstract Image Retreval Usng a Herarchy of Clusters Danela Stan & Ishwar K. Seth Intellgent Informaton Engneerng Laboratory, Department of Computer Scence & Engneerng, Oaland Unversty, Rochester, Mchgan 48309-4478

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Clustering is a discovery process in data mining.

Clustering is a discovery process in data mining. Cover Feature Chameleon: Herarchcal Clusterng Usng Dynamc Modelng Many advanced algorthms have dffculty dealng wth hghly varable clusters that do not follow a preconceved model. By basng ts selectons on

More information

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images Internatonal Journal of Informaton and Electroncs Engneerng Vol. 5 No. 6 November 015 Usng Fuzzy Logc to Enhance the Large Sze Remote Sensng Images Trung Nguyen Tu Huy Ngo Hoang and Thoa Vu Van Abstract

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Overvew 2 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Introducton Mult- Smulator MASIM Theoretcal Work and Smulaton Results Concluson Jay Wagenpfel, Adran Trachte Motvaton and Tasks Basc Setup

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Tuning of Fuzzy Inference Systems Through Unconstrained Optimization Techniques

Tuning of Fuzzy Inference Systems Through Unconstrained Optimization Techniques Tunng of Fuzzy Inference Systems Through Unconstraned Optmzaton Technques ROGERIO ANDRADE FLAUZINO, IVAN NUNES DA SILVA Department of Electrcal Engneerng State Unversty of São Paulo UNESP CP 473, CEP 733-36,

More information

Topology Design using LS-TaSC Version 2 and LS-DYNA

Topology Design using LS-TaSC Version 2 and LS-DYNA Topology Desgn usng LS-TaSC Verson 2 and LS-DYNA Wllem Roux Lvermore Software Technology Corporaton, Lvermore, CA, USA Abstract Ths paper gves an overvew of LS-TaSC verson 2, a topology optmzaton tool

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Personalized Concept-Based Clustering of Search Engine Queries

Personalized Concept-Based Clustering of Search Engine Queries IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1 Personalzed Concept-Based Clusterng of Search Engne Queres Kenneth Wa-Tng Leung, Wlfred Ng, and Dk Lun Lee Abstract The exponental growth of nformaton

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information