Data Mining Approaches to User Modeling for Adaptive Hypermedia: Survey and Future Directions



Enrique Frias-Martinez, Sherry Y. Chen and Xiaohui Liu

Abstract

The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of the information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for the automatic generation of user models that simulate human decision-making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.

Index Terms: Adaptive Hypermedia, Data Mining, Machine Learning, User Modeling.

I. INTRODUCTION

Adaptive hypermedia can be defined as the technology that personalizes, for each individual user of a hypermedia application, the content and presentation of the application according to user preferences and characteristics (Perkowitz et al., 1998) (Perkowitz et al., 1999) (Perkowitz et al., 2000). The process of personalization is defined as the ways in which information and services can be tailored to match the unique and specific needs of an individual or a community (Callan et al., 2001). Personalization is about building customer loyalty by developing a meaningful one-to-one relationship; by understanding the needs of each individual and helping satisfy a goal that efficiently and knowledgeably addresses each individual's need in a given context (Riecken, 2000). The more information a user model has, the better the content and presentation will be personalized.
A user model is created through a user modeling process in which unobservable information about a user is inferred from observable information from that user; for example, using the interactions with the system (Zukerman, Albrecht & Nicholson, 1999).

This work was supported in part by the UK Arts and Humanities Research Board (AHRB) under Grant MRG/AN983/APN6300. E. Frias-Martinez, S. Chen and X. Liu are with the Department of Information Systems & Computing, Brunel University, Uxbridge, Middlesex UB8 3PH, United Kingdom; phone: +44 (0) ; fax: +44 (0) ; e-mail: {enrique.frias-martinez, sherry.chen, xiaohui.liu}@brunel.ac.uk.

User models can be created using a user-guided approach, in which the models are directly created using the information provided by each user, or an automatic approach, in which the process of creating a user model is controlled by the system and is hidden from the user. The user-guided approach produces adaptable services and adaptable user models (Fink et al., 1997), while the automatic approach produces adaptive services and adaptive user models (Fink et al., 1997) (Brusilovsky et al., 1997). In general, a user model will contain some adaptive and some adaptable elements. Ideally, the set of adaptable elements should be reduced to the minimum possible (elements like age, gender, favorite background color, etc.), while other elements (favorite topics, patterns of behavior, etc.) should be created by a learning process. These concepts have also been presented in the literature as implicit and explicit user modeling acquisition (Quiroga et al., 1999).

User modeling can be implemented using an automatic approach because a typical user exhibits patterns when accessing a hypermedia system, and the set of interactions containing those patterns can be stored in a log database on the server. In this context, machine learning and data mining techniques can be applied to recognize regularities in user trails and to integrate them as part of the user model.
Machine learning encompasses techniques where a machine acquires/learns knowledge from its previous experience (Witten & Frank, 1999). The output of a machine learning technique is a structural description of what has been learned that can be used to explain the original data and to make predictions. From this perspective, data mining and other machine learning techniques make it possible to automatically create user models for the implementation of adaptive hypermedia services. These techniques make it possible to create user models in an environment, such as a hypermedia application, in which users are usually not willing to give feedback on their actions. This fact leads researchers to consider the different techniques that can be used to capture and model user behavior, and which elements of the behavior of a user each of those techniques can capture. In general, the adaptive service that is going to be implemented determines the information that a user model should contain. Once that is known, and taking into account the characteristics of the available data, different data mining/machine learning techniques can be chosen to capture the necessary patterns.

This paper surveys the development of user models using data mining and machine learning techniques from 1999 to 2004, focusing on the main journals and conferences for user modeling, mainly: User Modeling and User-Adapted Interaction, International Conference on User Modeling, IEEE Transactions on Neural Networks, Workshop of Intelligent Techniques for Web Personalization (part of IJCAI, International Joint Conference of Artificial Intelligence) and International Workshop on Knowledge Discovery on the WEB (WEBKDD, part of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). The paper's intentions are (1) to give an up-to-date view of data mining techniques applied to user modeling and highlight their potential advantages and limitations, and (2) to give basic guidelines about which techniques can be useful for a given adaptive application. The paper complements the results of (Zukerman et al., 2001), which reviewed the use of predictive statistical models for user modeling.

The organization of the paper is as follows. The paper first defines the concept of user model, its relevance for adaptive hypermedia and the basic steps for the automatic creation of user models. After that, a set of unsupervised and supervised techniques is presented. For each technique we present its theoretical background, its pros and cons and its applications in the field of user modeling. Next we develop a guideline on how to create a user model according to the needs of the adaptive hypermedia application that is going to be implemented. The conclusions section closes the paper.

II. USER MODELING (UM)

A user model should capture the behavior (patterns, goals, interesting topics, etc.) a user shows when interacting with the Web.
A user model is defined as a set of information structures designed to represent one or more of the following elements (Kobsa, 2001): (1) representation of goals, plans, preferences, tasks and/or abilities about one or more types of users; (2) representation of relevant common characteristics of users pertaining to specific user subgroups or stereotypes; (3) the classification of a user in one or more of these subgroups or stereotypes; (4) the recording of user behavior; (5) the formation of assumptions about the user based on the interaction history; and/or (6) the generalization of the interaction histories of many users into groups.

A. User Modeling and Adaptive Hypermedia

Adaptive hypermedia personalizes the content and presentation of a web site for each individual. The architecture of an adaptive hypermedia system is usually divided into two parts: the server side and the client side. The server side generates the user models from a database containing the interactions of the users with the system and the personal data/preferences that each user has given to the system. User models can also contain knowledge introduced by the designer (in the form of rules, for example). These user models, in combination with a hypermedia database, are used by the Decision Making and Personalization Engine module to identify user needs, decide on the types of adaptation to be performed and communicate them to an adaptive interface. Fig. 1 presents the architecture of a generic AH system.

Fig. 1. Generic Architecture of an Adaptive Hypermedia Application.

A personalized hypermedia system, by its very nature, should respond in real time to user inputs. To do so, the architecture of the system should provide quick access to the right information at the right time. Adaptive hypermedia uses the knowledge given by user models to implement one or both of the two basic types of adaptive tasks:

Recommendation (R). Recommendation is the capability of suggesting interesting elements to a user based on some information; for example, from the items to be recommended or from the behavior of other users. Recommendation is also known in the literature as collaborative filtering.

Classification (C). Classification builds a model that maps or classifies data items into one of several predefined classes. Classification is done using only data related to that particular item. This knowledge can be used to tailor the services of each user stereotype (class). In the literature, classification has also been presented as content-based filtering.

Both recommendation and classification are types of filtering applications.

B. Automatic Generation of User Models

One of the processes presented in Fig. 1 is the automatic generation of user models from the interaction data between the users and the system (done by the UM Generation module). Fig. 2 presents the basic steps of this module: (1) Data Collection, (2) Preprocessing, (3) Pattern Discovery and (4) Validation and Interpretation:

Data Collection. In this stage user data is gathered. For automatic user modeling the data collected includes: data regarding the interaction between the user and the Web, data regarding the environment of the user when interacting with the Web, direct feedback given by the user, etc.

Data Preprocessing/Information Extraction. The information obtained in the previous stage cannot be directly processed. It needs to be cleaned of noise and inconsistencies in order to be used as the input of the next phase. For user modeling, this mainly involves user identification and session reconstruction. This stage is aimed at obtaining, from the data available, the semantic content of the user interaction with the system. Also in this phase, the data extracted should be adapted to the data structure used by the standard pattern discovery algorithms of the next step.

Pattern Discovery. In this phase, machine learning and data mining techniques are applied to the data obtained in the previous stage in order to capture user behavior. The output of this stage is a set of structural descriptions of what has been learned about user behavior and user interests. These descriptions constitute the base of a user model. Different techniques will capture different user properties and will express them in different ways. The knowledge needed to implement an adaptive service will determine, among other factors, which techniques to apply in this phase.

Validation and Interpretation. In this phase the structures obtained in the pattern discovery stage are analyzed and interpreted. The patterns discovered can be interpreted and validated, using domain knowledge and visualization tools, in order to test the importance and usability of the knowledge obtained. In general this process is done with the help of a user modeling designer.

User model heuristics are used in each step of the process to extract, adapt and present the knowledge in a relevant way. The process of generating user models using data mining/machine learning techniques can be seen as a standard process of extracting knowledge from data, where user modeling is used as a wrapper for the entire process.

C. Data Mining and its Relevance to User Modeling

As presented in Fig. 2, the Pattern Discovery phase finds relevant information about the behavior of a user (or set of users) when interacting with the Web.
Data mining and machine learning techniques are ideal for that process because they are designed to represent what has been learned from the input data with a structural representation. This representation stores the knowledge needed to implement the two types of tasks previously described. Although machine learning and data mining techniques work well for modeling user behavior, their application also faces some problems. As stated in (Webb et al., 2001), among those challenges are the need for large data sets, the problem of labeling data for supervised machine learning techniques and the problem of computational complexity.

Each data mining/machine learning technique will capture different relationships among the data available and will express the results using different data structures. The key question is to find out which patterns need to be captured in order to implement an adaptive service. In order to choose a suitable learning method, it is important to know what knowledge is captured by each technique and how that knowledge can be used to implement the two basic tasks. The choice of learning method also depends largely on the type of training data available.

Fig. 2. Steps for Automatic Generation of User Models.

The main distinction in machine learning research is between supervised and unsupervised learning. Supervised learning requires the training data to be preclassified. This means that each training item is assigned a unique label, signifying the class to which the item belongs. Given these data, the learning algorithm builds a characteristic description for each class, covering the examples of this class. The important feature of this approach is that the class descriptions are built conditional on the preclassification of the examples in the training set. In contrast, unsupervised learning methods do not require preclassification of the training examples. These methods form clusters of examples that share common characteristics. The main difference from supervised learning is that categories are not known in advance, but constructed by the learner itself.
When the cohesion of a cluster is high, i.e., the examples in it are similar, it defines a new class.

The rest of the paper presents how data mining and machine learning techniques have been used for user modeling: which knowledge can be captured with each technique, examples of applications and their limits and strengths. The techniques presented are divided into two groups: unsupervised, which includes clustering (hierarchical, non-hierarchical and fuzzy clustering) and association rules, and supervised, which includes decision trees/classification rules, k-nearest neighbor (k-NN), neural networks and Support Vector Machines (SVM).

III. UNSUPERVISED APPROACHES TO USER MODELING

The main unsupervised techniques are clustering and association rules. Clustering comprises a wide variety of different techniques based on the same concept. A collection of different clustering techniques and their variations can be found in (Jain et al., 1988).

A. Clustering for User Modeling

The task of clustering is to structure a given set of unclassified instances (data vectors) by creating concepts, based on similarities found in the training data. A clustering algorithm finds the set of concepts that cover all examples such that: (1) the similarity between examples of the same concept is maximized, and (2) the similarity between examples of different concepts is minimized. In a clustering algorithm the key element is how to obtain the similarity between two items of the training set.

Clustering techniques can be classified into hard clustering and fuzzy clustering. In non-fuzzy or hard clustering, data is divided into crisp clusters, where each data point belongs to exactly one cluster. In fuzzy clustering, the data points can belong to more than one cluster, and associated with each of the instances are membership grades which indicate the degree to which they belong to the different clusters.

Hard clustering techniques may be grouped into two categories: hierarchical and non-hierarchical (Jain, 1999). A hierarchical clustering procedure involves the construction of a hierarchy or tree-like structure, which is basically a nested sequence of partitions, while non-hierarchical or partitional procedures end up with a particular number of clusters at a single step.

1) Basic Algorithms: Non-hierarchical Techniques

The main non-hierarchical clustering techniques are: (1) k-means clustering and (2) Self-Organizing Maps (SOM).

a) K-means Clustering

The k-means clustering technique (MacQueen, 1967) takes the number of clusters k as input. The algorithm then picks k items, called seeds, from the training set in an arbitrary way. Then, in each iteration, each input item is assigned to the most similar seed, and the seed of each cluster is recalculated to be the centroid of all items assigned to that seed. This process is repeated until the seed coordinates stabilize. The algorithm aims at minimizing an objective function J, typically a squared error function:

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} d_{ij}^{2} = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^{2} \qquad (1)$$

where $d_{ij}$ is the distance measure between a data point $x_i^{(j)}$ and the cluster centre $c_j$. J is an indicator of the distance of the n data points from their respective cluster centres, and it represents the compactness of the clusters created.
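The k-means iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in any of the surveyed systems: the function names `k_means` and `squared_distance`, the `seed` and `max_iter` parameters, and the choice to stop when assignments no longer change (which is when the seeds stop moving) are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Return (centres, assignment, J) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Pick k items from the training set as the initial seeds.
    centres = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every item to its most similar (closest) seed.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing changed, so the seeds have stabilized
        assignment = new_assignment
        # Recompute each seed as the centroid of the items assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous seed
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    # Objective J: summed squared distance of each item to its own centre.
    j_value = sum(squared_distance(p, centres[a])
                  for p, a in zip(points, assignment))
    return centres, assignment, j_value
```

Calling `k_means(points, k=2)` with different `seed` values and comparing the returned J values is one simple way to observe the dependence on the initial seeds.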
Although it can be proved that the procedure will always terminate, the k-means algorithm does not necessarily find the optimal configuration, corresponding to the global minimum of the objective function. The k-means algorithm is popular because it is easy to understand and easy to implement. Its main drawback is that it can take considerable time if the number of patterns involved is very large and the number of clusters is substantial. Another problem is that it is sensitive to the initial partition (the selection of the initial patterns), and may converge to a local minimum of the criterion function if the initial partition is not properly chosen. A possible remedy is to run the algorithm with a number of different initial partitions. If they all lead to the same final partition, this suggests (although it does not guarantee) that the global minimum of the square error has been achieved. However, this can be time-consuming, and may not always work.

b) Self-Organizing Maps (SOM)

The self-organizing map (SOM) algorithm was proposed by Teuvo Kohonen in 1981 (Kohonen, 1997). Apart from being used in a wide variety of fields, the SOM offers an interesting tool for exploratory data analysis, particularly for partitional clustering and visualization. It is capable of representing high-dimensional data in a low-dimensional space (often a two- or one-dimensional array) that preserves the structure of the original data.

A self-organizing network consists of a set of input nodes $V = \{v_1, v_2, \ldots, v_N\}$, a set of output nodes $C = \{c_1, c_2, \ldots, c_M\}$, a set of weight parameters $W = \{w_{11}, w_{12}, \ldots, w_{ij}, \ldots, w_{NM}\}$ ($1 \le i \le N$, $1 \le j \le M$, $0 \le w_{ij} \le 1$), and a map topology that defines the distances between any two given output nodes. The output nodes in a SOM are usually arranged in a 2-dimensional array to form a map. Each input node is fully connected to every output node via a variable connection. A weight parameter is associated with each of these connections, and the weights between the input nodes and output nodes are iteratively changed during the learning phase until a termination criterion is satisfied. For each input vector v, there is one associated winner node on the output map.
A winner node is an output node that has minimum distance to the input vector.

The SOM algorithm starts by initializing the topology and size of the output map (M), the connection weights to random values over the interval [0, 1], the gain value η (learning rate) and the neighborhood size r, and normalizes both the input vectors and the connected weight vectors. After that, the algorithm calculates the Euclidean distance between the new input vector v and each node on the output map, and designates the node with the minimum distance as the winner node c (minimizing the squared distance selects the same node):

$$c = \arg\min_{j} \sum_{k=1}^{N} \left( v_k - w_{kj} \right)^{2}, \quad j = 1, 2, \ldots, M \qquad (2)$$

Once c is obtained, the algorithm updates the weights W, the learning rate η and $N_c$, the neighborhood surrounding the winner node c, in such a way that the vectors represented by output nodes are similar if they are located in a small neighborhood. For each node $j \in N_c$, with $w_j$ the weight vector of output node j, the SOM algorithm performs the following operation:

$$w_j^{(new)} = w_j^{(old)} + \eta \left[ v - w_j^{(old)} \right] \qquad (3)$$

After that, the neighborhood size and the learning rate are decreased. This process is repeated until it converges. The neighborhood set $N_c$ is a set of nodes that surround the winner node c. The nodes in $N_c$ are affected by weight modifications, in addition to the changes made to the winner node, as defined in the algorithm. These weight changes are made to increase the matching between the nodes in $N_c$ and the corresponding input vectors. As the update of weight parameters proceeds, the size of the neighborhood set is slowly decreased to a predefined limit, for instance, a single node. This process leads to one of the most important properties of the SOM: similar input vectors are mapped to geometrically close winner nodes on the output map. This is called neighborhood preservation, which has turned out to be very useful for clustering similar data patterns.

It is not always straightforward to visually inspect the projected data on the two-dimensional output map in order to decide the number and size of natural clusters. Therefore, careful analysis or post-processing of output maps is crucial to the partition of the original data set. Like the k-means algorithm, the SOM produces a sub-optimal partition if the initial weights are not chosen properly. Moreover, its convergence is controlled by various parameters such as the learning rate and the size and shape of the neighborhood in which learning takes place. Consequently, the algorithm may not be very stable, in that a particular input pattern may produce different winner nodes on the output map at different iterations.

2) Basic Algorithms: Hierarchical Techniques

The main problem of non-hierarchical approaches is that, when working with high-dimensional problems, in general there will not be enough items to populate the vector space, which implies that most dimensions will be unreliable for similarity computations. In order to solve this problem, hierarchical clustering techniques were developed. There are two types of hierarchical clustering: agglomerative and divisive. Both share a common characteristic: they create a hierarchy of clusters. While the agglomerative approach creates a bottom-up hierarchy, the divisive approach creates a top-down one. Generally speaking, divisive algorithms are computationally less efficient.

A typical hierarchical agglomerative clustering algorithm is outlined below:

1) Place each pattern in a separate cluster.
2) Compute the proximity matrix of all the inter-pattern distances for all distinct pairs of patterns.
3) Find the most similar pair of clusters using the matrix. Merge these two clusters into one, decrement the number of clusters by one and update the proximity matrix to reflect this merge operation.
4) If all patterns are in one cluster, stop. Otherwise, go to step 2 above.

The output of such an algorithm is a nested hierarchy of trees that can be cut at a desired dissimilarity level, forming a partition. Hierarchical agglomerative clustering algorithms differ primarily in the way they measure the distance or similarity of two clusters, where a cluster may consist of only a single object at a time. The most commonly used inter-cluster measures are:

$$d_{AB} = \min(d_{ij}), \quad i \in A,\ j \in B \qquad (4)$$

$$d_{AB} = \max(d_{ij}), \quad i \in A,\ j \in B \qquad (5)$$

$$d_{AB} = \frac{1}{n_A n_B} \sum_{i \in A} \sum_{j \in B} d_{ij} \qquad (6)$$

where $d_{AB}$ is the dissimilarity between two clusters A and B, $d_{ij}$ is the dissimilarity between two individual patterns i and j, and $n_A$ and $n_B$ are the number of individuals in clusters A and B respectively. These three inter-cluster dissimilarity measures are the basis of three of the most popular hierarchical clustering algorithms. The single-linkage algorithm uses Equation (4), the minimum of the distances between all pairs of patterns drawn from the two clusters (one pattern from each cluster). The complete-linkage algorithm uses Equation (5), the maximum of all pairwise distances between patterns in the two clusters. The group-average algorithm uses Equation (6), the average of the distances between all pairs of individuals that are made up of one individual from each cluster.

A challenging issue with hierarchical clustering is how to decide the optimal partition from the hierarchy. One approach is to select a partition that best fits the data in some sense, and many methods have been suggested in the literature (Everitt, 1993). It has also been found that the single-linkage algorithm tends to exhibit the so-called chaining effect: it has a tendency to cluster together, at a relatively low level, objects linked by chains of intermediates. As such, the method is appropriate if one is looking for optimally connected clusters rather than for homogeneous spherical clusters.
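The agglomerative procedure and the three inter-cluster measures (4)-(6) can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `linkage_distance` and `agglomerative` are invented for the example, the input is assumed to be a full symmetric matrix of pattern-to-pattern dissimilarities, and the cluster-level proximities are recomputed from that matrix at every pass instead of being updated incrementally, which is simpler but slower than the matrix update described in step 3.

```python
def linkage_distance(cluster_a, cluster_b, dist, method):
    """Dissimilarity between two clusters of pattern indices."""
    pairwise = [dist[i][j] for i in cluster_a for j in cluster_b]
    if method == "single":      # Equation (4): minimum pairwise distance
        return min(pairwise)
    if method == "complete":    # Equation (5): maximum pairwise distance
        return max(pairwise)
    if method == "average":     # Equation (6): mean over n_A * n_B pairs
        return sum(pairwise) / (len(cluster_a) * len(cluster_b))
    raise ValueError("unknown linkage method: " + method)


def agglomerative(dist, method="single"):
    """Merge clusters bottom-up; return a list of (A, B, height) merges."""
    clusters = [[i] for i in range(len(dist))]  # one pattern per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the most similar pair of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(clusters[a], clusters[b], dist, method)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge B into A
        del clusters[b]
    return merges
```

The third element of each returned tuple is the dissimilarity level of that merge; cutting the hierarchy at a chosen level amounts to keeping only the merges whose level is below it.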
The complete-linkage algorithm, on the other hand, tends to produce clusters that are tightly bound or compact, and has been found to produce more useful hierarchies in many applications than the single-linkage algorithm (Jain, 1999). The group-average algorithm is also widely used. Detailed discussion and practical examples of how these algorithms work can be found in (Jain, 1999) and (Webb, 1999).

3) Basic Algorithms: Fuzzy Clustering

One of the most widely used fuzzy clustering algorithms is the Fuzzy C-Means (FCM) algorithm (Bezdek, 1981). The FCM algorithm attempts to partition a finite collection of elements $X = \{x_1, \ldots, x_n\}$ into a collection of c fuzzy clusters with respect to some given criterion. Given a finite set of data, the algorithm returns a list of c cluster centres $C = \{c_1, \ldots, c_c\}$ and a partition matrix $U = [u_{ij}]$, $u_{ij} \in [0, 1]$, $i = 1, \ldots, n$, $j = 1, \ldots, c$, where each element $u_{ij}$ tells the degree to which element $x_i$ belongs to cluster $c_j$. Like the k-means algorithm, the fuzzy c-means aims to minimize an objective function. The standard function is:

$$J = \sum_{j=1}^{c} \sum_{i=1}^{n} (u_{ij})^{m} \left\| x_i - c_j \right\|^{2} \qquad (7)$$

which differs from the k-means objective function by the addition of the membership values $u_{ij}$ and the fuzzifier m. The fuzzifier m determines the level of cluster fuzziness. A large m results in smaller memberships $u_{ij}$ and hence, fuzzier clusters. In the limit m = 1, the memberships $u_{ij}$ converge to 0 or 1, which implies a crisp partitioning. In the absence of experimentation or domain knowledge, m is commonly set to 2.

The basic Fuzzy C-Means algorithm, given n data points $(x_1, \ldots, x_n)$ to be clustered, a number of c clusters with $(c_1, \ldots, c_c)$ the centres of the clusters, and m the level of cluster fuzziness with

$$m \in \mathbb{R}, \quad m > 1 \qquad (8)$$

first initializes the membership matrix U to random values, verifying that:

$$u_{ij} \in [0, 1], \quad \sum_{j=1}^{c} u_{ij} = 1 \qquad (9)$$

After the initialization, the algorithm obtains the centres of the clusters $c_j$, $j = 1, \ldots, c$:

$$c_j = \frac{\sum_{i=1}^{n} (u_{ij})^{m} x_i}{\sum_{i=1}^{n} (u_{ij})^{m}} \qquad (10)$$

and obtains the distance between all points $i = 1, \ldots, n$ and all cluster centres $j = 1, \ldots, c$:

$$d_{ij} = \left\| x_i - c_j \right\| \qquad (11)$$

The matrix U is then updated according to the new distances:

$$u_{ij} = \begin{cases} 1 & \text{if } d_{ij} = 0 \\ \left[ \sum_{k=1}^{c} \left( \dfrac{d_{ij}}{d_{ik}} \right)^{\frac{2}{m-1}} \right]^{-1} & \text{otherwise} \end{cases} \qquad (12)$$

This process is repeated until the set of cluster centres stabilizes. There are other algorithms that are optimizations of the original FCM, like the Fuzzy c-Medoids Algorithm (FCMdd) or the Fuzzy c-Trimmed Medoids Algorithm (FCTMdd) (Krishnapuram et al., 2001).

4) Applications for User Modeling

For UM there are two kinds of interesting clusters to be discovered: usage clusters and page clusters. Clustering of users tends to establish groups of users exhibiting similar browsing patterns, which are usually called stereotypes. Such knowledge is especially useful for inferring user demographics in order to perform market segmentation in e-commerce applications or to provide personalized Web content to the users. On the other hand, clustering of pages will discover groups of pages having related content. This information is useful for Internet search engines and Web assistance providers. In the context of UM, clustering has a distinguishing characteristic: it is usually done with non-numerical data.
This implies that, usually, the clustering techniques applied are relational, where numerical values represent the degrees to which two objects of the data set are related. Clustering applied to user modeling has to use techniques that can handle relational data because the information used to create clusters (pages visited, characteristics of the user, etc.) cannot usually be represented by numerical vectors. If it is represented using vectors, part of the semantics of the original data is lost. In these systems the definition of distance is based on vectorial representations of user interactions with the adaptive hypermedia system. Some examples of relational clustering applied to web mining are (Joshi et al., 2000) and (Mobasher et al., 2000).

TABLE I. EXAMPLES OF SOME CLUSTERING-BASED USER MODELS

| Reference | Application | Input Data | Results |
|---|---|---|---|
| (Mobasher et al., 2000) | Capture of web users' interests using k-means clustering. Design of a cluster-based recommendation system. | User logs from the Univ. of Minnesota Comp. Science web server collected during a month. | Example of the implementation of the recommendation system in a commercial site. |
| (Paliouras et al., 1999) | News-filtering system based on communities of users. Clustering makes it possible to recommend interesting news to a user. | When registering, a user specifies his/her interests. | Established machine learning techniques are very useful for the acquisition of communities of users. |
| (Doux et al., 1997) | K-means clustering algorithm for user profiling in order to derive prototypical behavior from each user. | Data collected from a dedicated set of experiments where users are asked about their preferences. | The techniques proposed handle qualitative data for clustering users efficiently. |
| (Fu et al., 1999) | Grouping of users with a common behavior in a web server, taking into account access patterns. | Data collected from the UMR web server log, containing 2.5 million records. | The clusters obtained can be used for personalization purposes. |
| (Hay et al., 2001) | Clustering methods that capture the inherent sequentiality of web visits. A metric, the Sequence Alignment Method, is introduced to be used instead of Euclidean distance for clustering purposes. | Log files of a Belgian telecom provider collected over a one-week period. | The results are as good as the ones obtained with Euclidean distance, while keeping the concept of order. |
| (Mobasher et al., 2001) | Clustering for collaborative filtering. | 2,000 sessions collected from the Association for Consumer Research web site. | With the proper data preprocessing, the clustering approach outperforms more traditional approaches to this problem (like Nearest Neighbor). |

TABLE II. EXAMPLES OF SOME FUZZY CLUSTERING-BASED USER MODELS

| Reference | Application | Input Data | Outcome |
|---|---|---|---|
| Lampinen and Koivisto (2002) | Obtain application profiles from network traffic data to manage network resources. | Samples of different applications from an edge router of a LAN network. | FCM produced better results than SOM. A method for the comparison of both solutions is also introduced. |
| Nasraoui et al. (1999) | A new algorithm (CARD) to mine user profiles from access logs is proposed. | 2 day log data of the Dep. of Comp. Eng. at Univ. of Missouri. | CARD is very effective for clustering many different profiles in user sessions. |
| Joshi et al. (2000) | Two algorithms to mine user profiles: FCMdd and FCTMdd. | CSEE logs of Univ. of Maryland. | Both algorithms extract interesting user profiles. FCMdd is not able to handle noise as effectively as FCTMdd. |
| Krishnapuram et al. (2001) | Web access log analysis for user profiling using RFCMdd (Robust Fuzzy c-Medoids). | Five days of CSEE web server activity of Univ. of Maryland. | RFCMdd is very effective for clustering of relational data. |

Table I summarizes some studies and applications of hard clustering for UM. Some examples of recommendation implemented with clustering algorithms are (Mobasher et al., 2000), (Doux et al., 1999), (Paliouras et al., 1999), which uses SOM, (Fu et al., 2001) and (Mobasher et al., 2001). Examples of classification tasks implemented using clustering are (Doux et al., 1997) and (Hay et al., 2001). (Goren-Bar et al., 2001) uses SOM to classify documents based on a subjectively predefined set of clusters in a specific domain.

When using fuzzy clustering, the main difference from hard clustering techniques is that a user can be in more than one cluster at the same time, with different degrees of truth. This makes it possible to better capture the inherent uncertainty of the problem of modeling user behavior.
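To make the idea of graded membership concrete, the fuzzy c-means updates of Equations (9)-(12) can be sketched in plain Python. This is a minimal illustration on numeric vectors, not the relational variants (FCMdd, FCTMdd, RFCMdd) used in the studies of Table II; the function name `fuzzy_c_means` and the `tol`, `max_iter` and `seed` parameters are assumptions made for the example.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Return (centres, U); U[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dims = len(points), len(points[0])
    # Equation (9): random memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([value / total for value in row])
    centres = []
    for _ in range(max_iter):
        # Equation (10): centres as membership-weighted means of the points.
        new_centres = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            new_centres.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ])
        # Equations (11) and (12): distances, then updated memberships.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5
                     for ctr in new_centres]
            zero = [j for j, dj in enumerate(dists) if dj == 0.0]
            if zero:  # the point coincides with a centre: crisp membership
                u[i] = [1.0 if j == zero[0] else 0.0 for j in range(c)]
            else:
                power = 2.0 / (m - 1.0)
                u[i] = [1.0 / sum((dists[j] / dists[k]) ** power
                                  for k in range(c))
                        for j in range(c)]
        # Stop once the cluster centres have stabilized.
        shift = (max(abs(a - b)
                     for old, new in zip(centres, new_centres)
                     for a, b in zip(old, new))
                 if centres else float("inf"))
        centres = new_centres
        if shift < tol:
            break
    return centres, u
```

Each row of the returned matrix gives one item's degrees of membership across all clusters, which is the information a fuzzy user model could use to blend the personalizations attached to several stereotypes.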
Examples of applications that implement a recommendation task using fuzzy clustering (FC) include (Lampinen and Koivisto, 2002), (Nasraoui et al., 1999) and (Nasraoui and Krishnapuram, 2000). Examples of classification tasks are (Joshi et al., 2000) and (Krishnapuram et al., 2001). Table II summarizes some studies and applications of FC for UM.

5) Limitations

The main problem that clustering techniques face is how to define the concept of distance that is going to be used. In general, some knowledge of the problem is needed to define an optimum concept of distance. When applied to user modeling this problem is even harder due to the nature of the data available: interactions, user preferences, pages visited, etc., which are not expressed in a numerical way. Different techniques to characterize web user behavior using numerical vectors have been proposed (Joshi et al., 2000) (Mobasher et al., 2000) but, in one way or another, the representations lose part of the semantics that the original data had. More research is needed on how to express the data available in a numerical way and how to define a meaningful concept of distance for creating efficient user models using clustering techniques while keeping the semantics of the original data.

Non-hierarchical clustering techniques assume that the number of clusters k is known a priori. For user modeling this is not usually the case, which implies that some heuristics need to be used to determine the number of clusters.

Also, user models developed so far using fuzzy clustering do not fully use the fuzzy nature of the technique in order to create more flexible and adaptive systems. More studies are needed to create meaningful ways of mixing the different personalizations associated with each of the clusters in which a user can be included when using fuzzy clustering.

B. Association Rules for User Modeling

Association rule discovery aims at discovering all frequent patterns among transactions. The problem was originally introduced by (Agrawal et al., 1993) and was based on detecting frequent items in a market basket.
The Apriori algorithm, including its variations and extensions, is widely accepted to solve this problem.

1) Basic Algorithms

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items, $S$ a set of transactions, where each transaction $T$ is a set of items, and $X$ a set of items in $I$. A transaction $T$ is said to contain $X$ if:

$$X \subseteq T \tag{13}$$

An Association Rule is defined as an implication of the form:

$$X \Rightarrow Y, \quad X \subset I, \quad Y \subset I, \quad X \cap Y = \emptyset \tag{14}$$

where $X$ is usually called the antecedent and $Y$ the consequent. The order of the items of both the antecedent and the consequent is not relevant. Each Association Rule has an associated confidence and support. The support of a rule is defined as the fraction of the sessions of $S$ in which the rule successfully applies:

$$\theta(X \Rightarrow Y) = \frac{|\{S_i \in S \mid X \cup Y \subseteq S_i\}|}{|S|} \tag{15}$$

The confidence of a rule is defined as the fraction of times for which, if the antecedent $X$ is satisfied, the consequent $Y$ is also true:

$$\sigma(X \Rightarrow Y) = \frac{\theta(X \Rightarrow Y)}{\theta(X)} \tag{16}$$
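The support and confidence definitions above can be illustrated with a minimal Python sketch. The session data and the function names `support` and `confidence` are hypothetical and chosen only for illustration; each session is assumed to be a set of visited pages.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(X union Y) / support(X); returns 0.0 when X never occurs."""
    x = frozenset(antecedent)
    y = frozenset(consequent)
    support_x = support(x, transactions)
    if support_x == 0.0:
        return 0.0
    return support(x | y, transactions) / support_x


# Hypothetical sessions: each set holds the pages visited in one session.
sessions = [
    frozenset({"home", "products", "cart"}),
    frozenset({"home", "products"}),
    frozenset({"home", "about"}),
    frozenset({"products", "cart"}),
]

# Rule {products} => {cart}
rule_support = support({"products", "cart"}, sessions)
rule_confidence = confidence({"products"}, {"cart"}, sessions)
```

In this toy data the rule holds in two of the four sessions, and "cart" follows in two of the three sessions that contain "products". A mining algorithm would keep the rule only if both values exceed the user-specified thresholds.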

The values of support and confidence are used to filter the set of association rules obtained from S. Given a set of transactions S, the task of mining association rules can be formulated as finding all association rules with at least minimum support and minimum confidence, where the thresholds for support and confidence are user-specified values. The task of discovering association rules can be done in three steps: (1) find all sets of items that have transaction support above minimum support, (2) for each itemset obtained, find all non-empty subsets and (3) for each such subset X of an itemset I, if the confidence is bigger than the given threshold, produce the rule:

$$X \Rightarrow (I - X) \tag{17}$$

2) Applications for User Modeling

In the context of user modeling, association rules capture sets of actions that have a causal relation among them. A typical application of association rules for user modeling is capturing pages that are accessed together. Typically, association rules are used to implement recommendation tasks. Some examples of recommendation tasks implemented using association rules are (Chen et al., 2002), (Lin et al., 2001), (Nanopoulos et al., 2001), (Mobasher et al., 2001) and (Geyer-Schulz et al., 2002). Table III summarizes these examples.

3) Limitations

Association rules have two important drawbacks when used for user modeling: they do not capture the sequentiality of the antecedent and the consequent, and they capture no temporal information (for example, when X happens, when is Y going to happen?). Both temporality and sequentiality are very important characteristics for user modeling. Temporality allows us to predict not only what actions a user is going to take, but also when those actions are going to be taken. Sequentiality captures the order in which a user performs a set of actions, which is relevant to the consequent that such an antecedent will fire.
Because temporality and sequentiality are two key elements that need to be captured to create efficient user models, more research is needed to capture these characteristics when creating association rules.

IV. SUPERVISED APPROACHES TO USER MODELING

This section gives a review of how supervised learning techniques (decision trees/classification rules, Neural Networks, k-Nearest Neighbor and Support Vector Machines) can be used to model user behavior.

A. Decision Trees/Classification Rules for User Modeling

Decision tree learning (Mitchell, 1997) (Winston, 1992) is a method for approximating discrete-valued functions with disjunctive expressions. Decision tree learning is generally best suited to problems where instances are represented by attribute-value pairs and the target function has discrete output values. Classification rules are an alternative representation of the knowledge obtained from classification trees. They construct a profile of items belonging to a particular group according to their common attributes.

1) Basic Algorithms

The training process that creates a decision tree is called induction. A standard decision tree algorithm has two phases: (1) tree growing and (2) pruning. The growing phase can be done using two methods: (1) Top-Down induction and (2) Incremental induction.

Top-down induction is an iterative process which involves splitting the data into progressively smaller subsets. Each iteration considers the data in only one node. The first iteration considers the root node, which contains all the data. Subsequent iterations work on derivative nodes that contain subsets of the data. The algorithm begins by analyzing the data to find the independent variable that, when used as a splitting rule, will result in nodes that are most different from each other with respect to the dependent variable. The quality of a test is measured by the impurity/variance of example sets. The most common measure is the information gain.
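As an illustration of how a splitting attribute could be chosen, the following minimal sketch computes the information gain of a candidate split as the entropy of the parent node minus the weighted entropy of the child nodes. The user records, the attribute names and the function names are hypothetical, and the sketch assumes categorical attributes only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Entropy of the parent node minus the weighted entropy of its children."""
    parent = entropy([r[target] for r in rows])
    partitions: Dict[str, List[str]] = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    children = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return parent - children


# Hypothetical user records; "needs_help" is the dependent variable.
users = [
    {"expertise": "novice", "device": "mobile", "needs_help": "yes"},
    {"expertise": "novice", "device": "desktop", "needs_help": "yes"},
    {"expertise": "expert", "device": "mobile", "needs_help": "no"},
    {"expertise": "expert", "device": "desktop", "needs_help": "no"},
]

gain_expertise = information_gain(users, "expertise", "needs_help")
gain_device = information_gain(users, "device", "needs_help")
best_attribute = max(["expertise", "device"],
                     key=lambda a: information_gain(users, a, "needs_help"))
```

In this toy data, splitting on "expertise" separates the two classes perfectly, while splitting on "device" leaves each child as mixed as the parent, so a top-down learner would split on "expertise" first.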
TABLE III
EXAMPLES OF SOME ASSOCIATION RULES-BASED USER MODELS

(Chen et al., 2002)
Application: Prediction of user intention in machine-human interaction. The model is used to predict human actions for an Internet browser.
Input Data: Interaction of 5 users with the system during a month.
Results: The approach achieves 84% average accuracy in predicting user's intention.

(Lin et al., 2001)
Application: Learning user preference models of multimedia Internet files. The model is used to predict when a user will need a multimedia file and to help in the search of the file needed.
Input Data: 200 records of user's emails.
Results: The average correct acceptance and average correct refusal of each user preference modeling is higher than 80%.

(Mobasher et al., 2001)
Application: Efficient personalization of web sites using association rules and non-labeled data contained in server's log file.
Input Data: Access logs for the Web site of the Association for Consumer Research (ACR) Newsletter.
Results: Association Rules achieve better recommendation effectiveness than traditional collaborative filtering approaches.

(Geyer-Schulz, 2002)
Application: Evaluation of the recommendations given in an educational Internet information broker using Association Rules and repeat-buying theory.
Input Data: Data set obtained from the Virtual University Information system of Vienna University during 6 months of usage.
Results: Association Rules tend to work better and achieve higher prediction accuracy.

(Nanopoulos et al., 2001)
Application: Association Rule discovery to construct a prediction system for effective prediction of web user request. Implementation of a prefetching system.
Input Data: Clarknet trace log.
Results: The proposed algorithm WMo outperforms the existing algorithms and is very efficient in reducing the number of candidates.

Typically, the set of possible tests is limited to splitting the examples according to the value of a certain attribute. Once a node is split, the same process is performed on the new nodes, each of which contains a subset of the data in the parent node. This process is repeated until only nodes where no splits should be made remain.

Incremental induction is a method for the task of concept learning. When a new training example is entered it is classified by the decision tree. If it is incorrectly classified then the tree is revised. Restructuring the tree can be done by storing all training examples or by maintaining statistics associated with nodes in the tree.

Tree-building algorithms usually have several stopping rules. These rules are usually based on several factors including maximum tree depth, minimum number of elements in a node considered for splitting, or the minimum number of elements that must be in a new node.

The second phase of the algorithm optimizes the resulting tree obtained in the first phase. Pruning is a technique used to make a tree more general. It removes splits and the subtrees created by them.

There is a great variety of different decision tree algorithms in the literature. Some of the more common algorithms are: Classification and Regression Trees (CART) (Breiman et al., 1984) (Efron et al., 1991), Chi-squared Automatic Interaction Detection (CHAID) (Kass, 1980), C4.5 (Quinlan, 1993), J4.8 (Witten and Frank, 1999), C5.0 (which implements boosting) and ID3 (Quinlan, 1986).

Rules are, in their simplest form, an equivalent way of expressing a classification tree. In order to obtain the set of rules of a decision tree, each path is traced from root node to leaf node, recording the test outcomes as antecedents and the leaf-node classification as the consequent. The process of converting a decision tree into decision rules can be done after or before pruning.
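The path-tracing procedure described above can be sketched as follows. The tree encoding (a leaf is a class label, an internal node is a pair of attribute name and branches) and the toy tree are assumptions made for this example, not the representation used by any particular algorithm.

```python
from typing import Dict, List, Optional, Tuple, Union

# A leaf is a class label; an internal node is (attribute, {value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]
Condition = Tuple[str, str]          # (attribute, value) test outcome
Rule = Tuple[List[Condition], str]   # (antecedents, consequent)


def tree_to_rules(tree: Tree, path: Optional[List[Condition]] = None) -> List[Rule]:
    """Trace every root-to-leaf path and emit one rule per leaf."""
    path = path or []
    if isinstance(tree, str):
        # Leaf reached: the tests recorded on the path are the antecedents.
        return [(list(path), tree)]
    attribute, branches = tree
    rules: List[Rule] = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + [(attribute, value)]))
    return rules


# Hypothetical tree that selects a presentation style for a user.
toy_tree: Tree = ("expertise", {
    "novice": "show_tutorial",
    "expert": ("device", {
        "mobile": "compact_view",
        "desktop": "full_view",
    }),
})

rules = tree_to_rules(toy_tree)
for antecedents, consequent in rules:
    tests = " AND ".join(f"{a} = {v}" for a, v in antecedents)
    print(f"IF {tests} THEN {consequent}")
```

Each leaf yields exactly one rule, so an unpruned tree with many leaves produces a correspondingly large rule set.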
Converting a decision tree to rules before pruning has some advantages: (1) it allows distinguishing among the different contexts in which a decision node is used, (2) the pruning decision regarding an attribute test can be made differently for each path, and (3) it improves readability. Nevertheless, obtaining the rules before pruning can produce a high number of rules. This can be solved by applying some rule-reduction techniques. The basic techniques are: (1) eliminate redundant and unnecessary antecedents, (2) use OR to combine rules in order to reduce the total number of rules to the number of dependent variables and (3) eliminate unnecessary rules. Rules can also be obtained after the tree has been pruned, producing in this case a smaller rule knowledge-base. Algorithms such as CART, C4.5 and C5.0 include methods to generalize the rules associated with a tree, which removes redundancies.

2) Applications for User Modeling

In the context of user modeling, decision trees can be used to classify users and/or documents in order to use this information for personalization purposes. Decision trees can also handle noisy data and/or data with missing parameters, which makes them very useful for creating user models due to the noisy and imprecise nature of the data available. Table IV summarizes some studies and applications of Decision Trees for UM.

Classification trees are typically used to implement classification tasks. In this case the classification trees are used to construct user models used to personalize the user experience (for example regarding his/her level of expertise, his/her cognitive style, etc.). Some examples of this approach are (Tsukada et al., 2002) and (Beck et al., 2003). Due to their ability to group users with similar characteristics, classification trees can also be used to implement recommendation tasks. Examples of these approaches are (Paliouras et al., 1999), (Zhu et al., 2003) and (Webb et al., 1997).

Classification rules are widely used to model user behavior because they provide a straightforward framework to represent knowledge.
The readability of the output knowledge is a great advantage of this approach. Table V summarizes some studies and applications of Classification Rules for UM.

TABLE IV
EXAMPLES OF SOME DECISION TREE-BASED USER MODELS

(Paliouras et al., 1999)
Application: Construction of user stereotypes using C4.5. Stereotypes are used to anticipate interests of a user within the context of a news retrieval system.
Input Data: Questions answered by 3 users regarding the ECRAN information system.
Results: It is very important to have good data in order to obtain good models.

(Tsukada et al., 2001)
Application: Automatic classification of web pages in a pre-specified set of categories using C4.5 and association rules.
Input Data: 4 top categories of Yahoo! JAPAN. From each category 200 pages were randomly downloaded.
Results: The experimental evaluation demonstrates that this method provides acceptable accuracy with the classification of web pages into top categories of Yahoo! JAPAN.

(Webb et al., 1997)
Application: Use of C4.5 to build the Feature Based Modeling instruction module. The results are applied to the Subtraction Modeler.
Input Data: Test administered to 73 nine to ten year old primary school students. Each test contained 40 randomly generated three-column subtraction problems.
Results: C4.5 increases the number of predictions made without significantly altering the accuracy of those predictions.

(Zhu et al., 2003)
Application: Construction of a recommender system to help users find relevant information on the web using C4.5 and Naïve Bayesian Classifier.
Input Data: Collected data from 29 participants, asking each participant to perform two search tasks.
Results: C4.5 outperforms Naïve Bayesian Classifier.

(Beck et al., 2003)
Application: Construction of a User Model for an adaptive tutor with J4.8 and Naïve Bayesian classifier.
Input Data: Data collected from the interaction of 88 students with the Reading Tutor.
Results: Naïve Bayesian Classifier outperforms J4.8 for individual modeling and J4.8 outperforms Naïve Bayesian Classifier for group modeling.

(Shah et al., 2002), (Zhang, 2003), (Joerding, 1999) and (Semeraro et al., 2001) are examples of classification tasks implemented using classification rules. In these cases, the rules are used to characterize different user stereotypes that are used to implement some personalized feature. (Cercone, 2002) and (Cercone et al., 2002b) are examples of the application of classification rules for recommendation.

Classification rules can not only be obtained from a learning algorithm but can also be directly expressed by the designer of the system. Some examples of this approach are (Morales et al., 1999), which constructs a model capturing the behavior of a user that is controlling a pole, (De Bra et al., 2003) and (Romero et al., 2003), which use rules as part of the architecture of a general purpose tool for adaptive websites, and (Benaki et al., 1997), which uses rules to characterize user behavior in the context of information retrieval.

3) Limitations

Decision trees/classification rules produce results that are highly dependent on the quality of the data available. This is caused by the fact that subtrees are created using the maximum information gain possible. In some cases, if the information available is not appropriate, which typically happens when the information used to create user models has been obtained using user feedback or in a noisy environment, the models created will not correctly capture user behavior. Also, for high dimensional problems, the response time of decision trees can be very high. This is inconvenient when working with adaptive systems, because real-time response is needed. This problem can be solved in some cases using classification rules.

Of special interest for user modeling is the combination of classification rules with soft computing techniques (fuzzy logic and neural networks especially) in order to create more flexible user models. Fuzzy classification rules are able to overlap user models and to improve the interpretability of the results.
So far user modeling with fuzzy classification rules has not been used at its full capacity.

B. K-Nearest Neighbor (k-NN) for User Modeling

k-Nearest Neighbor is a predictive technique suitable for classification models (Friedman et al., 1975). Unlike other learning techniques in which the training data is processed to create the model, in k-NN the training data represents the model.

1) Basic Algorithms

The simplest version is the Nearest Neighbor algorithm. In this case the training portion of the algorithm simply stores the data points of the training data S, $S = \{x_1, \ldots, x_n\}$. To classify a new item $j$, the nearest-neighbor classifier finds the closest (according to some distance metric) training point $x_i$ to $j$ and assigns $j$ to the same class as $x_i$. The nearest-neighbor algorithm is traditionally implemented by computing the distance from $j$ to every $x_i$ of S. It can also be implemented more efficiently using Voronoi diagrams.

The k-nearest-neighbor (k-NN) algorithm is a variation of nearest-neighbor. In this case, instead of looking at only the point $x_i$ that is closest to $j$, the algorithm looks at the k points in S that are closest to $j$. Since the class of each of the k nearest points is known, the majority category in this subset is the result of the classification. Nearest-neighbor is a special case of k-nearest-neighbor, where k = 1.

k-NN has two main parameters: (1) the parameter k (number of nearest cases to be used) and (2) the distance metric, d. The selection of the metric is very important, because different metrics, used on the same training data, can result in completely different predictions. In general, the selection of the metric is application dependent. A typical metric d is the Euclidean distance, where:

$$d(x_i, j) = \sqrt{(x_i - j)^T (x_i - j)} \tag{18}$$

The value of k should be determined by experimentation. In general, higher values of k make the system more robust to noise, but also increase the computational cost.
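A minimal sketch of the k-NN classifier described above, using the Euclidean distance and a majority vote. The feature vectors (pages viewed per session, minutes per page) and the stereotype labels are hypothetical, and ties in the vote are broken arbitrarily in this sketch.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Return the majority class among the k training points closest to `query`."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training points")
    # The "model" is the training set itself: distances are computed at query time.
    neighbours = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical user vectors: (pages viewed per session, minutes per page).
training = [
    ((2.0, 5.0), "reader"),
    ((3.0, 4.5), "reader"),
    ((2.5, 6.0), "reader"),
    ((15.0, 0.5), "scanner"),
    ((12.0, 1.0), "scanner"),
]

new_user = (4.0, 4.0)
label_k1 = knn_classify(training, new_user, k=1)
label_k3 = knn_classify(training, new_user, k=3)
```

Every query scans the whole training set, which is why both the response time and the memory footprint grow with the amount of stored data.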
TABLE V
EXAMPLES OF SOME CLASSIFICATION RULES-BASED USER MODELS

(Shah et al., 2002)
Application: Classification rules model bid strategies in an on-line auction website.
Input Data: Data collected from 2 auctions over different periods of time, making a total of 1,537 bids collected.
Results: The bidder model obtained can be used to simulate on-line markets and/or to detect fraud in on-line auction websites.

(Zhang, 2003)
Application: Classification of the users of an information retrieval (IR) system according to different user stereotypes.
Input Data: Ratings given by 64 people with four different backgrounds (stereotypes) regarding their IR knowledge.
Results: The classification can be effectively used to personalize the information retrieval environment.

(Joerding, 1999)
Application: Dynamic user modeling of the interests of a web shopper. The user model represents the present interest of that particular user.
Input Data: Set of visits of a user to an adaptive shopping site.
Results: Classification Rules efficiently model dynamic human behavior.

(Cercone, 2002)
Application: ELEM2, a novel rule induction process.
Input Data: Not described.
Results: Applications to extract user profiles that are used to construct recommender systems.

(Semeraro et al., 2001)
Application: Use of classification rules to model the user interaction in a digital library service architecture. Comparison between the results given by C4.5 and J4.8.
Input Data: Log file containing the interactions of each user with the digital library system. The data is then preprocessed and labeled.
Results: Accurate interaction models can be inferred automatically by using J4.8. J4.8 slightly outperforms C4.5.

TABLE VI
EXAMPLES OF SOME K-NN BASED USER MODELS

(Billsus et al., 1999)
Application: Construction of an intelligent agent designed to compile a daily news program for individual users using a user model that captures user interests using a multi-strategy approach.
Input Data: Training data of the agent by 10 users during 4-8 days. k-NN is used to model the short-term interest of a user.
Results: A multi-strategy learning approach (a short-term model and a long-term model) captures changes in user interest very efficiently.

(Schwab et al., 1999)
Application: ELFI (Electronic Funding Information), a WWW system that provides personalized information about research funding.
Input Data: ELFI log file of several months of system usage. The selection of a program is regarded as interesting for that user.
Results: The optimum solution is obtained with a combination of k-NN and Naïve Bayesian Classifiers.

(Tsiriga et al., 2002)
Application: Initialization of the Student Model in an Algebra Tutor using k-NN and stereotypes based on recognized similarities with other students that have already interacted with the system.
Input Data: Information is captured by the Algebra Tutor using a set of questions to the new user of the system. This information is used to obtain the stereotype.
Results: The combination of k-NN and stereotypes produces very good results for Student Modeling.

(Shih et al., 2001)
Application: Development of an on-line learning system using k-NN for providing adaptive learning materials on a course of introduction to computer networks.
Input Data: Not described.
Results: Compared with previous research, the application of k-NN can retrieve more adequate materials based on previous similar cases.

Typically k is in the range of five to twenty rather than in the order of hundreds. Because the model created by k-NN is basically the set of training points, the amount of memory needed to store the model is very high, especially for high dimensional problems and/or for problems with a lot of training data, and it increases when more data is added to the system.
Agglomerative Nearest Neighbor (A-NN) algorithms have been proposed to solve this problem. A-NN is designed to capture the same information as k-NN but without the necessity of storing the complete set of training data. The key idea behind A-NN is that it clusters training points that have the same class. Each class is then represented by a class representative, against which the concept of distance is measured when classifying a new instance. A-NN has been traditionally used for collaborative clustering.

2) Applications for User Modelling

k-NN algorithms can be used very effectively for recommendation. In this case the algorithm is used to recommend new elements based on the user's past behavior. A typical application is the recommendation of interesting documents. As can be seen, the concept of distance (in this case the distance between two texts) is the key element of an efficient recommendation system. Examples of k-NN used to implement recommendation tasks are (Billsus et al., 1999), (Schwab et al., 1999) and (Shih et al., 2001). Nevertheless, like any classification algorithm, it can also be used to implement classification tasks, for example to properly initialize a new user when first visiting a system, given a set of existing user models. (Tsiriga et al., 2002) is an example of k-NN used to implement a classification task. Table VI summarizes some studies and applications of k-NN for UM.

3) Limitations

k-NN is a very simple technique that produces good and straightforward results. Nevertheless, the main drawback of this approach is the high dependence of the results on the value of k and the metric d chosen. In general, to choose both parameters, some experimentation with the system and some knowledge of how the system works will be needed. Also, the value of k should be big enough to make the classification system noise-proof; a k-NN with a small k value will be extremely susceptible to noise. In the context of user modeling this is very important because the data available is usually noisy and imprecise due to the nature of the problem.
Nevertheless, an increase in the value of k implies an increase in the computational time needed. The definition of the distance d, when used for user modeling, faces the same problems as clustering techniques: (1) the necessity of transforming user related information into a vectorial representation without losing semantics, and (2) the definition of a concept of distance among those vectors that captures the characteristics of the problem (user interest, user behavior, etc.). Another limitation of k-NN for user modeling is the response time, which is directly affected by d and by the dimensionality and number of examples of the training data. Adaptive hypermedia systems need real-time response, and k-NN may in some cases not be suitable.

C. Neural Networks for User Modeling

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems process information. Two good introductions to Neural Networks can be found in (Fausett, 1994) and (Haykin, 1999). Although typical ANNs are designed to solve supervised problems, there are also architectures to solve unsupervised problems.

1) Basic Concepts

The key element of this paradigm is the structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in parallel. ANNs consist of the following elements: (1) neurons, (2) weighted interconnections, (3) an activation rule, to propagate signals through the network and (4) a learning algorithm, specifying how weights are adjusted.

Fig. 3. Architecture of an Artificial Neuron

The basic element of any ANN is an artificial neuron (Fig. 3). A neuron has N weighted input lines and a single output. The neuron combines these weighted inputs by forming their sum and, with reference to a threshold value and an activation function, determines its output. With $x_1, x_2, \ldots, x_N$ the input signals, $w_1, \ldots, w_N$ the synaptic weights, $u$ the activation potential, $\theta$ the threshold, $y$ the output signal and $f$ the activation function:

$$u = \sum_{i=1}^{N} w_i x_i \tag{19}$$

$$y = f(u - \theta) \tag{20}$$

Defining $w_0 = \theta$ and $x_0 = -1$, the output of the system can be reformulated as:

$$y = f\left(\sum_{i=0}^{N} w_i x_i\right) \tag{21}$$

The activation function f defines the output of the neuron in terms of the activity level at its input. The most common form of activation function used is the sigmoid function.

There are many different ways in which a set of neurons can be connected. The traditional cluster of artificial neurons is called a neural network. Neural networks are basically divided into three layers: the input layer, the hidden layer, which may contain one or more layers, and the output layer. The layer of input neurons receives the data either from input files or directly from electronic sensors in real-time applications. The output layer sends information directly to the outside world, to a secondary computer process, or to other devices such as a mechanical control system. Between these two layers there can be many hidden layers. These internal layers contain many of the neurons in various interconnected structures. The inputs and outputs of each of these hidden neurons simply go to other neurons. In most networks each neuron in a hidden layer receives the signals from all of the neurons in the layer above it, typically an input layer. After a neuron performs its function it passes its output to all of the neurons in the layer below it, providing a feed-forward path to the output. Another type of connection is feedback, where the output of one layer routes back to a previous layer.
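The single-neuron model described above (weighted sum, threshold, sigmoid activation) can be sketched in a few lines. The input values, weights and threshold below are hypothetical; the second function only checks that folding the threshold into an extra weight with a constant input of -1 gives the same output.

```python
import math
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs: Sequence[float], weights: Sequence[float],
                  threshold: float,
                  activation: Callable[[float], float] = sigmoid) -> float:
    """y = f(u - threshold), with u the weighted sum of the inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    u = sum(w * x for w, x in zip(weights, inputs))
    return activation(u - threshold)


def neuron_output_augmented(inputs: Sequence[float], weights: Sequence[float],
                            threshold: float,
                            activation: Callable[[float], float] = sigmoid) -> float:
    """Same neuron with the threshold folded in as w_0 = threshold, x_0 = -1."""
    aug_inputs = [-1.0, *inputs]
    aug_weights = [threshold, *weights]
    return activation(sum(w * x for w, x in zip(aug_weights, aug_inputs)))


# Hypothetical input signals, synaptic weights and threshold.
x = [1.0, 0.0, 1.0]
w = [0.5, -0.3, 0.8]
theta = 0.3

y_direct = neuron_output(x, w, theta)
y_augmented = neuron_output_augmented(x, w, theta)
```

The augmented form is convenient for learning algorithms because the threshold is then adjusted exactly like any other weight.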
Multi-Layer Perceptrons (MLP) are the typical architecture of NNs. MLPs are fully connected feed-forward nets with one or more layers of nodes between the input and the output nodes. Classification and recognition capabilities of NNs stem from the non-linearities used within the nodes. A single-layer perceptron implements a single hyperplane. A two-layer perceptron implements arbitrary convex regions consisting of intersections of hyperplanes. A three-layer NN implements decision surfaces of arbitrary complexity (Lipman, 1987) (Looney, 1987). That is the reason why a three-layer NN is the most typical architecture.

NNs learn through an iterative process of adjustments. There are two training approaches: supervised and unsupervised. In supervised training, both the inputs and the outputs are provided. The net is trained by initially selecting small random weights and internal thresholds, and presenting all training data repeatedly. Weights are adjusted after every trial using information specifying the correct class until weights converge and the cost function is reduced to an acceptable value. The vast bulk of networks utilize supervised training. The most common supervised technique is the backpropagation learning algorithm. It uses a gradient search technique to minimize a cost function defined by the mean square error (MSE) between the desired outputs $\hat{y}_i$ and the actual net outputs $y_i$, with $l$ the number of training points:

$$MSE = \frac{1}{l} \sum_{i=1}^{l} (\hat{y}_i - y_i)^2 \tag{22}$$

TABLE VII
EXAMPLES OF SOME NNS BASED USER MODELS

(Bidel et al., 2003)
Application: Classification and tracking of user navigation.
Input Data: Data generated from an on-line encyclopedia.
Results: A labeled approach to the problem produces better accuracy.

(Sas et al., 2003)
Application: Prediction of user's next step in a virtual environment.
Input Data: 30 users performed exploration and searching within the environment.
Results: Very accurate predictions of the next step.

(Shepperd, 2002)
Application: Adaptive filtering system for electronic news using stereotypes.
Input Data: The Halifax Herald Ltd.
Results: Very useful for readers with specific information needs.

(Beck and Woolf, 1998)
Application: Construction of a student model for an intelligent tutoring system.
Input Data: Data collected by the tutoring system.
Results: NN-based recommendation to each individual.

The generally good performance found for the backpropagation algorithm is somewhat surprising considering that
it is a gradient descent technique that may find a local minimum in the cost function instead of the desired global minimum.

2) Applications for User Modeling

NNs are able to derive meaning from complicated and/or imprecise data. Also, NNs do not require the definition of any metric (unlike k-NN or clustering), which makes them completely application independent. No initial knowledge about the problem that is going to be solved is needed. These characteristics make NNs a powerful method to model human behaviour and an ideal technique to create user models for hypermedia applications.

NNs have been used for classification and recommendation in order to group together users with the same characteristics and create profiles and stereotypes. (Bidel et al., 2003) is an example of NNs used for classification. (Sas et al., 2003), (Beck et al., 2003), (Sheperd et al., 2002) and (Beck and Woolf, 1998) use NNs for recommendation tasks. Table VII presents more details of these applications.

3) Limitations

NNs have been successfully used for UM mainly because they do not need any heuristic to produce a model. Nevertheless, they still face important limitations. The main ones are the training time needed to produce a model (which in some cases can be measured in the order of many hours and even days) and the amount of information needed. The training time is an inconvenience for creating dynamic models. Although there are techniques able to retrain NNs dynamically, the techniques used so far for UM retrain the system from scratch when more information, e.g. a new user or a new document, is added. More research in the field of incremental learning is needed. Another important limitation of NNs is their black-box behavior. While the previous techniques, to a different extent, can be interpreted and manually changed, NNs cannot be interpreted, which limits their application when a human-understandable user model is needed.

D. Support Vector Machines (SVM) for User Modeling

SVM is a classifier derived from Statistical Learning Theory (Vapnik, 1995). The main advantages of SVM when used for classification problems are (1) the ability to work with high dimensional data and (2) high generalization performance without the need to add a priori knowledge, even when the dimension of the input space is very high. An excellent introduction to SVM can be found in (Cristianini et al., 2000). The problem that SVMs try to solve is to find an optimal hyperplane that correctly classifies data points and separates the points of two classes as much as possible. Figure 4 is an example of this situation.

Fig. 4. SVM maximization of the distance between classes.

1) Basic Algorithms

Given two classes, the objective is to identify the hyperplane that maximizes m,

$$m = \frac{2}{\|w\|} \tag{23}$$

while at the same time classifying correctly all the examples given. With $w^T$ the hyperplane that verifies the previous condition, all points that are part of that class will verify $w^T x + b > 0$, where $x$ is the point that is being validated. If a point is not part of that class, then $w^T x + b < 0$. Formally the problem can be presented as follows. Given m the dimension of the problem and N the number of training points, let:

$$(x_1, y_1), \ldots, (x_N, y_N) \in \mathbb{R}^m \times Y, \quad y_i \in Y, \quad Y = \{-1, 1\} \tag{24}$$

be the set of labelled inputs, where $-1$ indicates that the input is not of that class and $1$ indicates that the input is of that class. The decision boundary should verify:

$$x_i^T w + b \geq 1, \quad y_i = 1$$
$$x_i^T w + b \leq -1, \quad y_i = -1 \tag{25}$$

The problem can be presented as:

$$\text{Maximize } m = \frac{2}{\|w\|} \quad \text{subject to } y_i (x_i^T w + b) \geq 1, \quad i = 1, \ldots, N \tag{26}$$

This formulation can be expressed as a standard quadratic problem (QP) in the form of:

$$\text{Maximize } \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1,\, j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{subject to } \alpha_i \geq 0, \quad \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{27}$$

In this context a global maximum $\alpha$ can always be found and $w$ can be recovered as:

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i \tag{28}$$
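As a small worked example of the last step, the sketch below recovers the weight vector from the dual coefficients of a toy problem with one point per class. The dual coefficients are assumed to have been solved by hand for this toy case (in practice a QP solver would produce them), and the recovery of b from a support vector is a standard step that the text above does not spell out.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two vectors of equal length."""
    return sum(p * q for p, q in zip(a, b))


def recover_w(alphas: Sequence[float], labels: Sequence[int],
              points: Sequence[Sequence[float]]) -> List[float]:
    """w = sum_i alpha_i * y_i * x_i."""
    dim = len(points[0])
    return [sum(a * y * x[d] for a, y, x in zip(alphas, labels, points))
            for d in range(dim)]


# Toy problem: one training point per class.
points = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
# Assumed dual solution, obtained by hand for this toy problem: with
# alpha_1 = alpha_2 = a the dual objective is 2a - 4a^2, maximal at a = 0.25.
alphas = [0.25, 0.25]

w = recover_w(alphas, labels, points)
# For a support vector s, y_s * (x_s . w + b) = 1, hence b = y_s - x_s . w.
b = labels[0] - dot(points[0], w)
margin = 2.0 / math.sqrt(dot(w, w))
constraints_ok = all(y * (dot(x, w) + b) >= 1.0 - 1e-9
                     for x, y in zip(points, labels))
```

Here the margin equals the Euclidean distance between the two points, as expected when each class contains a single point.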
