Effective Page Recommendation Algorithms Based on Distributed Learning Automata and Weighted Association Rules


R. Forsati 1*, M. R. Meybodi 2
1 Department of Computer Engineering, Islamic Azad University, Karaj Branch, Karaj, Iran
2 Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
* Corresponding author.
1 *Rana Forsati, MSc, Department of Computer Engineering, Karaj Azad University, Karaj, Iran. Tel: +98(21) Fax: +98(21) Email: forsati@kiau.ac.ir
2 Mohammad Reza Meybodi, PhD, Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran. Email: mmeybodi@aut.ac.ir

Abstract
Many efforts have been made to address the problem of information overload on the Internet. Recommender systems aim to direct users through this information space toward the resources that best meet their needs and interests, by extracting knowledge from previous users' interactions. In this paper we propose three algorithms for the web page recommendation problem. In our first algorithm, we use distributed learning automata to learn the behavior of previous users and recommend pages to the current user based on the learned patterns. By introducing a novel weighted association rule mining algorithm, we present our second recommendation algorithm. We also propose a novel method to prune the current session window. One of the challenging problems in recommendation systems is dealing with unvisited or newly added pages. Taking this problem into account and improving the efficiency of the first two algorithms, we present a hybrid algorithm based on distributed learning automata and the proposed weighted association rule mining algorithm. In the hybrid algorithm we employ the HITS algorithm to extend the recommendation set. Our experiments on a real data set show that the hybrid algorithm performs better than the other algorithms we compared it to and, at the same time, is less complex than the other proposed algorithms with respect to memory usage and computational cost.
Keywords: Personalization, Machine Learning, Learning Automata, Web Mining

1. Introduction
The World Wide Web has grown rapidly in recent years, resulting in a huge volume of hyperlinked documents with no logical organization. Currently, Google indexes more than 3 billion web pages, a number that increases at a rate of 7.3 million pages per day. The massive influx of information onto the World Wide Web has facilitated not only information retrieval but also knowledge discovery. However, although users are provided with more information and service options, it has become more difficult for them to find the right or interesting information. This problem, commonly known as information overload, stems from the significant and rapidly expanding amount of information on the web. Personalization techniques [1] are alternative, user-centric, promising approaches to tackling information overload: they adapt the content and structure of websites to the needs of users by taking advantage of the knowledge acquired from analyzing users' access behavior. Personalization aims to provide users with what they want or need without requiring them to ask for it explicitly [2]. Typically, personalization focuses on identifying web users or objects, collecting information about users' preferences or interests, and adapting its service to satisfy users' needs. In short, web personalization can be used to provide better quality of service and web applications to users during their browsing sessions. This can be done by highlighting hyperlinks, dynamically inserting new hyperlinks that seem to be of interest to the current user, or creating new index pages. Numerous approaches have been introduced for personalization systems; they can be categorized into two major groups: content-based filtering agents and collaborative filtering systems [3]. In both cases a user model is created from data gathered explicitly and/or implicitly about user interests, and is used to recommend a set of items (referred to as the recommendation set) deemed to be of interest to the user.
The most common form of explicit data about user interests is item ratings. A content-based system builds a model of user preferences using the content descriptions of items and the user's rating data. The model is then used to predict the likelihood that items not currently viewed by the user will be of interest to the user. The items most likely to be of interest constitute the recommendation set. The main limitation of content-based filtering is the lack of diversity in its recommendations. Because recommendations are based only on items previously rated by the active user, the items in the recommendation set lack serendipity and tend to be very similar to previously rated items. User studies have shown that users find online recommenders most useful when they recommend unexpected items [4], highlighting that overspecialization in content-based filtering systems is indeed a serious drawback. Common approaches to this problem include explicitly injecting diversity into the recommendation set [5, 6, 7] and building hybrid recommendation systems that incorporate aspects of collaborative filtering into the recommendation generation process. Collaborative filtering is traditionally a memory-based approach to recommendation generation, though model-based approaches have also been developed. The user model is generally an n-dimensional vector representing the user's ratings of the n items in the item set. Hence, in contrast to content-based approaches, collaborative filtering does not traditionally use any item content descriptions. The recommendation process consists of discovering the neighborhood of the active user, that is, other users whose rating vectors are similar to that of the active user, and predicting the ratings of items not currently viewed by the active user from the ratings of these items by users within the active user's neighborhood [8]. Collaborative filtering is commercially the most successful approach to recommendation generation; of all approaches, it has become the predominant one for furnishing e-commerce systems with the intelligence to capture user profiles and recommend relevant pages to users.
However, it suffers from a number of well-known problems, including the cold-start/latency problem, sparseness of the rating matrix, scalability, and efficiency [9]. Item-based similarity [10] and dimensionality reduction have been proposed by some researchers to overcome these drawbacks.
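The neighborhood-based prediction step of collaborative filtering described above can be sketched as follows. This is a minimal illustration, not the method of [8]: cosine similarity over co-rated items and a similarity-weighted average of neighbor ratings are assumed as one common choice, and the function names and the parameter `k` (neighborhood size) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two rating dicts, computed over co-rated items only (assumed measure)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(ratings, active, item, k=2):
    """Predict the active user's rating of `item` from the k most similar users who rated it."""
    candidates = [
        (cosine(ratings[active], r), user)
        for user, r in ratings.items()
        if user != active and item in r
    ]
    # The neighborhood: the k users whose rating vectors are most similar to the active user's.
    neighbors = sorted(candidates, reverse=True)[:k]
    num = sum(s * ratings[user][item] for s, user in neighbors)
    den = sum(abs(s) for s, user in neighbors)
    return num / den if den else None  # None: no neighbor has rated the item
```

For example, if `ratings = {"alice": {"a": 5, "b": 3}, "bob": {"a": 5, "b": 3, "c": 4}, "carol": {"a": 1, "b": 5, "c": 2}}`, then `predict(ratings, "alice", "c", k=1)` uses only Bob, whose ratings match Alice's exactly, and returns 4.0. Note that an item nobody else has rated cannot be predicted at all, which is the cold-start problem mentioned above.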

One research area that has recently contributed greatly to this problem is web mining. Most of the systems developed in this field are based on web usage mining (WUM) [11]. The term web usage mining [12] was introduced by Cooley et al., who define it as the automatic discovery of user access patterns from web servers. Web usage mining has gained much attention in the literature as a potential approach to fulfilling the requirements of web personalization [13, 12, 14, 15, 3, 16]. These systems are mainly concerned with analyzing web usage logs, discovering patterns from this data, and making recommendations based on the extracted knowledge [14, 3, 17, 18]. Traditional personalization techniques base their decisions on user ratings of different items or other explicit feedback provided by the user [19, 20]; in contrast, these techniques discover user preferences from implicit feedback, namely the web pages users have visited. More recently, systems that combine the content, usage and even structural information of websites have been introduced [51, 52, 53, 54, 55] and have shown superior results on the web page recommendation problem. In [42], the degree of connectivity based on the link structure of the website is used to evaluate usage-based techniques. A new method for generating navigation models that exploits the usage, content and structure data of the website is presented in [54]. Erkan et al. [52, 53] use the content of web pages to augment usage profiles with semantics using a domain ontology. A few combined or hybrid web recommender systems have been proposed in the literature [42, 55]. The work in [55] adopts a clustering technique to obtain both site usage and site content profiles in the offline phase. In the online phase, a recommendation set is generated by matching the current active session against all usage profiles.
Similarly, another recommendation set is generated by matching the current active session against all content profiles. Finally, the set of pages with the maximum recommendation value across the two recommendation sets is presented as the recommendation. This is called a weighted hybridization method [57].
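The final merging step of the weighted hybridization just described can be sketched in a few lines. This is only an illustration under assumptions: the usage-based and content-based recommendation sets are taken as already computed dictionaries mapping page URLs to recommendation values, and the function name `weighted_hybrid` and the cut-off `top_n` are hypothetical, not taken from [55].

```python
def weighted_hybrid(usage_scores, content_scores, top_n=3):
    """Merge two recommendation sets, keeping each page's maximum recommendation value."""
    merged = dict(usage_scores)
    for page, score in content_scores.items():
        merged[page] = max(score, merged.get(page, float("-inf")))
    # Rank by descending value; ties are broken by URL so the output is deterministic.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:top_n]]
```

For instance, merging `{"/a": 0.9, "/b": 0.4, "/c": 0.2}` with `{"/b": 0.7, "/d": 0.5}` yields `["/a", "/b", "/d"]`: page `/b` keeps its higher content-based value of 0.7, which lifts it above `/d` and `/c`.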

In [42], Nakagawa and Mobasher use association rule mining, sequential pattern mining, and contiguous sequential mining to generate three types of navigational patterns in the offline phase. In the online phase, recommendation sets are selected from the different navigational models based on a localized degree of hyperlink connectivity with respect to the user's current location within the site. This is called a switching hybridization method [57]. An extensive study of web personalization based on web usage mining can be found in [21]. Some studies have considered using the pages that interest the user in the recommendation process. In [41], Mobasher et al. use statistical significance testing to judge whether a page is interesting to a user. The main idea is as follows: a duration threshold is calculated for each page using the average duration and standard deviation of visits to the page; if the duration of a page visit is longer than the threshold, the page is considered interesting to the user, and vice versa. The drawback of this approach is that it simply divides pages into interesting and uninteresting groups and neglects differences in the degree of interest. For one thing, there is no clear division between interesting and uninteresting pages; for another, the degrees of interest are probably not the same for all interesting (or uninteresting) pages. Clustering and collaborative filtering approaches can readily incorporate both binary and non-binary page weights, although binary weights are usually used for computational efficiency [21, 41]. Association rule (AR) mining and sequential pattern (SP) mining [42] can lead to higher recommendation precision [21] and scale easily to large datasets, but how to incorporate page weights into the AR and SP models has not been explored in previous studies. Weighted association rule (WAR) mining allows different weights to be assigned to different items, and is a possible approach to improving the AR model in the web personalization process. Cai et al. [56] proposed assigning different weights to items to reflect their different importance.
In their framework, two ways are proposed to calculate itemset weight: total weight and average weight.

The weighted support of an itemset is defined as the product of the itemset support and the itemset weight. Tao et al. [58] also proposed assigning different weights to items: the itemset (transaction) weight is defined as the average weight of the items in the itemset (transaction), and the weighted support of an itemset is the weight of the transactions containing the itemset as a fraction of the weight of all transactions. Both models attempt to give greater weights to more important items, facilitating the discovery of important but less frequent itemsets and association rules. However, both models assume a fixed weight for each item, whereas in web usage mining a page might have different importance in different sessions. The connectivity features of the web graph play an important role in web personalization, and, in the context of navigating a website, a page is important if many users have visited it before. We therefore propose a novel machine learning perspective on the problem, which we believe suits the nature of web page personalization: it integrates web usage mining with link analysis techniques to assign probabilities to web pages based on their importance in the site's navigational graph, and makes recommendations primarily based on web usage logs and the structure of the website. In this paper, we first propose a page recommendation algorithm based on distributed learning automata that learns the behavior of previous users. The proposed algorithm uses web usage data and link information to recommend pages to the current user based on the learned patterns. In this work we assign a quantitative weight to each page, taking the degree of interest into account. We extend the traditional association rule mining algorithm by allowing a weight to be associated with each item in a transaction to reflect the interest of that item within the transaction. By introducing a novel weighted association rule mining algorithm, we present our second recommendation algorithm.
In the proposed weighted association rule miner, the time spent by each user on each page and the visiting frequency of each page are used to assign a quantitative weight to the pages instead of traditional binary weights. We also propose a novel method to prune the current session window.
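To make the two fixed-weight models reviewed above concrete, the sketch below computes weighted support in the style of Cai et al. [56] (itemset weight, total or average, times ordinary support) and of Tao et al. [58] (transaction weights as average item weights, support as a weight fraction). It follows only the verbal definitions given in this section; the function names and the toy item weights are hypothetical illustrations, not the cited authors' code.

```python
def support(itemset, transactions):
    """Ordinary support: fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def cai_weighted_support(itemset, transactions, w, mode="average"):
    """Cai et al. style: itemset weight (total or average of item weights) times support."""
    total = sum(w[i] for i in itemset)
    weight = total if mode == "total" else total / len(itemset)
    return weight * support(itemset, transactions)

def tao_weighted_support(itemset, transactions, w):
    """Tao et al. style: weight of transactions containing the itemset / weight of all transactions."""
    tw = [sum(w[i] for i in t) / len(t) for t in transactions]  # average item weight per transaction
    covered = sum(x for x, t in zip(tw, transactions) if itemset <= t)
    return covered / sum(tw)
```

With item weights `{"a": 0.9, "b": 0.3, "c": 0.6}` and transactions `[{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]`, the itemset `{"a", "b"}` has support 0.5, an average-weight Cai support of about 0.3, and a Tao support of about 0.5. In both functions each item keeps the same weight in every transaction, which is exactly the limitation noted above for web sessions.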

One of the challenging problems in recommendation systems is dealing with unvisited or newly added pages. Taking this problem into account and improving the efficiency of the first two algorithms, we present a hybrid algorithm based on distributed learning automata and the weighted association rule mining algorithm. In the hybrid algorithm we employ the HITS algorithm to extend the recommendation set. We have applied these algorithms to a standard data set and obtained very good results compared to association rules, which are commonly regarded as one of the most successful approaches in web-mining-based recommender systems. The evaluation of the experimental results shows considerable improvements and demonstrates the robustness of the algorithms. Our experiments on a real data set show that the hybrid algorithm performs better than the other algorithms we compared it to and, at the same time, is less complex than the other proposed algorithms with respect to memory usage and computational cost. The rest of this paper is organized as follows. Section 2 presents the recommendation algorithm based on distributed learning automata, which is the basis of our method. In Section 3 we present our weighting schema and weighted association rules, and give an overview of the recommendation method based on weighted association rules. We present our hybrid approach in Section 4. Section 5 evaluates the performance of the proposed algorithms against the association-rule-based method. Section 6 concludes the paper.
2. Web Page Recommendations based on Distributed Learning Automata
2.1 Preliminaries
Learning Automata
Learning automata are adaptive decision-making devices operating in unknown random environments. The automata approach to learning involves determining an optimal action from a set of allowable actions. An automaton can be regarded as an abstract object with a finite number of possible actions. In each decision step, the automaton selects an action from its finite action set. This action is applied to a random environment, which evaluates the selected action and gives a grade to the applied action. The random response of the environment (i.e., the grade of the action) is used by the automaton in further action selection. By continuing this process, the automaton learns to select the action with the best grade. The learning algorithm is what the automaton uses to determine the next action from the response of the environment. An automaton that acts in an unknown random environment and improves its performance in some specified manner is referred to as a learning automaton (LA). Learning automata can be classified into two main categories: fixed-structure learning automata and variable-structure learning automata [22]. The variable-structure learning automata used in this paper are described below. A variable-structure learning automaton is represented by the quadruple ⟨α, β, p, T(α, β, p)⟩, where α = {α_1, α_2, …, α_r}, β = {β_1, β_2, …, β_r}, and p = {p_1, p_2, …, p_r} are, respectively, an action set with r actions, an environment response set, and a probability set containing r probabilities, each being the probability of performing the corresponding action in the current internal state of the automaton. T is the reinforcement algorithm, which modifies the action probability vector p with respect to the performed action and the received response. If the response of the environment takes binary values, the learning automaton model is a P-model; if it takes values from a finite set of more than two elements in the interval [0, 1], the model is a Q-model; and when the output of the environment is a continuous variable in the interval [0, 1], it is an S-model. Clearly, the crucial factor affecting the performance of a variable-structure learning automaton is the learning algorithm used to update the action probabilities. Various learning algorithms have been reported in the literature. Let α_i be the action chosen at step n as a sample realization from the probability distribution p(n). One such learning scheme is the linear scheme whose recurrence for updating the action probability vector p is given in Equation (1); with b = 0 it reduces to the linear reward-inaction algorithm.

$$p_i(n+1) = p_i(n) + a\,(1-\beta(n))\,(1-p_i(n)) - b\,\beta(n)\,p_i(n)$$
$$p_j(n+1) = p_j(n) - a\,(1-\beta(n))\,p_j(n) + b\,\beta(n)\left[\frac{1}{r-1} - p_j(n)\right], \quad \forall j \neq i \qquad (1)$$

where 0 < a < 1, called the step length, and the penalty parameter b determine the amount of increase (decrease) of the action probabilities.
The learning automaton described above has a fixed number of actions. In some applications, such as our first proposed algorithm, the LA needs a changing number of actions [23]. An LA with a changing number of actions selects, at any time instant n, its action from a set of active actions V(n), and behaves as follows. To select an action, the learning automaton first computes the sum K(n) of the probabilities of its active actions and then computes the vector p̂(n) according to Equation (2). The automaton selects one of its active actions at random according to the probabilities p̂(n), applies the selected action α_i to the environment, and receives the response. For a desirable response, p̂(n) is updated according to Equation (3), and for an undesirable response according to Equation (4). Finally, the automaton updates its action probability vector p(n) from p̂(n+1), as shown in Equation (5).

$$K(n) = \sum_{\alpha_i \in V(n)} p_i(n), \qquad \hat{p}_i(n) = \Pr\left[\alpha(n) = \alpha_i \mid V(n) \text{ is the set of enabled actions},\ \alpha_i \in V(n)\right] = \frac{p_i(n)}{K(n)} \qquad (2)$$

$$\hat{p}_i(n+1) = \hat{p}_i(n) + a\,(1-\hat{p}_i(n)), \qquad \hat{p}_j(n+1) = \hat{p}_j(n) - a\,\hat{p}_j(n), \quad \forall j \neq i,\ \alpha_j \in V(n) \qquad (3)$$

$$\hat{p}_i(n+1) = (1-b)\,\hat{p}_i(n), \qquad \hat{p}_j(n+1) = \frac{b}{\hat{r}-1} + (1-b)\,\hat{p}_j(n), \quad \forall j \neq i,\ \alpha_j \in V(n) \qquad (4)$$

where α_i is the selected action and r̂ is the number of enabled actions.

$$p_i(n+1) = \hat{p}_i(n+1)\,K(n) \quad \text{for all } \alpha_i \in V(n), \qquad p_j(n+1) = p_j(n) \quad \text{for all } \alpha_j \notin V(n) \qquad (5)$$

Distributed Learning Automata
A distributed learning automaton (DLA) is a network of LAs that collectively cooperate to solve a particular problem. The number of actions of a particular LA in a DLA equals the number of LAs connected to it. Selection of an action by an LA in the network activates the LA corresponding to that action. Formally, a DLA can be defined by a graph DLA = (V, E), where V = {LA_1, LA_2, …, LA_n} is the set of n learning automata and E ⊆ V × V is the set of edges of the graph. The edge (i, j) represents action j of automaton LA_i; in other words, LA_j is activated when action j of LA_i is selected. The number of actions of a particular automaton LA_k (k = 1, 2, …, n) equals the out-degree of that node. If p^j is the action probability distribution of LA_j, then p^j_m is the probability that automaton LA_j selects action α_m. In other words, each edge (i, j) of the graph can be assigned a weight equal to the probability that automaton LA_i selects action j [24, 25, 26]. For example, in Fig. 1 every automaton has two actions, and selection of action α_3 by A_1 activates automaton A_3. The activated automaton chooses one of its actions, which activates the LA corresponding to the selected action. At any given time, only one automaton in the network can be active.

PageRank Algorithm
The PageRank computation for ranking hyperlinked web pages was originally outlined by Page and Brin [27]. Two intuitive explanations are offered for PageRank [28]. The first is based on the idea that if a page v of interest has many pages u with high PageRank scores pointing to it, then the authors of pages u are implicitly conferring importance on page v. The second conceptual model of PageRank is called the random surfer model. Consider a surfer who starts at a web page and picks one of the links on that page at random; on loading the next page, this process is repeated. In the random surfer model, the web is represented by a graph G = (V, E), with web pages as the vertices V and the links between web pages as the edges E. If a link exists from page u to page v, then (u, v) ∈ E. To represent the following of hyperlinks, a transition matrix P is constructed from the web graph by setting

$$p_{ij} = \begin{cases} \dfrac{1}{\deg(u_i)} & \text{if } (u_i, u_j) \in E \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (6)$$

where deg(u) is the out-degree of vertex u, i.e., the number of outbound links from page u. From this definition, a page with no out-links corresponds to a zero row in the matrix P. To model the surfer jumping from such dangling nodes, a second matrix D = d v^T is constructed, where d and v are both column vectors: the entry d_i is 1 if page i has no outgoing edge and 0 otherwise. The vector v is the personalization vector, which represents the probability distribution of destination pages when a random jump is made. Typically, this distribution is taken to be uniform, v_i = 1/n for an n-page graph (1 ≤ i ≤ n). However, it need not be, as distinct personalization vectors may be used to represent different classes of users with different web browsing patterns. This flexibility comes at a cost, though, as each distinct personalization vector requires an additional PageRank computation. Combining the surfer's following of hyperlinks with his/her random jumps from dangling pages yields the stochastic matrix P' = P + D, which is the transition matrix of a discrete-time Markov chain (DTMC). To represent the surfer's decision not to follow any of the links on the current page but instead to jump to a random web page, we construct a teleportation matrix E, where e_ij = v_j for all i, i.e., this random jump is also dictated by the personalization vector. Incorporating this matrix into the model gives:

$$P'' = cP' + (1-c)E$$

where 0 < c < 1 and c represents the probability that the user follows one of the links on the current page, i.e., with probability (1 − c) the surfer jumps randomly to another page instead of following the links on the current page. Having constructed P'', we can find the PageRank vector of the web pages by using power method approaches.
2.2 The Proposed Algorithm
Our first algorithm is based on the DLA and PageRank introduced in the previous subsections. The proposed algorithm employs web usage data and the underlying site structure to recommend pages to the current user. In the proposed algorithm, the transition matrix P and the personalization vector v of the original PageRank algorithm are computed from usage data instead of the link structure. To this end, a DLA learns the transition probability matrix P from the behavior of the visiting users recorded in the site's log file. In addition, the personalization vector v is computed from the visiting rate of pages, preferring pages visited by more users. Given the transition matrix P and the personalization vector v, both obtained from the knowledge acquired from previous users' visits, the PageRank algorithm is used to compute the rank of each page. Note that the PageRank algorithm is used here in a totally different context: the resulting PageRank is usage-based, since it relies on the navigational behavior of previous visitors. In addition to page recommendation, the proposed algorithm can be used to modify the links between pages (i.e., add new links or delete old ones). The steps of the algorithm are described in the following subsections.
Computing the Transition Probability Matrix
In the original PageRank algorithm, the probability of following a link when the user is on a specified page is uniformly distributed among all outgoing links, or favors certain pages in the personalized PageRank.
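The power-method computation of the PageRank vector of P'' = c(P + D) + (1 − c)E from Section 2.1 can be sketched as follows. It is a minimal sketch under stated assumptions: P is given as a list of rows that each sum to 1 (or are all zero for dangling pages), v is a probability vector, and the damping value `c=0.85`, the tolerance, and the iteration cap are common illustrative defaults, not values taken from this paper. In the proposed algorithm, P would be the DLA-learned matrix and v the visit-based vector; here they are toy inputs.

```python
def pagerank(P, v, c=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for pi^T <- pi^T P'' with P'' = c(P + d v^T) + (1 - c) 1 v^T.

    P: transition rows (each sums to 1, or is all zero for a dangling page).
    v: personalization vector, summing to 1.
    c: probability of following a link (assumed default, not from the paper).
    """
    n = len(P)
    pi = list(v)  # start from the personalization vector
    dangling = [i for i, row in enumerate(P) if sum(row) == 0]  # pages with d_i = 1
    for _ in range(max_iter):
        dangling_mass = sum(pi[i] for i in dangling)
        new = []
        for j in range(n):
            follow = sum(pi[i] * P[i][j] for i in range(n))  # (pi^T P)_j
            # Dangling pages jump according to v (matrix D); teleportation also uses v (matrix E).
            new.append(c * (follow + dangling_mass * v[j]) + (1 - c) * v[j])
        if sum(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

On the three-page graph `P = [[0, 0.5, 0.5], [0, 0, 1], [0, 0, 0]]` with a uniform v, page 2 (linked from both other pages, and itself dangling) receives the highest rank and page 0 the lowest, while the ranks continue to sum to 1 because every row of P'' is stochastic.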
In our algorithm, we bias the PageRank algorithm using data acquired from previous users' visits, as discovered from the user sessions recorded in the website's logs. The intuition behind our algorithm is as follows: a page is important in a website if many users have visited it before. Suppose that, among the outgoing pages of a certain page, one page is visited more than the others. The high visiting rate of that page indicates that it is followed by more users and is important to them, so it is better to bias the recommendation toward pages with high visiting rates. To learn the transition probabilities from previous users' behavior, we use a distributed learning automaton with n LAs, each with a variable number of actions. For each page i in the site, an LA with n − 1 actions is added to the DLA. Each action corresponds to following a page. For each LA, a subset of its actions is active at any time. The number of active actions of the LA assigned to page i equals the number of pages that a user at page i can follow from that page. In the beginning, all actions are inactive. When a user at page i goes to page j, the action corresponding to page j is rewarded or penalized by the environment, and the action probabilities of LA_i are updated accordingly. These probabilities correspond to the transition probabilities between pages learned by the LA. The rewarding and penalizing schema is described below. It is based on a learning algorithm that updates the action probabilities at each step. Since the employed LA has a variable number of actions, Equations (3) and (4) are used for this purpose. In these equations, the parameter a, called the rewarding parameter, is calculated from Equation (7):

$$a = \omega + \lambda \qquad (7)$$

where ω is a constant and λ is obtained from the following intuition. If a user goes from page i to page j and there is no link between these pages, λ is set to a constant value; otherwise it is set to zero. In other words, when a user moves from page i to page j, the j-th action of the i-th learning automaton is rewarded more when there is no link between pages i and j than when there is one. This intuition sounds reasonable and can be used to modify the link structure that statically exists between pages, because two pages connected by a hyperlink are more likely to be visited in sequence by a user than pages without a link.
In particular, when two pages have the same visiting rate, the page reached without a link was more interesting to the user.
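The reward update of a single LA with a changing number of actions, with the rewarding parameter taken from Equation (7), can be sketched as below. This is a hedged illustration, not the authors' implementation: the probability vector is a plain list indexed by page, `enabled` lists the active actions V(n), and the function names and example constants for ω and λ are hypothetical. How a newly activated action receives its initial probability is not specified in the text, so the sketch does not model activation; it also leaves the vector unchanged when fewer than two actions are enabled, an assumption made to avoid dividing by r̂ − 1 = 0 in Equation (4).

```python
import random

def select_action(p, enabled, rng=random):
    """Eq. (2): sample an enabled action with probability p_i / K(n)."""
    K = sum(p[i] for i in enabled)
    r = rng.random() * K
    acc = 0.0
    for i in enabled:
        acc += p[i]
        if r < acc:
            return i
    return enabled[-1]  # guard against floating-point round-off

def update(p, enabled, chosen, rewarded, a, b):
    """Eqs. (2)-(5): rescale enabled actions, reward or penalize, scale back by K(n)."""
    K = sum(p[i] for i in enabled)
    r_hat = len(enabled)
    new = list(p)  # Eq. (5): disabled actions keep their probabilities
    if r_hat < 2:
        return new  # assumption: nothing to redistribute with a single enabled action
    for j in enabled:
        q = p[j] / K  # Eq. (2)
        if rewarded:  # Eq. (3), desirable response
            q = q + a * (1 - q) if j == chosen else q - a * q
        else:  # Eq. (4), undesirable response
            q = (1 - b) * q if j == chosen else b / (r_hat - 1) + (1 - b) * q
        new[j] = q * K  # Eq. (5)
    return new

def reward_parameter(omega, lam, has_link):
    """Eq. (7): a = omega + lambda, where lambda is 0 if a hyperlink i -> j exists."""
    return omega + (0.0 if has_link else lam)
```

For example, with `p = [0.25] * 4`, enabled actions `[0, 1, 2]`, and a rewarded move to page 1 that has no hyperlink (a = 0.1 + 0.05), the chosen probability rises from 0.25 to 0.325, the other enabled actions fall to 0.2125 each, and the disabled action keeps 0.25, so the vector still sums to 1.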

If there is a cycle in a user's navigation path, the actions in the cycle indicate an illegal movement of the user, the user's confusion, or the user's dissatisfaction with the visited pages, and must be penalized. The penalty increases with the cycle length. So the penalty factor b is calculated from Equation (8):

$$b = (\text{number of steps in the cycle containing } k \text{ and } l) \times \beta \qquad (8)$$

where β is a constant factor. As Equation (8) shows, the penalty factor is directly proportional to the length of the cycle traversed by the user. For every user session in the log file, we begin with the first page. For each pair of consecutive pages in the session, the LA corresponding to the first page updates its probabilities if the corresponding action is already active; otherwise it activates that action. We assume that consecutive repetitions of a page have been removed from the user sessions; on the other hand, we keep pages that have been visited more than once, but not consecutively. This process is repeated until the last page of the session is reached. After all sessions have been processed, the transition matrix P is generated from the action probabilities of the DLA: each entry p_ij of P is set to the probability of action j in the LA corresponding to page i.
Page Ranking
The transition matrix P and the personalization vector v must be available to compute the importance of pages. As mentioned before, our objective is to generate a set of biased PageRank vectors using users' sessions. The key to creating a usage-based PageRank is that we can bias the computation to increase the effect of certain pages by using a nonuniform personalization vector v. Note that the bias involves introducing additional rank to the appropriate pages in each iteration of the computation. The computation of matrix P using distributed learning automata was described in the previous subsection; in this subsection we describe the computation of the personalization vector v. We use the visiting rate of pages as the measure for personalization. Let w(i) denote the number of users who visit page i. The value of the personalization vector for page i is set to

$$v_i = \frac{w(i)}{\sum_{j \in V} w(j)}$$

This setting models the importance of pages to users and their contribution to recommendation. In this case v is a probability vector, and its entries sum to 1.
Page Recommendation
As described before, the goal of personalization is to compute a set of pages not yet visited by the current user that best match the user's interests, and to recommend them [29, 30]. The recommendation phase is the only online phase of every recommendation algorithm and must perform efficiently. Suppose that a user is walking through the site and the path traversed so far is p_1 → p_2 → p_3 → … → p_k. For a new user this path is empty. We use a fixed-size sliding window over the current active session to capture the current user's history path. Note that a sliding window of size w over the active session allows only the last w visited pages to influence the recommendation set. To recommend page p_{k+1} to the current user, we must model the navigational behavior of the site's users. Markov models provide a simple way to capture sequential dependency when modeling the navigational behavior of website users. The order of the Markov model indicates the memory of the prediction, i.e., the number of previous user steps taken into consideration when calculating path probabilities. Therefore, in first-order Markov models (Markov chains), the probability of visiting a page depends only on the previous page; in second-order Markov models it depends on the previous two pages, and so on. The choice of order influences both the prediction accuracy and the complexity of the model, and depends heavily on the application and data set. After computing the transition matrix P using the DLA, the path probabilities are computed for an m-order model using the chain rule as follows:

$$\Pr(p_1 \rightarrow p_2 \rightarrow \cdots \rightarrow p_k) = \Pr(p_1)\prod_{i=2}^{k}\Pr(p_i \mid p_{i-1} \cdots p_{i-m}) \qquad (9)$$

For example, for a first-order model the probability of the path p_1 → p_2 → p_3 equals

$$\Pr(p_1 \rightarrow p_2 \rightarrow p_3) = \Pr(p_1)\,\Pr(p_2 \mid p_1)\,\Pr(p_3 \mid p_2) = \Pr(p_1)\,\Pr(p_1 \rightarrow p_2)\,\Pr(p_2 \rightarrow p_3)$$

where Pr(p_i → p_j) represents the transition probability between pages and Pr(p_i) is the rank of the page obtained from the biased PageRank algorithm. Based on Equation (9), the next most probable page p_{k+1} to be visited by a user is predicted by computing the probabilities of all existing paths p_1 → p_2 → … → p_k → p_{k+1} that have the pages visited so far by the user as a prefix, and choosing the most probable one. Computing the conditional probabilities is straightforward, since it reduces to a lookup in the transition probability matrix P. This value is computed for all unvisited pages, which are then sorted by probability. The number of recommended pages can be controlled either by a fixed number of pages or by a probability threshold. The pages with the highest rank are then recommended to the current user.
3. Web Page Recommendations based on Weighted Association Rules
In this section we present our second page recommendation algorithm. In the proposed algorithm, we extend the traditional association rule mining algorithm by allowing a weight to be associated with each item in a transaction to reflect the interest of that item within the transaction, and we develop a novel recommendation algorithm based on the proposed weighted association rule mining approach. In the proposed weighted association rule miner, the time spent by each user on each page and the visiting frequency of each page are used to assign a quantitative weight to the pages instead of traditional binary weights. The intuition behind this idea is that the time spent on pages [31] and the visiting frequency are good implicit indicators of a user's interest in those pages. The methodology is as follows: first, the weighted association rules of each URL are extracted from the web log data, and the similarity between the active user session and the rules is calculated on the weighted rules instead of requiring an exact match to find the best rule. Finally, the recommendation engine finds the rules most similar to the active user session with the highest weighted confidence, by scoring each rule in terms of both its similarity to the active session and its weighted confidence. In the following, we first introduce our weighting schema. Then we describe the proposed weighted association rule mining algorithm and the page recommendation mechanism.
3.1 Weighting Schema
Let P = {p_1, p_2, …, p_m} denote the set of web pages accessed by users in the web server logs after the preprocessing phase [32], each uniquely represented by its associated URL. Also let T = {t_1, t_2, …, t_n} be the set of user transactions, where each t_i ∈ T is a subset of P. To facilitate high-quality recommendation, we represent each transaction t as an m-dimensional vector over the space of web pages, t = ⟨(p_1, w_1), (p_2, w_2), …, (p_m, w_m)⟩, where w_i denotes the weight of the i-th web page (1 ≤ i ≤ m) visited in transaction t. The weight w_i in transaction t needs to be determined appropriately to capture the user's interest in the i-th web page. The weights can be determined in a number of ways; however, in the context of personalization based on clickstream data, the primary source of data is server access logs. This allows us to choose between two types of page weights: binary weights, representing the presence or absence of a page access in the transaction, or weights that are a function of parameters such as the duration of the page visit in the user's session, representing the interest of the page to a specific visiting user.
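The transaction representation t = ⟨(p_1, w_1), …, (p_m, w_m)⟩ described above can be illustrated with a small sketch that builds both kinds of weights from one session. The session format (a list of (URL, seconds) pairs), the function name, and the use of raw total duration as the non-binary weight are hypothetical choices for illustration only; this is not the weighting measure proposed in this paper, which also uses visiting frequency and normalizes duration by page size.

```python
def transaction_vector(pages, session, binary=True):
    """Build t = <(p_1, w_1), ..., (p_m, w_m)> over the page set P for one session.

    pages:   ordered list of all page URLs (the set P).
    session: list of (url, seconds) pairs; a page may appear more than once.
    binary:  True  -> w_i = 1 if the page was accessed, else 0;
             False -> w_i = total seconds spent on the page (hypothetical weight).
    """
    durations = {}
    for page, seconds in session:
        durations[page] = durations.get(page, 0) + seconds
    vec = []
    for p in pages:
        if binary:
            w = 1 if p in durations else 0
        else:
            w = durations.get(p, 0)
        vec.append((p, w))
    return vec
```

For pages `["/a", "/b", "/c", "/d"]` and the session `[("/a", 30), ("/c", 5), ("/a", 10)]`, the binary vector is `[("/a", 1), ("/b", 0), ("/c", 1), ("/d", 0)]`, while the duration-based vector is `[("/a", 40), ("/b", 0), ("/c", 5), ("/d", 0)]`. Only the second form distinguishes the page the user dwelt on from the page they left after a few seconds.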
Snce the recommendaton process s based on the behavor of prevous users, so the weghtng schema must precsely model the user s nterest. Recommendaton approaches

proposed in previous works, however, do not distinguish the importance of different pages: all visited pages are treated equally, regardless of their usefulness to the user. They neglect differences in page importance and in the degree of interest within a user's session. It is quite probable that not all the pages visited by a user are of interest to him or her. A user might open a page and find it of no value, causing irrelevant page accesses to be recorded in the log file. Therefore, using all visited pages equally to capture user interest and predict user behavior is imperfect. Although in usage-based recommendation systems we cannot expect users to express their interests explicitly, we need a weight measure that approximates the degree of interest of a web page to a user. Inspired by Chan and coworkers [33, 34], we propose a weighting measure, calculated from web logs, that extracts the interest of a page for the visitor. In our weighting schema, both the time spent on a page and the visiting frequency of a page are used to estimate its importance in a transaction, in order to capture the user's interest more precisely than the binary weights typically used in other research. This approach gives more consideration to more useful pages, in order to better capture the user's information need and recommend more useful pages. Several reasons support using page visit duration as one of the weighting parameters. First, it reflects the relative importance of each page, because a user generally spends more time on a more useful page [31, 35]: if a user is not interested in a page, he or she does not spend much time viewing it and usually jumps to another page quickly [36]. However, a quick jump might also occur because a web page is short, so the size of a page may affect the actual visiting time. Hence, it is more appropriate to normalize duration by the length of the web page, that is, the total bytes of the page. The formula for duration is given in Equation (10).
Second, the rates at which most people acquire information from web pages should not differ greatly [35]. If we assume a similar rate of acquiring information from pages for each user, the time a user spends on a page is proportional to the volume of information useful to him or her. As page duration can be calculated from web logs, it is a good choice for inferring user interest.
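The weighting schema can be illustrated with a minimal Python sketch of the duration, frequency and weight formulas of Equations (10) to (12). It assumes that per-page aggregates (total viewing time, size in bytes, visit count, in-degree) have already been extracted from the preprocessed logs, and it takes the maximum in Equation (10) over the pages passed in. The `PageStats` record, the `page_weights` function and the sample numbers are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page aggregates taken from preprocessed server logs."""
    total_duration: float  # total viewing time in seconds
    size: int              # page length in bytes
    visits: int            # number of visits to the page
    indegree: int          # number of incoming hyperlinks (assumed >= 1)


def page_weights(stats: dict[str, PageStats]) -> dict[str, float]:
    """Weight(p) = Frequency(p) * Duration(p), per Equations (10) to (12)."""
    # Equation (10): duration normalized by page size, then by the largest
    # such value (here the maximum is taken over the pages passed in).
    per_byte = {p: s.total_duration / s.size for p, s in stats.items()}
    max_per_byte = max(per_byte.values())
    duration = {p: v / max_per_byte for p, v in per_byte.items()}

    # Equation (11): share of all visits, divided by the page's in-degree.
    total_visits = sum(s.visits for s in stats.values())
    frequency = {
        p: (s.visits / total_visits) / s.indegree for p, s in stats.items()
    }

    # Equation (12): product of the two factors.
    return {p: frequency[p] * duration[p] for p in stats}


stats = {
    "/index": PageStats(total_duration=600, size=20_000, visits=50, indegree=10),
    "/courses": PageStats(total_duration=900, size=15_000, visits=30, indegree=4),
    "/news": PageStats(total_duration=60, size=30_000, visits=20, indegree=5),
}
weights = page_weights(stats)
print(weights)
```

Although `/index` is visited most often in this toy data, its large in-degree lowers its frequency term, so `/courses` receives the highest weight (0.075, versus 0.025 for `/index`).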

Frequency is the number of times that a page is accessed by different users. It seems natural to assume that web pages with a higher frequency are of stronger interest to users. A parameter that must be considered when calculating the frequency of a page is the in-degree of that page (i.e., the number of incoming links to the page). A page with a large in-degree is obviously more likely to be visited than a page with a small one. In particular, when comparing two pages with the same visiting rate, the page with the smaller in-degree is more interesting. The formula for frequency is given in Equation (11). We use the time a user spends viewing a page and the visiting frequency as two very important pieces of information for measuring the user's interest in the page, so we assign a significant weight to each page in a transaction according to these definitions, as in Equation (12).

$$\mathrm{Duration}(p) = \frac{\mathrm{TotalDuration}(p) / \mathrm{Size}(p)}{\max_{Q \in T} \left( \mathrm{TotalDuration}(Q) / \mathrm{Size}(Q) \right)} \qquad (10)$$

$$\mathrm{Frequency}(p) = \frac{\mathrm{NumberOfVisits}(p)}{\sum_{Q \in T} \mathrm{NumberOfVisits}(Q)} \cdot \frac{1}{\mathrm{Indegree}(p)} \qquad (11)$$

$$\mathrm{Weight}(p) = \mathrm{Frequency}(p) \cdot \mathrm{Duration}(p) \qquad (12)$$

In the end, every user transaction is transformed into an m-dimensional vector of web page weights, i.e., $t = \langle (p_1, w_1), (p_2, w_2), \ldots, (p_m, w_m) \rangle$, where m is the number of web pages visited in all user sessions.

3.2 Weighted Association Rule Based Recommendation Model

3.2.1 Association Rule Mining of Web Usage Logs

Given a set of transactions, where each transaction is a set of items (pages), an association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, $X \cap Y = \emptyset$, and X and

Y are two sets of items; X is the body and Y is the head of the rule. The support of the association rule $X \Rightarrow Y$ is the percentage of transactions that contain both X and Y among all transactions. The confidence of the rule $X \Rightarrow Y$ is the percentage of transactions that contain Y among the transactions that contain X. Support represents the usefulness of the discovered rule, and confidence represents its certainty. The confidence is computed as follows:

$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \qquad (13)$$

Association rule mining is the discovery of all association rules that are above a user-specified minimum support and minimum confidence. The Apriori algorithm is one of the prevalent techniques used to find association rules [37, 38]. Apriori operates in two phases. In the first phase, all itemsets with minimum support (frequent itemsets) are generated. This phase utilizes the downward closure property of support: if an itemset of size k is frequent, then all of its subsets (of size k - 1 and smaller) must also be frequent. The second phase of the algorithm generates rules from the set of all frequent itemsets. Association rules capture the relationships among items based on their patterns of co-occurrence across transactions. In the case of web transactions, association rules capture relationships among pages based on the navigational patterns of users. Each web page can be viewed as an item, and the set of web pages accessed by a user within a short period of time can be treated as a transaction, so the purpose of mining association rules is to find out which web pages are usually visited together in different sessions. However, the traditional association rule mining (ARM) model focuses on binary attributes: it only considers whether an item is present in a transaction or not. It also assumes that all items have the same significance; it does not take into account the weight of an item within a transaction, and all pages in a transaction are treated uniformly.
Also, most previous approaches that apply ARM to web usage personalization ignore the difference in the importance of the pages in a user session.
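To make the traditional, binary ARM baseline concrete, the following Python sketch computes support and the confidence of Equation (13), and runs the first (frequent itemset) phase of Apriori, pruning candidates with the downward closure property. The sessions and the threshold are made-up examples, and this is an illustrative sketch rather than an optimized Apriori implementation.

```python
from itertools import combinations


def support(itemset: frozenset, transactions: list[set]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(body: frozenset, head: frozenset, transactions: list[set]) -> float:
    # Equation (13): Support(X u Y) / Support(X).
    return support(body | head, transactions) / support(body, transactions)


def apriori(transactions: list[set], min_sup: float) -> dict[frozenset, float]:
    """Level-wise frequent itemset mining (first phase of Apriori)."""
    items = {frozenset([i]) for t in transactions for i in t}
    level = {c for c in items if support(c, transactions) >= min_sup}
    frequent: dict[frozenset, float] = {}
    k = 1
    while level:
        frequent.update({c: support(c, transactions) for c in level})
        k += 1
        # Join frequent (k-1)-itemsets, then drop any candidate that has an
        # infrequent (k-1)-subset (downward closure property).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c, transactions) >= min_sup}
    return frequent


sessions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
freq = apriori(sessions, min_sup=0.4)
conf_ab_c = confidence(frozenset({"A", "B"}), frozenset({"C"}), sessions)
print(len(freq), round(conf_ab_c, 2))
```

With a minimum support of 0.4, all singletons, all pairs and the triple {A, B, C} are frequent, and the rule {A, B} => {C} has confidence 0.4 / 0.6, about 0.67. Note that every page counts equally here, which is exactly the limitation the weighted model addresses.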

3.2.2 Mining Weighted Association Rules

As mentioned before, we first extend the traditional association rule problem by allowing a weight to be associated with each item in a transaction to reflect the interest of each item within the transaction. In turn, this provides an opportunity to associate a weight parameter with each item in a resulting association rule, which we call a weighted association rule (WAR). Weighted association rules are useful in several settings; for example, a product with a higher profit margin deserves more attention. Weighted Association Rule (WAR) mining allows different weights to be assigned to different items and is a possible approach to improving the ARM model in the web personalization process. In this model, greater weights are given to more important items, facilitating the discovery of important but less frequent itemsets and association rules. However, previous models assume a fixed weight for each item, whereas in the context of web usage mining a page might have different importance in different sessions. In the following we describe weighted rules together with the definitions of the associated parameters. We extend Apriori by adapting its parameters to weighted items. In the next section we employ this algorithm for page recommendation.

Weight Settings

Given the transformation of user transactions into an m-dimensional space as vectors of web page weights, $t = \langle (p_1, w_1), (p_2, w_2), \ldots, (p_m, w_m) \rangle$, where each $p_i \in P$, the weight $w_i$ associated with page $p_i$ is a non-negative real number that reflects the importance of $p_i$ in transaction t according to Equation (12). Inspired by Tao [58], we modify the measures of the Apriori algorithm in the following definitions to reflect the weighting schema.

Definition 1. Weighted item: The item weight is a value attached to an item (page) representing its significance. We denote it as $w(p_i) = \mathrm{Weight}(p_i)$, which is calculated using Equation (12).

Definition 2. Weight of an itemset in a transaction: Based on the item weights $w(p_i)$, the weight of an itemset X in transaction t, denoted $w(X, t)$, can be derived from the weights of its enclosed items. One simple way is to use the minimum weight of all the items in the itemset as the weight of the whole itemset, as shown in Equation (14):

$$w(X, t) = \begin{cases} \min\left( w(p_1), w(p_2), \ldots, w(p_k) \right), & X \subseteq t \\ 0, & X \not\subseteq t \end{cases} \qquad (14)$$

where k is the number of items in the itemset. Alternatively, we can use the average weight of the enclosed items as the itemset weight. Our experiments show that the minimum weight yields better-quality results.

Definition 3. Transaction weight: Having assigned a weight to each item and itemset, we also assign a weight to each transaction, to be used in calculating the support of each itemset. Assigning weights to transactions lets us distinguish between different transactions: usually, the higher the weight of a transaction, the more it contributes to the mining result. One simple way is to take the average weight of all items enclosed in the transaction. The weight of each transaction, $w(t_k)$, is calculated as shown in Equation (15):

$$w(t_k) = \frac{\sum_{i=1}^{|t_k|} w(p_i)}{|t_k|} \qquad (15)$$

Definition 4. Weighted support of an itemset across all transactions: We modify the support of an itemset. The weighted support $wsp(X)$ of an itemset X across all transactions is defined as follows:

$$wsp(X) = \frac{\sum_{t \in T} w(t)\, w(X, t)}{\bar{w} \sum_{k=1}^{|T|} w(t_k)} \qquad (16)$$

where $\bar{w}$ is the average weight of all the items across all transactions, and T is the set of all transactions.

Weighted Frequent Itemset

The problem of frequent pattern mining in the traditional association rule mining framework is to find the complete set of itemsets satisfying a minimum support threshold in the database. In our model, we say an itemset is frequent if its weighted support is above a predefined weighted support threshold. Our approach to mining frequent itemsets is based on the Apriori [38] algorithm. To prune infrequent patterns, frequent pattern mining uses the downward closure (anti-monotone) property [13, 39]: any subset of a significant itemset is also significant, or equivalently, if a pattern is infrequent, all of its super-patterns must be infrequent. Using the downward closure property, infrequent patterns can easily be pruned. With our definitions of weighted support and frequent itemsets, any subset of a frequent itemset is also frequent; we call this the weighted downward closure property [38]. The downward closure property of the unweighted support measure therefore carries over to the weighted case, so candidate itemsets with k items can be generated by joining large itemsets with k - 1 items. This can result in a much smaller number of candidate itemsets. For example, if we are looking for pairs of items with minsup, we need only consider those items that appear in the database with minsup. Provided minsup is high enough, the number of items for the next joining step will be small enough to speed up the computation significantly. The following theorem shows that our weighting schema satisfies the downward closure property.

Theorem. The proposed weighting schema satisfies the downward closure property: for any candidate itemset, all of its subsets are also candidate itemsets.

Proof. Let $I_1$ and $I_2$ be two itemsets, and suppose that $I_1 \subset I_2$, i.e., $I_2$ is a superset of $I_1$. To prove the validity of the downward closure property in the proposed algorithm, suppose that $I_1$ is not a significant itemset over all the transactions but $I_2$ is a significant itemset. Let $T_1$ denote the set of transactions that contain all the items in $I_1$, and similarly let $T_2$ denote the set for $I_2$. Since $I_2$ is a superset of $I_1$, $T_2 \subseteq T_1$; therefore $\sum_{t \in T_2} w(t) \le \sum_{t \in T_1} w(t)$. Moreover, by Equation (14), $w(I_2, t) \le w(I_1, t)$ for every $t \in T_2$, because the minimum weight over a larger set of items cannot exceed the minimum over one of its subsets. According to the definition of the weighted support of an itemset,

$$wsp(I_1) = \frac{\sum_{t \in T_1} w(t)\, w(I_1, t)}{\bar{w} \sum_{t \in T} w(t)}, \qquad wsp(I_2) = \frac{\sum_{t \in T_2} w(t)\, w(I_2, t)}{\bar{w} \sum_{t \in T} w(t)}$$

Comparing $wsp(I_1)$ and $wsp(I_2)$ term by term, and since all weights are non-negative, we have $wsp(I_2) \le wsp(I_1)$. Because $I_1$ is not a significant itemset, its weighted support is less than the minimum threshold, and since $wsp(I_2) \le wsp(I_1)$, $wsp(I_2)$ is also less than the minimum support threshold, so $I_2$ is not a significant itemset either, which contradicts our assumption. In conclusion, if an itemset is significant, its subsets are also significant itemsets, which proves that the downward closure property always holds in the proposed algorithm.

Definition 5. Weighted confidence of the weighted association rule: We define the weighted confidence of a weighted association rule as follows:

$$wconf(X \Rightarrow Y) = \frac{wsp(X \cup Y)}{wsp(X)} \qquad (17)$$

Definition 6. Weighted rules: For each rule, besides the weighted confidence and weighted support, we also add the weight of each page. The result of weighted association rule mining is conceptually described as follows:

$$r = \langle (p_1, p_2, \ldots, p_k), (q_{k+1}, q_{k+2}, \ldots, q_{k+m}), (w_1, w_2, \ldots, w_{k+m}), \delta, \alpha \rangle \in R$$

where $(p_1, p_2, \ldots, p_k)$ and $(q_{k+1}, q_{k+2}, \ldots, q_{k+m})$ represent the body and head of the weighted rule

respectively; $w_i$ represents the weight of the i-th page in the rule, $\delta$ represents the weighted support and $\alpha$ represents the weighted confidence of the rule.

A Recommendation Engine Using Weighted Association Rules

The goal of personalization based on anonymous web usage data is to compute a recommendation set for the current user session, consisting of the objects (links, ads, etc.) that most closely match the current user session. These recommended pages are added to the last page in the active session accessed by the user before that page is sent to the browser. Methods based on association rule mining compute a recommendation set for the current (active) user session using a sliding window to control the number of session pages matched against the association rules [40]. Hence, maintaining a history depth may be important for the recommendation service to provide reasonable suggestions. In the following, we present our mechanisms for this purpose.

Modify User's Current Session

Maintaining a history depth may be important because most users navigate several paths leading to independent pieces of information within a session. Previous works [40, 21, 42] use a fixed-size sliding window over the current active session to capture the current user's history depth and generate recommendations. A sliding window of size n moving to the right over the active session allows only the last n visited pages to influence the recommendation set, because most users go back and forth while navigating a site to find the desired information, and it may not be appropriate to use earlier portions of the user session to represent the user's current information need. However, this method does not distinguish the importance of different pages: all of the last n visited pages are treated equally, regardless of their usefulness to the user. A better approach would be to filter out uninteresting pages and use only the pages of interest to the user in the personalization process.
Another parameter that can be used to associate an additional measure of significance with each page in the user's active session is the weight of the page. Although it may seem that the pages visited most recently are the most appropriate basis for recommendation, in many cases users exhibit bursty behavior. A user navigates between

pages to find an interesting page, spends much of his or her time on that page, and then repeats this process. So the position of a page in the user session is not the only parameter influencing the selection of predictor pages. Hence we consider the freshness of a page and its weight simultaneously when choosing the predictor pages. Instead of using a sliding window that preserves only the most recent session information for matching, and inspired by Yan and Li [43], we propose a measure that approximates the user's current interest and filters out uninteresting pages, using a simple method to capture the interest weight of each page. We combine the freshness of a page and its weight to rank the pages in the user's current session as follows. First, the session is weighted in the same way as transactions. This guarantees that the time spent by the user on each page and the frequency of the page are reflected in the weight of each page. To apply the freshness of each page to its significance, we define the following parameter for each page:

$$\mathrm{Fresh}(p_i) = \frac{i}{w}, \qquad i = 1, 2, \ldots, w \qquad (18)$$

where w is the size of the sliding window and i is the position of the page in the sliding window, with 1 assigned to the first visited page. In this equation the last page is the freshest page. Also, the weight vector should be normalized to reflect the impact of freshness effectively. The weight of each page is normalized as follows:

$$W_{normalized}(p_i) = \frac{w(p_i)}{\sum_{j=1}^{n} w(p_j)} \qquad (19)$$

In the weight measure we devised, Fresh and $W_{normalized}$ are valued equally. We use the harmonic mean of $W_{normalized}$ and Fresh to represent the interest degree of a web page to a user in the session. Equation (20) guarantees that the interest of a page is high only when $W_{normalized}$ and Fresh are both high.
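The selection of predictor pages by Equations (18) to (20) can be checked on the example session S given with Equation (20) below. In this Python sketch we assume, as the worked example implies, that freshness positions are counted over the whole current session (w = |S|, with 1 for the first visited page) and that the pages with the highest Interest values are kept; `interest_window` and `sliding_window` are hypothetical helper names, and page weights are assumed to be positive.

```python
Session = list[tuple[str, float]]  # (page, weight from Equation (12)), in visit order


def interest_window(session: Session, size: int) -> set[str]:
    """Keep the `size` pages with the highest Interest (Equations (18)-(20)).

    Assumption: freshness positions run over the whole current session,
    i.e. w = len(session), with 1 for the first visited page.
    """
    n = len(session)
    total = sum(weight for _, weight in session)
    interest = {}
    for i, (page, weight) in enumerate(session, start=1):
        fresh = i / n               # Equation (18)
        w_norm = weight / total     # Equation (19)
        # Equation (20): harmonic mean, high only if both factors are high.
        interest[page] = 2 * fresh * w_norm / (fresh + w_norm)
    ranked = sorted(interest, key=interest.get, reverse=True)
    return set(ranked[:size])


def sliding_window(session: Session, size: int) -> set[str]:
    """Traditional fixed-size window: the last `size` visited pages."""
    return {page for page, _ in session[-size:]}


S = [("A", 30), ("B", 20), ("C", 5), ("D", 5), ("E", 4), ("F", 10)]
print(sorted(sliding_window(S, 3)))   # ['D', 'E', 'F']
print(sorted(interest_window(S, 3)))  # ['A', 'B', 'F']
```

With a window of 3, the Interest values rank B (about 0.30), F (about 0.24) and A (about 0.24) well above D, C and E (about 0.10 to 0.12), which reproduces the set {A, B, F} chosen in the Fig. 2 comparison, while the traditional window keeps {D, E, F}.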

$$\mathrm{Interest}(p_i) = \frac{2\, \mathrm{Fresh}(p_i)\, W_{normalized}(p_i)}{\mathrm{Fresh}(p_i) + W_{normalized}(p_i)} \qquad (20)$$

For example, let $S = \langle (A, 30), (B, 20), (C, 5), (D, 5), (E, 4), (F, 10) \rangle$ be an active user session after calculating the weight of each page according to Equation (12). Fig. 2 shows the comparison between our method and the traditional sliding window. With the window length set to 3, the traditional method uses the 3 latest pages of the current session, choosing the page set X = {D, E, F}, whereas our method chooses the set X = {A, B, F}. Although page A is visited first by the user, it has a larger weight than D, E and F, so it is of greater interest to the user and is included in our window, in contrast to the traditional method, which ignores it.

Recommendation Mechanism

We developed a usage model for predictions based on weighted association rules. There are two phases in our system. First, the weighted association rules of each URL are extracted from the web log data; the rules produced represent the navigation behavior of users on the web site. Second, the recommendation engine searches for the top-N weighted rules most similar to the active user session before generating recommendations for the user. During the second phase, instead of requiring an exact match between the active user session and the rules, we use a similarity measure to find the most similar rules.

Similarity Measurement

Each of the weighted association rules $r = \langle (p_1, p_2, \ldots, p_k), (q_{k+1}, q_{k+2}, \ldots, q_{k+m}), (w_1, w_2, \ldots, w_{k+m}), \delta, \alpha \rangle \in R$ obtained in the mining stage described in the previous section is represented as a set of page-weight pairs. This allows both the active session and the association rules to be treated as m-dimensional vectors over the space of pages in the site. Thus, given a weighted association rule r, we can represent the left-hand side of each rule, $r_L$, as a vector $r_L = \{w_1, w_2, \ldots, w_m\}$, where

$$w_i = \begin{cases} \mathrm{weight}(p_i, r_L), & \text{if } p_i \in r_L \\ 0, & \text{otherwise} \end{cases} \qquad (21)$$

Similarly, the current user session is represented as a vector $S = \{s_1, s_2, \ldots, s_m\}$, where $s_i$ is a significance weight associated with the corresponding page reference if the user has accessed $p_i$ in this session, and $s_i = 0$ otherwise. We then compute the matching score between the current active session and the association rules, which capture relationships among pages based on their co-occurrence in users' navigational patterns. The matching score between them is defined as:

$$\mathrm{Dissimilarity}(S, r_L) = \sum_{i:\, r_{L,i} > 0} \left( \frac{w(s_i) - w(r_{L,i})}{\left( w(s_i) + w(r_{L,i}) \right)/2} \right)^2 \qquad (22)$$

$$\mathrm{MatchScore}(S, r_L) = 1 - \frac{\mathrm{Dissimilarity}(S, r_L)}{4 \sum_{i:\, r_{L,i} > 0} 1} \qquad (23)$$

where S and $r_L$ represent the active user session and the left-hand side of the weighted association rule, respectively. The rationale behind this formulation is as follows. $\mathrm{Dissimilarity}(S, r_L)$ is a dissimilarity measure that uses the average weight (the arithmetic mean $(w(s_i) + w(r_{L,i}))/2$) as its normalization scheme; it has been applied in the literature and has been successful on different problems [44, 45, 46]. Each term of the sum is at most 4 (reached when $s_i = 0$), so to obtain a similarity between 0 and 1 we normalize the distance by dividing it by this maximum discrepancy and then subtract the normalized distance from 1. When the active session perfectly matches the rule, the match score equals 1. As the algorithm tries to find rules that are similar to the active user session, the similarity between a rule and the active session depends on the magnitude of the left-hand side of the rule. Association rules might have multiple items on their right-hand sides, but due to the nature of the prediction problem in this paper, recommendations are independent of one another and users will select only one of several recommendations, so we only use rules with singleton right-hand sides.
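The following Python sketch implements the dissimilarity of Equation (22) and the match score of Equation (23), and combines the match score with the weighted confidence as in Equation (24) of the next subsection. Sessions and rule left-hand sides are represented as hypothetical page-to-weight dictionaries, rules are `(lhs, page, wconf)` triples with a singleton head, and when several rules predict the same page we keep the best score; that aggregation choice is our assumption, not something stated in the text.

```python
Vector = dict[str, float]  # hypothetical page -> weight representation


def dissimilarity(session: Vector, lhs: Vector) -> float:
    """Equation (22): summed over pages with a positive weight in r_L."""
    total = 0.0
    for page, r in lhs.items():
        if r <= 0:
            continue
        s = session.get(page, 0.0)  # s_i = 0 if the page was not visited
        total += ((s - r) / ((s + r) / 2)) ** 2
    return total


def match_score(session: Vector, lhs: Vector) -> float:
    """Equation (23): each term is at most 4, so divide by 4 * |{i: r_Li > 0}|."""
    n = sum(1 for r in lhs.values() if r > 0)
    return 1 - dissimilarity(session, lhs) / (4 * n)


def recommend(session: Vector, rules: list[tuple[Vector, str, float]],
              top_n: int) -> list[str]:
    """Rank unvisited pages by Rec = MatchScore(S, X) * wconf(X => p)."""
    scores: dict[str, float] = {}
    for lhs, page, wconf in rules:
        if page in session:
            continue  # only pages not already visited in the active session
        rec = match_score(session, lhs) * wconf
        scores[page] = max(scores.get(page, 0.0), rec)  # assumption: best rule wins
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


active = {"A": 0.4, "B": 0.3}
rules = [
    ({"A": 0.4, "B": 0.3}, "C", 0.5),  # identical left-hand side
    ({"A": 0.1, "D": 0.5}, "E", 0.9),  # partial overlap
    ({"A": 0.4}, "B", 0.99),           # head already visited: skipped
]
print(round(match_score(active, rules[1][0]), 2))  # 0.32
print(recommend(active, rules, top_n=2))           # ['C', 'E']
```

The second rule has a higher weighted confidence but a weak match (page D is absent from the session and contributes the maximum term of 4), so its page E ranks below C.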

Recommendation Score

The recommendation engine is the online component of a usage-based personalization system. In order to determine which items p (not already visited by the user in the active session) are to be recommended, a recommendation score is computed for each page. Two factors are used in determining this recommendation score: the overall matching score of the active session to the weighted rule as a whole, and the weighted confidence of the rule. The recommendation scores for the active user are computed by multiplying these factors. Given a weighted association rule $X \Rightarrow p$ and an active session S, the recommendation score $\mathrm{Rec}(S, X \Rightarrow p)$ is computed as follows:

$$\mathrm{Rec}(S, X \Rightarrow p) = \mathrm{MatchScore}(S, X) \cdot wconf(X \Rightarrow p) \qquad (24)$$

Finally, the candidate pages are sorted by recommendation score, and the top-N pages with the highest scores are chosen as the recommendations for the active user. The improvement of this approach is that, instead of requiring an exact match between the active session and the association rules, both the similarity between the rules and the current session and the weighted confidence of $X \Rightarrow p$ are used to determine the recommendation score, not just the confidence value as in previous works [3, 40].

4. The Hybrid Algorithm

In this section we propose an efficient hybrid algorithm based on the distributed learning automata and weighted association rule algorithms proposed before, taking their weak and strong points into account. The algorithm solves the problem of recommending rarely visited or newly added pages. The steps of the algorithm can be briefly summarized as follows:
Step 1: Cluster the pages based on users' usage patterns.
Step 2: Generate the seed recommendation set.
Step 3: Extend the seed set by clusters to generate the candidate set.

Step 4: Apply the HITS algorithm to rank the candidate set and generate the final recommendation set.
A general view of the hybrid algorithm is depicted in Fig. 3. These steps are described in the next four subsections.

4.1 Cluster the Pages Based on Users' Usage Patterns

We propose an algorithm that clusters web pages not by the content of the pages but by the pattern of their usage, assuming that users have an intuitive grasp of what a page is about and how valuable it is, and that this intuition guides their actions. The method clusters the pages based on how often they occur together across user sessions. That is, page clusters tend to group together items that frequently co-occur across sessions, even if these items are not themselves deemed similar. This allows us to obtain clusters that potentially capture overlapping interests of different types of users. The idea of clustering based on usage data is inspired by the functioning of the brain: in the brain, concepts that are activated simultaneously (co-activation) become more strongly associated. Since users visiting a web site can be assumed to be looking for mutually relevant pages rather than a random assortment of unrelated pages, pages consulted by the same user are co-activated and are associated with each other. In other words, documents develop stronger associations as they are more frequently co-activated. This method is particularly useful for multimedia documents, which do not contain any searchable keywords. To learn the associations that implicitly exist between pages from usage data, a DLA is employed, as in the DLA Recommender algorithm. A distributed learning automaton with n LAs, each with a variable number of actions, learns the associations between pages from log data. In the DLA, the probability of action j in the i-th LA represents the association between the i-th and j-th pages. We create the association matrix P from the action probabilities in the DLA as follows: we set $a_{ij}$ to the probability of action j in LA i.
Since the learning process assumes ordered page accesses, it yields an asymmetric association

matrix ($a_{ij} \ne a_{ji}$). By multiplying the (asymmetric) matrix P by its transpose, we can create a new, symmetric matrix:

$$S = P P^{T}, \qquad s_{ij} = \sum_{k} a_{ik}\, a_{jk} \qquad (25)$$

where $s_{ij}$ represents the degree of similarity between pages i and j. Indeed, $s_{ij}$ is the dot product of all the associations that documents i and j have with other documents. The more the association vectors overlap, and thus the more i and j resemble each other in the way they relate to other documents, the larger the dot product, and therefore $s_{ij}$. This similarity measure can now be used as input to a variety of clustering algorithms that group documents into classes depending on how similar or dissimilar they are. Given the symmetric association matrix, the clustering phase is conducted in the following steps:
1. We create a similarity matrix between web pages, where the distance (similarity) between two pages is zero if they are directly linked in the web site structure (i.e., there is a hyperlink from one to the other), and is otherwise set to the co-occurrence frequency between the two pages in matrix S.
2. A graph G is created in which each page is a node and each nonzero cell in the similarity matrix is an edge. In order to reduce noise, we apply a threshold to remove edges corresponding to low co-occurrence frequency.
3. The graph created in the previous step is partitioned using the graph partitioning tool METIS, minimizing the number of cut edges. The generated clusters will be used to extend the recommendation set.

4.2 Generating the Seed Recommendation Set

The main drawback of the first algorithm (the DLA Recommender algorithm) is that computing the recommendation set is time-consuming and limits the algorithm's

performance. The same problem exists in the second algorithm, where matching the current user's session against all of the generated rules takes a lot of time. So we use another method, based on the idea in [124], to speed up the process. The idea behind our method is to limit the candidate set that must be considered for recommendation and to decrease the online recommendation time. In order to facilitate the search for the candidate recommendation set and improve recommendation efficiency, we use a Weighted Itemset Graph. Fig. 4 gives an example of the Weighted Itemset Graph. The idea comes from [40], in which the data structure is called the Frequent Itemset Graph because the itemsets stored in it are frequent itemsets. Each node stores an itemset along with its weighted support. The graph is organized into levels from 0 to k, where k is the maximum size among all frequent itemsets. Each node at depth d in the graph corresponds to an itemset I of size d. For a node N containing itemset I, each child node of N corresponds to a significant itemset $I \cup \{p\}$ at level d + 1. The single root node at level 0 corresponds to the empty itemset. To be able to match different orderings of an active session with frequent itemsets, all itemsets are sorted in lexicographic order before being inserted into the graph. The user's active session is sorted in the same manner before matching with patterns. Given an active user session window w, we first modify the current session using the method proposed in the subsection Modify User's Current Session to obtain the modified window (also denoted w), sorted in lexicographic order, and then perform a depth-first search of the Weighted Itemset Graph down to level |w|. If a match is found, the children of the matching node N containing w are used to generate candidate recommendations. Each child node of N corresponds to a frequent itemset $w \cup \{p\}$. In each case, the page p is added to the recommendation set if the weighted support ratio $wsp(w \cup \{p\}) / wsp(w)$ is greater than or equal to $\alpha$, where $\alpha$ is a minimum confidence threshold.
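A minimal Python sketch of seed-set generation, assuming transactions are page-to-weight dictionaries produced by Equation (12). It computes the weighted support of Equations (14) to (16), mines significant itemsets level by level (pruning candidates with the weighted downward closure property), and keeps each extension $w \cup \{p\}$ whose weighted confidence $wsp(w \cup \{p\}) / wsp(w)$ reaches $\alpha$. For brevity, a dictionary keyed by itemsets stands in for the Weighted Itemset Graph, so a hash lookup replaces the depth-first search; we also assume $\bar{w}$ is the mean over all item occurrences. Function names and data are hypothetical.

```python
from itertools import combinations

Transaction = dict[str, float]  # page -> weight from Equation (12)


def itemset_weight(itemset: frozenset, t: Transaction) -> float:
    """Equation (14): minimum item weight if the itemset is in t, else 0."""
    if all(p in t for p in itemset):
        return min(t[p] for p in itemset)
    return 0.0


def make_wsp(transactions: list[Transaction]):
    """Return a function computing the weighted support of Equation (16)."""
    tw = [sum(t.values()) / len(t) for t in transactions]  # Equation (15)
    all_weights = [w for t in transactions for w in t.values()]
    w_bar = sum(all_weights) / len(all_weights)  # assumed: mean over all occurrences
    denominator = w_bar * sum(tw)

    def wsp(itemset: frozenset) -> float:
        numerator = sum(w * itemset_weight(itemset, t)
                        for w, t in zip(tw, transactions))
        return numerator / denominator

    return wsp


def mine_significant(transactions: list[Transaction],
                     min_wsp: float) -> dict[frozenset, float]:
    """Apriori-style mining; maps every significant itemset to its wsp."""
    wsp = make_wsp(transactions)
    level = {frozenset([p]) for t in transactions for p in t}
    level = {x for x in level if wsp(x) >= min_wsp}
    graph: dict[frozenset, float] = {}
    k = 1
    while level:
        graph.update({x: wsp(x) for x in level})
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {  # weighted downward closure: all (k-1)-subsets significant
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if wsp(c) >= min_wsp}
    return graph


def seed_candidates(graph: dict[frozenset, float], window: list[str],
                    alpha: float) -> dict[str, float]:
    """Children w | {p} of the node for w whose weighted confidence is >= alpha."""
    w = frozenset(window)
    if w not in graph:
        return {}
    seeds = {}
    for itemset, support in graph.items():
        if len(itemset) == len(w) + 1 and w < itemset:
            (page,) = itemset - w
            conf = support / graph[w]  # wconf(w => {p}), Equation (17)
            if conf >= alpha:
                seeds[page] = conf
    return seeds


T = [
    {"A": 0.5, "B": 0.3, "C": 0.2},
    {"A": 0.4, "B": 0.4},
    {"A": 0.6, "C": 0.3},
    {"B": 0.2, "C": 0.5},
]
graph = mine_significant(T, min_wsp=0.2)
print(sorted(seed_candidates(graph, ["A"], alpha=0.4)))  # ['B']
```

On this toy data the triple {A, B, C} falls below the threshold, and for the window {A} only B passes $\alpha = 0.4$ (weighted confidence about 0.44); C (about 0.34) would also pass at $\alpha = 0.3$. Because the denominator of Equation (16) is shared by all itemsets, it cancels in the confidence ratio.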
Note that $wsp(w \cup \{p\}) / wsp(w)$ is the weighted confidence of the association rule $w \Rightarrow \{p\}$. Since the modification of the active session window changes the order of the pages visited by the user, we add the impact of the transition probabilities between pages to the final score of each page p as follows:

$$\mathrm{score}(p) = \frac{wsp(w \cup \{p\})}{wsp(w)} \sum_{\langle u, v \rangle \in w} p(u, v) \qquad (26)$$

where $p(u, v)$ is the transition probability between pages u and v for each pair $\langle u, v \rangle$ in w.

4.3 Extending the Seed Set and Applying HITS

Most recommendation algorithms suffer from two major drawbacks. First, as the size of the recommendation set increases, precision decreases significantly. Second, some resources, such as rarely visited or newly added pages, are left out of recommendation consideration. It is conceivable that there are other resources not yet visited, even though they are relevant and could be interesting to have in the recommendation list. Such resources could be, for instance, newly added web pages or pages whose links are not prominently presented due to bad design. We need to give these rarely visited or newly added pages an opportunity to be included in the recommendation set; otherwise, they would never be recommended. To alleviate these problems, we propose a novel method. We use the seed recommendation set generated in the previous step as the input of this step and extend it to generate a candidate recommendation set. Initially, we put all of the pages in the seed set into the candidate set. For each page p in the seed set, the candidate set is supplemented with the pages that are in the same cluster as page p, using the clusters generated in Subsection 4.1. Since the pages in each cluster are strongly associated based on users' behavior, this extension is reasonable. We then build a graph from the pages in the candidate set by connecting them with the links that exist in the underlying site structure. The result is a connectivity graph, which represents our augmented navigational pattern. This process of obtaining the connectivity graph is similar to the process used by the HITS algorithm [50] to find authority and hub pages. We take advantage of the connectivity graph built by clustering and apply the HITS algorithm to identify the authority and hub pages within a given cluster. These authority and hub measures allow us to rank the pages within the cluster. This is important because, in real time during

the recommendation process, it is crucial to rank recommendations, especially if they are numerous. Authority and hub are mutually reinforcing concepts [50]: a good authority is a page pointed to by many good hub pages, and a good hub is a page that points to many good authority pages. Since we would like to be able to recommend pages newly added to the site, in our framework we consider only the hub measure [47]. This is because a newly added page would be unlikely to be a good authoritative page, since not many pages link to it. However, a good new page would probably link to many authority pages; it would, therefore, have the chance to be a good hub page. Consequently, we use the hub value to rank the candidate recommendation pages in the online module to create the final recommendation set. Generating the final set is an online process of the recommendation system and must be conducted efficiently. The performance of this step strongly depends on the size of the seed set: as the seed set grows, the generated candidate set becomes large and ranking its pages takes more time. We set the size of the seed set to 1 in our experiments to speed up this process.

5. Experimental Results

5.1 Data Sets

In this section we present a set of experiments that we performed to evaluate the impact of our proposed techniques on the prediction process. Overall, our experiments verified the effectiveness of the proposed techniques in web page recommendation. We use the web access logs of the DePaul University CTI web server [48], based on a random sample of users visiting the site over a 2-week period during April 2002 (DePaul Web Server Data). This dataset contains distinct user sessions of length greater than 1 and 683 distinct pages. We treat each session as a transaction. Each transaction contains a sequence of pages along with their weights (durations). We split the data set into two non-overlapping time windows to form the training and test data sets; 80% of the data set was randomly selected for the training set, and the rest was used for testing.

For our evaluation, we presented each user session to the recommendation system, and the system recorded the recommendations it made after seeing each page the user had visited. The system was allowed to make n recommendations at each step, with n < 10 and n < l, where l is the number of outgoing links of the last page visited by the user. This limitation on the number of recommendations is adopted from [49].

5.2 Experimental Methodology and Metrics

In order to evaluate the recommendation effectiveness of our method, we measured its performance using two standard measures, namely precision and coverage [6]. Recommendation precision and coverage are quite similar to the precision and recall metrics commonly used in the information retrieval literature. Recommendation precision measures the ratio of correct recommendations (i.e., the proportion of relevant recommendations to the total number of recommendations), where correct recommendations are those that appear in the remainder of the user session. For each visit session, after considering each page p, the system generates a set of recommendations R(p). To compute precision, R(p) is compared with the rest of the session T(p) as follows:

$$\mathrm{Precision} = \frac{|T(p) \cap R(p)|}{|R(p)|} \qquad (27)$$

Recommendation coverage, on the other hand, shows the ratio of the pages in the user session that the system is able to predict before the user visits them (i.e., the proportion of relevant recommendations to all pages that should be recommended):

$$\mathrm{Coverage} = \frac{|T(p) \cap R(p)|}{|T(p)|} \qquad (28)$$

5.3 Results and Discussions

In all experiments we measured both the precision and coverage of recommendations against a varying number of recommended pages, from 1 to 11. To consider the impact of

37 wndow sze (the porton of user hstores used to produce recommendatons), we performed all experments usng wndow szes from 1 to Impact of Actve Wndow Sze on User Navgaton Tral In all experments we measured both Precson and Coverage of recommendatons aganst varyng umber of recommended pages. In our state defnton, we used the noton of N-Grams by puttng a sldng wndow on user navgaton paths. The mplcaton of usng a sldng wndow of sze w s that we base the predcton of user future vsts on hs w past vsts. The choce of ths sldng wndow sze can affect the system n several ways. To consder the mpact of wndow sze (the porton of user hstores used to produce recommendatons) on the DLA Personalzaton algorthm, we also vary wndow szes w from 1 to 4. The mpact of dfferent wndow szes on precson scores of recommendatons aganst varyng numbers of recommended pages from 1 to 12 s depcted n Fg. 5. A large sldng wndow seems to provde more nformaton to the system whle on the other hand causng a larger state space wth sequences that occur less frequently n the usage logs. We evaluated our performance system wth dfferent wndow szes on user tral as seen n Fg. 5. As our experments show the best results are acheved when usng a wndow of sze 3. It can be nferred form ths dagram that a wndow of sze 1 w = 1 whch consders only the user s last page vst does not hold enough nformaton to make the recommendaton, the accuracy of recommendatons mprove wth ncreasng the wndow sze and the best results are acheved wth a wndow sze of 3 w = 3. As shown n Fg. 5 usng a wndow sze larger than 3 results n weaker performance, t seems to be due to the fact that, as mentoned above, n these models, states contan sequences of page vst that occurrng less frequently n web usage logs, causng the system to make decsons based on weaker evdence. Fg. 6 shows the mpact of wndow sze on precson of the weghted assocaton rule Recommender algorthm. The results show clearly that precson ncreases as a larger

38 porton of user s hstory s used to generate recommendatons. It can be nferred form ths dagram although at hgher number recommendaton pages the dfference between varous wndow szes becomes smaller A Comparson wth Other Methods As our experments on the prevous secton show the algorthms are acheved the best results when usng a wndow of sze 3 and the other hand the mean transacton length of the data s 3, n these experments we used a fxed wndow sze of 3 on recommendaton hstory (set the wndow sze to 3). We frst compared three recommender systems, DLA Recommender algorthm, weghted assocaton rule Recommender algorthm and Hybrd algorthm. The Recommendaton Accuracy and coverage of the three systems are depcted n Fg. 7. In the experment, we vared the number of recommended pages to test the trend and consstency of the system qualty. Fg. 7 shows the Recommendaton Accuracy of the three contenders. As expected, the accuracy decreases when we ncrease the number of recommendaton page. The consstent best performance of Hybrd llustrates the valdty of usage and connectvty nformaton to mprove recommendatons n our hybrd system, and also ndcates that weghted assocaton rule s more useful for recommendaton accuracy mprovement. The coverage of the three systems are depcted n Fg. 8. We notce that wth the ncrease of the number of recommended pages, Hybrd can acheve an ncreasngly superor result compared to both DLA and Weghted, whle the two systems keep smlar performance n terms of coverage. Ths fgure verfes our justfcaton for usng two algorthms n buldng a hybrd recommender system. We observed our system performance n comparson wth assocaton rules, whch s commonly known as one of the most successful approaches n web mnng based recommender systems [40]. Fg. 9 and Fg. 10 have shown the comparson of Hybrd system s performance wth AR method n the sense of ther accuracy and coverage n dfferent number of recommended pages on CTI dataset. As the number of

39 recommendaton page ncreases, naturally precson decreases n all systems, but our system gans much better results than the assocaton rule algorthm. It can be seen the rate n whch precson decreases n our algorthm s lower than tradtonal assocaton rule algorthm. Expermental results show that the Hybrd model ncreases coverage and precson sgnfcantly and our system gans much better results than the tradtonal assocaton rule algorthm. It can be concluded that Hybrd approach s capable of makng web recommendaton more accurately and effectvely aganst the conventonal method. In summary, ths experment shows that our system can sgnfcantly mprove the qualty of web ste recommendaton by combnng the two nformaton channels, whle each channel ncluded contrbutes to ths mprovement. By combnng smlarty between rules and actve user and confdence of the weghted rules, the recommendaton engne has selected only the most relevant pages. Therefore, t ncreases the effectveness of the recommendaton engne. 6. Concluson In ths paper we proposed new methods for web page recommendaton. Frst, we proposed an algorthm based on dstrbuted learnng automata to learn the behavor of prevous users and ntegratng web usage mnng wth lnk analyss technques for assgnng probabltes to the web pages based on ther mportance n the web ste s navgatonal graph and makes recommendatons prmarly based on learned pattern and the structure of the web ste. By ntroducng a novel Weghted Assocaton Rule mnng algorthm, we present our second algorthm for recommendaton purpose. In whch users navgatonal patterns are automatcally extracted from web usage data. These navgatonal patterns are then used to generate recommendatons based on a user s current status. The pages n a recommendaton lst are ranked accordng to ther mportance and smlartes, whch s n turn computed based on web usage nformaton. Also, a novel method s proposed to pure the current sesson wndow.

One of the challenging problems in recommendation systems is dealing with unvisited or newly added pages. To address this problem and to improve the efficiency of the first two algorithms, our third contribution is a hybrid algorithm based on distributed learning automata and the weighted association rule mining algorithm. In the hybrid algorithm we employ the HITS algorithm to extend the recommendation set. Our experimental results illustrate that using this hybrid algorithm in a web recommender system has the potential to improve the quality of the system, and that it can generate higher-quality recommendations than either the distributed learning automata recommendation algorithm or the weighted association rule mining recommendation algorithm alone.

7. References

[1] S. S. Anand, B. Mobasher, Intelligent Techniques in Web Personalization, Lecture Notes in Artificial Intelligence, vol. 3169, Springer-Verlag, Berlin, Germany, 2005.
[2] M. D. Mulvenna, S. S. Anand, A. G. Buchner, Personalization on the Net Using Web Mining, Communications of the ACM, 2000.
[3] B. Mobasher, R. Cooley, J. Srivastava, Automatic Personalization Based on Web Usage Mining, Communications of the ACM, vol. 43, no. 8, 2000.
[4] R. Sinha, K. Swearingen, Comparing Recommendations Made by Online Systems and Friends, In Proceedings of the DELOS-NSF Workshop on Personalization and Recommender Systems in Digital Libraries.
[5] B. Smyth, P. McClave, Similarity vs. Diversity, In Proceedings of the 4th International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development, 2001.
[6] C. Ziegler, G. Lausen, L. Schmidt-Thieme, Taxonomy-Driven Computation of Product Recommendations, In Proceedings of the ACM Conference on Information and Knowledge Management, 2004.
[7] C. Ziegler, S. M. McNee, J. A. Konstan, G. Lausen, Improving Recommendation Lists Through Topic Diversification, In Proceedings of the 14th International Conference on the World Wide Web, 2005.
[8] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl, GroupLens: An Open Architecture for Collaborative Filtering of Netnews, In Proceedings of the 1994 Computer Supported Cooperative Work Conference.
[9] M. O'Mahony, N. Hurley, N. Kushmerick, G. Silvestre, Collaborative Recommendation: A Robustness Analysis, ACM Transactions on Internet Technology, vol. 4, no. 4, 2004.
[10] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-Based Collaborative Filtering Recommendation Algorithms, In Proceedings of the 10th International World Wide Web Conference, Hong Kong.
[11] J. Srivastava, R. Cooley, M. Deshpande, P. Tan, Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, SIGKDD Explorations, vol. 1, no. 2, 2000.
[12] R. Cooley, B. Mobasher, J. Srivastava, Web Mining: Information and Pattern Discovery on the World Wide Web, In Proceedings of the IEEE International Conference on Tools with AI, 1997.
[13] M. Eirinaki, M. Vazirgiannis, Web Mining for Web Personalization, ACM Transactions on Internet Technology, vol. 3, no. 1, 2003.
[14] X. Fu, J. Budzik, K. Hammond, Mining Navigation History for Recommendation, In Proceedings of the Fifth International Conference on Intelligent User Interfaces, 2000.
[15] M. Gery, H. Haddad, Evaluation of Web Usage Mining Approaches for User's Next Request Prediction, In Proceedings of the Fifth ACM International Workshop on Web Information and Data Management, 2003.
[16] M. D. Mulvenna, S. S. Anand, A. G. Buchner, Personalization on the Net Using Web Mining, Communications of the ACM, vol. 43, no. 8, 2000.

[17] C. Shahabi, A. Zarkesh, J. Adibi, V. Shah, Knowledge Discovery from User's Web-Page Navigation, In Proceedings of the 7th IEEE International Workshop on Research Issues in Data Engineering.
[18] A. M. Wasfi, Collecting User Access Patterns for Building User Profiles and Collaborative Filtering, In IUI '99: Proceedings of the 1999 International Conference on Intelligent User Interfaces.
[19] M. Deshpande, G. Karypis, Item-Based Top-N Recommendation Algorithms, ACM Transactions on Information Systems (TOIS).
[20] J. Herlocker, J. Konstan, A. Borchers, J. Riedl, An Algorithmic Framework for Performing Collaborative Filtering, In Proceedings of the 1999 Conference on Research and Development in Information Retrieval.
[21] B. Mobasher, Web Usage Mining and Personalization, In Practical Handbook of Internet Computing, M. P. Singh (ed.), CRC Press.
[22] K. Narendra, M. A. L. Thathachar, Learning Automata: An Introduction, Prentice Hall, Englewood Cliffs, New Jersey.
[23] M. A. L. Thathachar, R. Harita Bhaskar, Learning Automata with Changing Number of Actions, IEEE Transactions on Systems, Man and Cybernetics, vol. 17, no. 6, Nov. 1987.
[24] M. R. Meybodi, H. Beigy, Solving Stochastic Path Problem Using Distributed Learning Automata, In Proceedings of the Sixth Annual International CSI Computer Conference (CSICC2001), Isfahan, Iran, 2001.
[25] M. R. Meybodi, H. Beigy, Solving Stochastic Shortest Path Problem Using Monte Carlo Sampling Method: A Distributed Learning Automata Approach, Springer-Verlag Lecture Notes in Advances in Soft Computing: Neural Networks and Soft Computing, 2003.
[26] H. Beigy, M. R. Meybodi, A New Distributed Learning Automata Based Algorithm for Solving Stochastic Shortest Path Problem, In Proceedings of the Sixth International Joint Conference on Information Science, Durham, USA, 2002.
[27] L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Stanford University.
[28] A. N. Langville, C. D. Meyer, Deeper Inside PageRank, Internet Mathematics, 2004.
[29] B. Mobasher, H. Dai, T. Luo, M. Nakagawa, Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization, Data Mining and Knowledge Discovery, 2002.
[30] H. Liu, V. Keselj, Combined Mining of Web Server Logs and Web Contents for Classifying User Navigation Patterns and Predicting Users' Future Requests, Data & Knowledge Engineering.
[31] C. Shahabi, A. Zarkesh, J. Adibi, V. Shah, Knowledge Discovery from User's Web-Page Navigation, In Proceedings of the 7th IEEE International Workshop on Research Issues in Data Engineering.
[32] R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, Knowledge and Information Systems, 1999.
[33] P. K. Chan, A Non-Invasive Learning Approach to Building Web User Profiles, In Workshop on Web Usage Analysis and User Profiling, Fifth International Conference on Knowledge Discovery and Data Mining, San Diego.
[34] S. Dumais, T. Joachims, K. Bharat, A. Weigend, Implicit Measures of User Interests and Preferences, 2003 Workshop Report, ACM SIGIR Forum, Fall.
[35] Y. Liang, L. Chunping, Incorporating Pageview Weight into an Association-Rule-Based Web Recommendation System, AI 2006, LNAI 4304, Springer-Verlag Berlin Heidelberg, 2006.
[36] M. Morita, Y. Shinoda, Information Filtering Based on User Behavior Analysis and Best Match Text Retrieval, In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Springer-Verlag New York, Inc., Dublin, Ireland, 1994.
[37] R. Agrawal, T. Imielinski, A. Swami, Mining Association Rules Between Sets of Items in Large Databases, In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1993.
[38] R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules in Large Databases, In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94), Santiago, Chile, 1994.
[39] J. Srivastava, R. Cooley, M. Deshpande, P. Tan, Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, SIGKDD Explorations, vol. 1, no. 2, 2000.
[40] B. Mobasher, H. Dai, T. Luo, M. Nakagawa, Effective Personalization Based on Association Rule Discovery from Web Usage Data, In Proceedings of the 3rd ACM Workshop on Web Information and Data Management (WIDM01), Atlanta, Georgia, November.
[41] B. Mobasher, H. Dai, T. Luo, M. Nakagawa, Improving the Effectiveness of Collaborative Filtering on Anonymous Web Usage Data, In Proceedings of the IJCAI 2001 Workshop on Intelligent Techniques for Web Personalization (ITWP01), August.
[42] M. Nakagawa, B. Mobasher, A Hybrid Web Personalization Model Based on Site Connectivity, In the Fifth International WEBKDD Workshop: Web Mining as a Premise to Effective and Intelligent Web Applications, 2003.

[43] Y. Liang, L. Chunping, Incorporating Pageview Weight into an Association-Rule-Based Web Recommendation System, AI 2006, LNAI 4304, Springer-Verlag Berlin Heidelberg, 2006.
[44] V. Keselj, F. Peng, N. Cercone, C. Thomas, N-gram-Based Author Profiles for Authorship Attribution, In Proceedings of the Conference of the Pacific Association for Computational Linguistics, Nova Scotia, Canada.
[45] A. Tomovic, P. Janicic, V. Keselj, N-gram-Based Classification and Hierarchical Clustering of Genome Sequences, Computer Methods and Programs in Biomedicine.
[46] Y. Miao, V. Kešelj, E. E. Milios, Comparing Document Clustering Using N-grams, Terms and Words, Master's thesis, Faculty of Computer Science, Dalhousie University.
[47] O. Zaïane, J. Li, R. Hayward, Mission-Based Navigational Behavior Modeling for Web Recommender Systems, Springer-Verlag Berlin Heidelberg.
[48]
[49] J. Li, O. R. Zaïane, Combining Usage, Content and Structure Data to Improve Web Site Recommendation, 5th International Conference on Electronic Commerce and Web.
[50] J. M. Kleinberg, Authoritative Sources in a Hyperlinked Environment, Journal of the ACM, vol. 46, no. 5, 1999.
[51] A. Bose, K. Beemanapalli, J. Srivastava, S. Sahar, Incorporating Concept Hierarchies into Usage Mining Based Recommendations, In Proceedings of the 8th WEBKDD Workshop.
[52] M. Eirinaki, M. Vazirgiannis, I. Varlamis, SEWeP: Using Site Semantics and Taxonomy to Enhance the Web Personalization Process, In Proceedings of the 9th SIGKDD Conference.
[53] M. Eirinaki, C. Lampos, S. Paulakis, M. Vazirgiannis, Web Personalization Integrating Content Semantics and Navigational Patterns, In Proceedings of the Sixth ACM Workshop on Web Information and Data Management (WIDM).
[54] J. Li, O. R. Zaïane, Combining Usage, Content and Structure Data to Improve Web Site Recommendation, 5th International Conference on Electronic Commerce and Web.
[55] B. Mobasher, H. Dai, T. Luo, Y. Sun, J. Zhu, Integrating Web Usage and Content Mining for More Effective Personalization, In EC-Web, 2000.
[56] C. H. Cai, A. W. C. Fu, C. H. Cheng, W. W. Kwong, Mining Association Rules with Weighted Items, In Proceedings of the Database Engineering and Applications Symposium (IDEAS'98), July 1998.
[57] R. Burke, Hybrid Recommender Systems: Survey and Experiments, User Modeling and User-Adapted Interaction.
[58] F. Tao, F. Murtagh, M. Farid, Weighted Association Rule Mining Using Weighted Support and Significance Framework, In Proceedings of the 9th SIGKDD Conference.

Figure Captions:
Fig. 1. Distributed learning automata
Fig. 2. Comparison between our method and traditional sliding window

Fig. 3. Architecture of the hybrid recommender algorithm
Fig. 4. The weighted itemset graph
Fig. 5. DLA recommender algorithm performance with various user active window sizes
Fig. 6. Weighted association rule recommender algorithm performance with various user active window sizes
Fig. 7. Comparing the precision of the proposed algorithms
Fig. 8. Comparing the coverage of the proposed algorithms
Fig. 9. Comparing the precision of our hybrid algorithm with association rule methods
Fig. 10. Comparing the coverage of our hybrid algorithm with association rule methods

Fig. 1. Distributed learning automata

Fig. 2. Comparison between our method and traditional sliding window

Fig. 3. Architecture of the hybrid recommender algorithm

Fig. 4. The weighted itemset graph

Fig. 5. DLA recommender algorithm performance with various user active window sizes

Fig. 6. Weighted association rule recommender algorithm performance with various user active window sizes

Fig. 7. Comparing the precision of the proposed algorithms
