Using Markov Models to define proactive action plans for users at multi-viewpoint websites

E. Menasalvas (1), S. Millán (2), P. Gonzalez (1)
(1) Facultad de Informática UPM. Madrid, Spain
(2) Universidad del Valle. Cali, Colombia

Abstract. Deciding on the best action plan to be tailored to and carried out for each user is the key to personalization. This is a challenging task because decisions have to take into account as many elements of the environment as possible, such as the type of user, the user's current behaviour or the goals to be fulfilled. The difficulty is even greater when dealing with web users, since decisions have to be taken on-line and no salespeople are involved. In this paper, we propose an approach that integrates user typologies and behaviour patterns in a multi-departmental organization to decide the best action plan to be carried out at each particular moment. The key idea is to detect changes in user behaviour by means of Behaviour Evolution Models (a combination of Discrete Markov Models). In addition, an agent-based architecture is proposed for implementing the whole method.

1 Introduction
Relationships with users are paramount when trying to develop activities competitively in any web environment. The loss of the one-to-one relationship tends to make businesses less competitive, because it is difficult to manage customers when no information about them is available. In traditional businesses, information about transactions with users is stored and used to calculate customer typologies. A company then defines a set of actions tailored to users according to those typologies. At least two goals have to be fulfilled when defining an action plan. On the one hand, the relationship with the user has to be improved by giving him/her what he/she is looking for, in the way that is most profitable for both the user and the company. On the other, the company has to increase its profits.
In maintaining a relationship with the user, both objective data about the user (gender, age, likes and dislikes) and subjective information about the user's current context (his/her behaviour and goals at each particular moment) have to be taken into account. In traditional businesses, the salesperson covers the latter: he/she is responsible for determining the user's present and potential future goals, as well as the customer's present situation, and then acting accordingly. In web environments, many enterprises have been very concerned with obtaining the identity of the visitor in terms of personal data. Although knowing the user (his/her profile, what he/she wants, his/her goals, likes and dislikes) is important, what really matters is information about the way he/she wants to fulfil his/her goals, that is, his/her particular behaviour: preferences and context are important factors that determine how he/she behaves in each particular navigation. Integrating all this with information about the business and about the preferences and goals of the organization behind the site is what results in a successful e-CRM.

Nevertheless, the behaviour of a web user is generally not stable; it varies during navigation. Hence, detecting and capturing the evolution of user behaviour in order to act accordingly and successfully at each particular moment is an unavoidable commitment for web-site sponsors. User behaviour evolution depends not only on the user's present situation but also on his/her profile. Thus, different user typologies have to be taken into account when deciding on the actions to undertake. To build user profiles, a combination of personal data, demographic factors and the user's relationship with the company has to be used.

Any real process or activity, for example the dynamic evolution of user behaviour, generates a set of outputs, signals or observable events of different kinds. Obtaining dynamic models that simulate real processes and activities with reasonable precision is important because of the many applications that can be built on these models; prediction, identification and recognition systems are, no doubt, among the most challenging examples nowadays. Models that can be used to characterize real processes include statistical models that characterize the statistical properties of the signal, Gaussian models, Poisson processes, Markov processes and Hidden Markov processes [?].

Research is partially supported by Universidad Politécnica de Madrid (project Web-RT) and MCYT under project DAWIS.
All these models assume that a signal or an observation can be correctly characterized as a parametric random process whose parameters can be precisely estimated from real observations. Dynamic models such as Markov models have proven to be an appropriate tool for extracting and exploring dynamic behaviour and for modelling page access patterns, and have been successfully applied to problems such as link prediction and path analysis. Some authors have stated that there is some evidence that web surfing behaviour may be non-Markovian in nature [?], and consequently Markov models cannot be used as true data-generating models. Nevertheless, they provide a useful and meaningful description of dynamic web behaviour.

Once we are able to characterize the behaviour of the user and model its possible evolution during a session, the challenge is to recognize, at each particular moment, the department of the company for which the session is or could be most profitable, so that an appropriate, personalized action plan can be tailored. In this paper, we present an approach to identify, among the available action plans, the best plan to carry out depending on the actions and behaviour of the user as he/she navigates the web. The approach is based on a Markov model: we propose a method to characterize user behaviours and to model their possible evolution on a site as a Markovian process, in which changes in behaviour depend on both the kind of user and the pages visited. We also propose an agent architecture to deploy the approach.

The remainder of the paper is organized as follows. Section 2 briefly reviews related work. Section 3 presents the preliminaries of the method. Section 4 presents the proposed Markov model. Section 5 describes how the models are obtained and section 6 how they are applied on-line. Section 7 presents the proposed architecture. Finally, section 8 presents the main conclusions and further developments.

2 Related Work
Information collected and stored by Web servers is the main source of data for analyzing user navigation patterns. A successful e-CRM depends mainly on the ability of businesses to identify and classify users in order to model and predict their behaviour on the site and to generate action plans geared to making them frequent customers. In this sense, maintaining a one-to-one relationship with clients is of utmost importance, and Web site personalization is one of the keys to achieving it [?], [?], [?]. Adaptive web sites are nowadays a very important research area. Different web mining approaches and techniques have been proposed to improve sites by making them adaptive, mainly based on the analysis of user access logs [?], [?].

An important aspect of web mining for analyzing web user access logs is related to categories and clusters. Clustering users based on their common properties, and analyzing the features of each cluster, makes it possible to provide more appropriate services to users. Several clustering techniques have been proposed to group and characterize similar web site users [?], [?], [?], [?], [?]. Although the standard K-means algorithm has been used to cluster users' traversal paths, as in [?], some authors [?] have noted that it is not clear whether the clusters are meaningful or how the similarity measure is applied. Clustering algorithms based on object data are not appropriate for clustering user sessions because of the high dimensionality of the feature space (the number of pages in a Web site). URLs in a Web site typically have a hierarchical structure, which makes it very difficult to convert sessions into simple numerical features without losing the information hidden in the structure of the site. In [?], clustering of Web users based on their access patterns is analyzed.
According to the access patterns obtained, pages are then organized so that the users of a cluster will find them easy to access. To achieve this goal, the authors propose to generalize the file containing information about user sessions using the attribute-oriented induction method. In order to discover overlapping aggregate profiles, two Web usage mining techniques based on clustering of user transactions and of Web pageviews are proposed and evaluated in [?]. The first technique, named PACT (Profile Aggregations based on Clustering Transactions), derives overlapping profiles from clusters of user transactions. The second one, based on Association Rule Hypergraph Partitioning, derives overlapping aggregate profiles from pageviews. An algorithm, PageGather, for semi-automatically improving site organization by learning from visitor access patterns is proposed in [?]. Using page co-occurrence frequencies, it finds clusters of related but unlinked pages, and index pages are then created from these clusters for easier navigation. [?] describes a remote Java agent that captures the links selected by clients, page orders, accurate page viewing times, and cache references. Link path sequences are enumerated and clustered in this path space using the cosine angle distance metric. In [?], each user session is represented as an N-dimensional vector capturing the frequency of access to the different documents within the site. This collection of vectors is clustered according to user interests, and the clusters are used to determine which pages are most interesting to each particular set of users. Sequence information is ignored in this analysis.

On the other hand, adapting and personalizing a web site requires modelling and predicting a user's behaviour, and Markov models are useful to reach this goal. Markov models have been used, among other things, to improve pre-fetching strategies for web caches [?], to classify browsing sessions [?], to influence caching priorities between primary, secondary and tertiary storage [?], and to predict web page accesses [?], [?], [?], [?]. To predict a user's behaviour on a web site, Deshpande et al. [?] propose techniques to select parts of Markov models of different orders in order to obtain a model with reduced state complexity and improved prediction accuracy. Low-order Markov models of web navigation are used in [?] to estimate a user's purchase probabilities based on click sequences. Using the calculated probabilities, a user's visit can be classified dynamically as a buy visit, a non-buy visit, or a visit for which the decision waits for additional information. Cadez et al. [?] use Markov models for clustering web usage data: they present a model-based clustering approach in which user clusters are calculated by learning a mixture of first-order Markov models. In this paper, Markov models are used to model the potential behaviours of a user on a site, where behaviour changes depend on both the kind of user and the pages visited.

3 Markov Models preliminaries
There are two kinds of probabilistic models based on Markov models: DMM (Discrete Markov Models) and HMM (Hidden Markov Models). In DMM, also known as Observable Markov Models, each state corresponds to a physical or observable event, while in HMM the observations are probabilistic functions of each state [?].
The model presented in section 4 is a DMM. As stated in [?], a DMM can be defined by the tuple $\langle S, A, \lambda \rangle$, where $S$ is the state space, $A$ is the matrix of transition probabilities between states, and $\lambda$ is the initial probability distribution over the states in $S$. An element of the matrix, $A[s, s']$, is interpreted as the probability of transitioning from state $s$ to state $s'$ in one step. Similarly, an element of $A \cdot A = A^2$ denotes the probability of transitioning from one state to another in two steps, and so on.

The fundamental property of Markov models is the dependency on previous states. This is a key point of the model, together with the order, a distinguishing feature of Markov models [?]. The order refers to the way previous states affect a sequence of observations: in a first-order model the next state depends only on the previous one in the sequence, while in higher-order models it depends on the 2, 3, ... previous states. When Markov models are applied to web usage mining, first-order models have proven not to be very precise and need to be improved by higher-order models. However, higher-order models have limitations that restrict their use in some cases. These limitations are related to the complexity of the state space: as the number of states increases, the coverage of the states and of the transitions between them in the training set can be considerably reduced. On the other hand, the complexity of the state space can also negatively affect the precision of the model [?].

In [?] the authors propose to combine Markov models of different orders to solve the coverage problem. They obtain Markov models of order 1, 2, 3, ... (All-Kth-Order Markov models) from the training set. Some sequences of states will not be present in the training set. Consequently, when the model is applied to find the next state, the most precise (highest-order) model is chosen first; if the sequence of states is not present in that model, a lower-order model is chosen. In such a case, the sequence of states has to be cut and the precision consequently decreases. However, this approach further increases the state-space complexity. Certain approaches can be used to reduce this complexity [?], but they may also reduce the prediction accuracy of the resulting model. A method for combining Markov models of different orders into a global model that reduces the state-space complexity while retaining the coverage of the All-Kth-Order Markov models, and even improving prediction accuracy, is proposed in [?]. In other words, it helps reduce the state-space complexity without affecting the performance of the model.

4 Behaviour Evolution Model
The behaviour of a user navigating a web site evolves even within the same session. Changes in behaviour can have different causes: the pages visited, the environment, the user's mood. Whatever the reason for the change, what really matters is discovering when the behaviour has changed and then acting accordingly.

4.1 Preliminaries
We propose to build a behaviour evolution model.
As users can be classified into different typologies according to their previous relationship with the site, we propose a different evolution model for each typology; we assume that evolution patterns depend first of all on the profile of the user. Once typologies or profiles of users have been obtained from historical data (data from previous navigations), we identify the different observable behaviours of each kind of user. To make the process computable, we define certain points at which the behaviour is observed to see whether it has changed. In our case, these points correspond to web pages of the site at which changes in user behaviour can be observed. We call these points Breaking Points (BK from now on). The proposed model is therefore based on three basic concepts:
User typology: each possible profile identified by segmenting the user database, taking into account objective information about the relationship between the user and the enterprise.
User behaviour: each possible observable behaviour of the relationship between the user and the site in each navigation. Subjective information representing the user's context and goals, such as inter-click times, the amount of time the customer has spent on the site in the past and so on, is taken into account.

Breakpoint: point at which the behaviour of a user is analyzed to see its evolution or tendency.

Fig. 1. Behavior Evolution

Thus, we propose to build a DMM for each user typology using the BKs at which changes of behaviour have been observed, together with the observed behaviours themselves. The model helps estimate not only the probability of going to the next BK but also the behaviour a particular user will show on arriving at that BK. The states of the proposed Markov model are therefore the different combinations of BKs and behaviours, with one model per user typology. Figure 1 shows an example of one of the models we plan to build.

4.2 The Proposed Model
Let $BK = \{BK_1, \dots, BK_R\}$ be the set of breakpoints identified in the site. Let $U = \{U_1, \dots, U_M\}$ be the set of possible user typologies. Let $T^i = \{T_1, \dots, T_K\}$ be the set of behaviours observed for a particular user typology $U_i$. We define the states of the first-order DMM for $U_i$, $S = \{S_1, \dots, S_N\}$, as all the combinations of $BK$ and $T^i$ that are present in the training set. The maximum number of possible states of the model is then $N = R \times K$.

Let us illustrate the model with an example. Assume that we have three breakpoints $\{BK_1, BK_2, BK_3\}$ and two kinds of behaviour, $\{T_1, T_2\}$, observed for the users in typology $U_1$. Then the possible states of the first-order Markov model are $S = \{S_1, \dots, S_6\}$ (6 states), where $S_1 = (BK_1, T_1)$, $S_2 = (BK_1, T_2)$, $S_3 = (BK_2, T_1)$, $S_4 = (BK_2, T_2)$, $S_5 = (BK_3, T_1)$ and $S_6 = (BK_3, T_2)$. Since BKs are certain pages of the site, the state transitions depend on the topology of the site and on the historical navigation data. The historical navigations recorded in the web log are used to calculate the parameters of the model.
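The construction of the state space for one typology can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes that each session of typology U_1 has already been reduced to the sequence of (breakpoint, behaviour) pairs observed at breakpoints, and the sessions and the function name `build_state_space` are hypothetical. Following the definition above, only combinations that are present in the training set become states, so the number of states is at most the number of breakpoints times the number of behaviours.

```python
from itertools import product

# Hypothetical example data for one typology (U_1). Names follow the example
# in the text; each session lists the (breakpoint, behaviour) pairs observed.
BREAKPOINTS = ["BK1", "BK2", "BK3"]
BEHAVIOURS = ["T1", "T2"]

sessions = [
    [("BK1", "T1"), ("BK2", "T1"), ("BK3", "T2")],
    [("BK1", "T2"), ("BK3", "T2"), ("BK3", "T1")],
    [("BK1", "T1"), ("BK3", "T1")],
]


def build_state_space(sessions, breakpoints, behaviours):
    """Return the (BK, T) pairs that occur in the training sessions,
    ordered breakpoint-major like S_1..S_N in the text."""
    observed = {pair for session in sessions for pair in session}
    candidates = product(breakpoints, behaviours)  # all R x K combinations
    return [pair for pair in candidates if pair in observed]


states = build_state_space(sessions, BREAKPOINTS, BEHAVIOURS)
max_states = len(BREAKPOINTS) * len(BEHAVIOURS)  # upper bound on N
for i, (bk, t) in enumerate(states, start=1):
    print(f"S{i} = ({bk}, {t})")
print(f"{len(states)} of at most {max_states} states observed")
```

With these toy sessions the pair (BK2, T2) never occurs, so only five of the six possible states are kept, and the numbering differs from the six-state example in the text.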
Not every BK is reachable from every other BK, and some transitions between breakpoints that are possible given the site structure may not be present in the log data. A given transition may also be present for one user typology but not for all of them. In the same way, there may be transitions between behaviours that are not found in the training data set. In the example, if we assume that the set of possible transitions for the given typology is $\{BK_1 \to BK_2, BK_1 \to BK_3, BK_2 \to BK_3, BK_3 \to BK_3\}$, the transitions can be represented graphically as in figure 2.

Fig. 2. Example of a Discrete Markov Model

Once the states of the model are defined, the transition matrix $A$ can be filled with the estimated frequencies found in the training set (in this case, data about previous sessions). With $N$ states, the transition matrix has size $N \times N$. Let $t_{i,j}$ be a cell of the matrix, holding the probability of going from $S_i$ to $S_j$. The resulting matrix, with rows and columns indexed by $S_1, \dots, S_N$, takes the form:

$$A = \begin{pmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,N} \\ t_{2,1} & t_{2,2} & \cdots & t_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ t_{N,1} & t_{N,2} & \cdots & t_{N,N} \end{pmatrix}$$

In the example, the matrix has size $6 \times 6$. Each cell is interpreted as follows: for example, cell $t_{2,3}$ stores the probability of going from $S_2 = (BK_1, T_2)$ to $S_3 = (BK_2, T_1)$, that is, the probability of going from $BK_1$ to $BK_2$ while the behaviour changes from $T_2$ to $T_1$.

Once the transition matrix is available, several problems can be solved. For example, given an on-line session, we can find the probability of arriving at a certain page with a certain behaviour (a state of the model): given the session $\{\dots, S_L, S_M\}$, we can find the probability of reaching state $S_N$. In a first-order model only the previous state, here $S_M$, matters, so the answer is simply the value stored in cell $t_{M,N}$ of the transition matrix, that is, $P(S_N \mid S_M)$.

Once the first-order model is obtained, and depending on the training set, higher-order models can be obtained. The combination of all the models for the different user typologies results in a model of user behaviour evolution. In the example, for the order-2 model the number of possible transitions is $V_{6,2} + 2 = 30 + 2 = 32$: the variations without repetition (where order matters) of the 6 states taken two at a time, plus the two pairs with repetition $\{S_5, S_5\}$ and $\{S_6, S_6\}$, which are possible because $BK_3 \to BK_3$ is the only self-loop among the breakpoint transitions.
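The estimation of the first-order matrix and the queries above can be sketched for the six-state example. This is an illustrative sketch only: the training sessions are invented (each respects the breakpoint transitions of the example), states are numbered 1 to 6 as in the text, and the helper names are hypothetical. It fills $A$ with row-normalized transition frequencies, answers $P(S_N \mid S_M)$ from the last state of a session, computes two-step probabilities as $A \cdot A$ (section 3), and reproduces the count of 32 order-2 pairs.

```python
from itertools import permutations

# States of the worked example: S_1 = (BK1, T1), ..., S_6 = (BK3, T2).
STATES = [("BK1", "T1"), ("BK1", "T2"), ("BK2", "T1"),
          ("BK2", "T2"), ("BK3", "T1"), ("BK3", "T2")]
N = len(STATES)

# Hypothetical training sessions, written with 1-based state numbers.
SESSIONS = [[1, 3, 6], [2, 4, 5], [1, 5, 6], [2, 3, 6], [1, 3, 5]]


def estimate_transition_matrix(sessions, n):
    """Return A with A[i][j] = t_{i+1,j+1}, the observed frequency of S_{i+1} -> S_{j+1}."""
    counts = [[0] * n for _ in range(n)]
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src - 1][dst - 1] += 1
    # Normalize each row; a state never left in the data keeps an all-zero row.
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def next_state_probability(a, session, target):
    """First-order query P(S_target | session): only the last state matters."""
    return a[session[-1] - 1][target - 1]


def matmul(x, y):
    """Plain matrix product, used here for two-step probabilities A . A."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


A = estimate_transition_matrix(SESSIONS, N)
print(A[1][2])                                          # t_{2,3}: 0.5
print(round(next_state_probability(A, [1, 3], 6), 3))   # P(S_6 | S_3): 0.667

# Two-step probability from S_1 to S_6 (this is A . A, not the order-2 matrix).
two_step = matmul(A, A)
print(round(two_step[0][5], 3))                         # 0.778

# Row labels of the order-2 matrix: ordered pairs of distinct states (V_{6,2} = 30)
# plus repeated pairs, which only the BK3 -> BK3 self-loop allows.
SELF_LOOPS = {"BK3"}
pairs = list(permutations(range(1, N + 1), 2))
pairs += [(s, s) for s in range(1, N + 1) if STATES[s - 1][0] in SELF_LOOPS]
print(len(pairs))                                       # 32
```

Note the difference between the two-step matrix $A \cdot A$, which still uses only first-order transitions, and the order-2 matrix, whose rows are pairs of previous states.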
The order-2 transition matrix has the 32 pairs of states as rows and the single states as columns, and takes the form:

$$A^{(2)} = \begin{array}{c|cccc} & S_1 & S_2 & \cdots & S_6 \\ \hline \{S_1, S_3\} & t^{(2)}_{1,1} & t^{(2)}_{1,2} & \cdots & t^{(2)}_{1,6} \\ \{S_1, S_4\} & t^{(2)}_{2,1} & t^{(2)}_{2,2} & \cdots & t^{(2)}_{2,6} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \{S_6, S_6\} & t^{(2)}_{32,1} & t^{(2)}_{32,2} & \cdots & t^{(2)}_{32,6} \end{array}$$

Problems that were solved with the order-1 model can now be solved with higher precision, since not only the previous state but the two previous states are taken into account. In the example, given the on-line session $\{S_1, S_3\}$, the probability of reaching breakpoint $BK_3$ with the behaviour changed to $T_2$ is $P(S_6 \mid S_1, S_3)$. We therefore look for $S_3$ and $S_1$, in this order, in the matrix; the answer is the element $t^{(2)}_{1,6}$ of $A^{(2)}$.

As mentioned in section 3, the main problem of higher-order models is coverage. A simple way to overcome some of these difficulties is to train Markov models of varying order and use all of them during the prediction phase. However, mixing Markov models of different orders raises its own problems: it may increase the state-space complexity, sometimes resulting in worse prediction accuracy and in longer search times when probabilities have to be found and calculated on-line. When combining Markov models of different orders, the state-space complexity must be minimized as much as possible in order to improve the prediction accuracy of the model, while the largest possible coverage of the patterns is kept. To do so, we propose to use the techniques for the intelligent combination of Markov models proposed in [?].

Once the complexity of the model has been reduced, for an efficient on-line deployment (optimizing search time) we propose to store the estimated frequencies of the transitions of different orders in trees. One tree is obtained for each possible state of the order-1 model.

Fig. 3. Behaviour Evolution Model Tree

The branches of the tree represent the transitions that end in this state (from the leaves to the root), and the depth of a branch shows the order of the transition being estimated. Each node stores the frequency estimated for the order that corresponds to its depth in the tree. This way, a search in the tree is a direct access. For the example, figure 3 shows part of the tree for state $S_6$. Level 0 stores $\lambda(S_6)$, the initial probability of that state ($\lambda$ being the initial probability distribution of the states in $S$). Level 1 nodes, for example the one for $S_3$, store $t_{3,6}$, the transition from $S_3$ to $S_6$. Level 2 nodes store second-order transitions in the same fashion: the node for $S_1$ stores the value $t^{(2)}_{1,6}$ of the $A^{(2)}$ transition matrix, that is, the probability of transitioning from $\{S_1, S_3\}$ to $S_6$.

5 Obtaining the models
To obtain the behaviour evolution model, a preprocessing stage is needed in which historical session data are analyzed to identify user typologies, behaviours and breaking points. The tasks needed in the process are presented below; those related to log preprocessing that is common to any other web mining procedure are taken for granted. A detailed description of the process for generating breakpoints can be found in [?].

5.1 Data Preparation
After the web logs have been properly preprocessed and sessions have been obtained, the information in the logs is enriched in order to obtain user behaviour typologies. In our case, logs are enriched with information related both to the context of the user and to the navigation itself; for the latter, the algorithm presented in [?] has been used. With the logs enriched in this way, the next step is to obtain user behaviour typologies for each possible user profile.
All these data lead to a dual process, because sessions fall into two categories: identified and non-identified sessions (anonymous sessions). For anonymous sessions we assume a special user typology. To identify behaviours, we use the breakpoints calculated according to [?], and for every possible BK we calculate the possible behaviours according to the navigation information and the value of the session [?]. Using all this information, sessions are classified, and subsessions are segmented for later classification.

5.2 Model development
Once sessions and subsessions have been properly identified and classified, the next step is to obtain the Markov behaviour evolution model. A behaviour evolution model is obtained for each user typology. The model is the result of combining DMMs of different orders for which $\langle S, A, \lambda \rangle$ has been estimated. The states, $S$, depend on the BKs (which are common to every typology in the site) and on the behaviours of each user typology. The initial probabilities of the states, $\lambda$, the transition matrices of each order, $A, A^{(2)}, \dots$, and the maximum order for which probabilities can be obtained are calculated from the original, already preprocessed, dataset.

6 Online application of the model
Once the behaviour model has been estimated, it can be applied on-line. The process is as follows:
1. User typology identification. When entering the site, a user is assigned his/her typology. This can be the one kept in the user's profile, if he/she is a registered user, or the result of a classification method used for new visitors.
2. User behaviour model construction. For each event in a navigation, a model is built to keep track of the user's behaviour. This model is later used when applying the Markov behaviour model at BKs.
3. Check the behaviour at the Breaking Point. Each time a user visits a breakpoint, the Markov model is used, taking into account both the user typology and the user's behaviour up to that point, to estimate the possible change of behaviour and the next breakpoint the user will probably visit.
4. Best action plan determination. Considering the user typology and behaviour model, and according to the results presented in [?], the best action plan to follow is determined.

The dynamic nature of the web itself, added to the fact that we are dealing with user typologies, user behaviour models, user lifecycle models and, in general, probabilistic models based on data gathered on-line by the web server, requires a continuous process of refining and reviewing the models and action plans in order to keep the intelligent component of the CRM system alive. The following processes are therefore required:
1. Typologies reconstruction. Depending on the attributes upon which the typologies have been computed, it will be necessary to rebuild the typologies to adapt them to changes.
This will also lead to the recalculation of the Markov models and, in some cases, to the computation of new behaviour evolution models, as new typologies can appear and some can disappear.
2. Refining the Markov models. Not only the typologies but also the transition matrices have to be recalculated as new data (sessions) are stored by the web server. For this task, all the steps reviewed in section 5 have to be repeated. Because of the inherent cost of the refining process, the benefit of improving the models has to be balanced against the cost of losing customers because of a poor model response, so that the right moment to refine the model can be estimated.
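Step 3 of the on-line process (checking the behaviour at a breakpoint) combines the tree storage of section 4.2 with the fall-back to lower orders described for All-Kth-Order models in section 3. The following Python sketch shows one way to do this under stated assumptions: sessions of one typology are already mapped to state numbers, the maximum order is 2, the training sessions are invented, and `Node`, `build_model` and `predict` are hypothetical names. It does not implement the model-reduction techniques of [?]. It only stores, for each target state, a tree whose level 0 holds λ and whose level-k nodes hold order-k transition frequencies, and it backs off to a shorter context when the longer one never occurred in training.

```python
from collections import defaultdict


class Node:
    """Tree node: `prob` is the estimated probability of the tree's root state
    given the path of previous states from this node up to the root."""

    def __init__(self):
        self.prob = 0.0
        self.children = {}


def build_model(sessions, max_order):
    """Return one tree per target state and the set of contexts seen in training."""
    follow = defaultdict(lambda: defaultdict(int))  # context -> target -> count
    context_total = defaultdict(int)                # context -> times it was followed
    first = defaultdict(int)                        # initial-state counts (for lambda)
    for session in sessions:
        first[session[0]] += 1
        for pos in range(1, len(session)):
            for k in range(1, min(max_order, pos) + 1):
                context = tuple(session[pos - k:pos])  # oldest state first
                follow[context][session[pos]] += 1
                context_total[context] += 1

    states = {s for session in sessions for s in session}
    trees = {s: Node() for s in states}
    for s in states:
        trees[s].prob = first[s] / len(sessions)       # level 0: lambda(s)
    for context, targets in follow.items():
        for target, count in targets.items():
            node = trees[target]
            for prev in reversed(context):             # depth k = order-k transition
                node = node.children.setdefault(prev, Node())
            node.prob = count / context_total[context]
    return trees, set(context_total)


def predict(trees, seen, history, target, max_order):
    """Estimate P(target | history), trying the highest order first and falling
    back to lower orders when a context never occurred in training."""
    if target not in trees:
        return 0.0, 0
    for k in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context not in seen:
            continue                                   # no coverage at order k
        node = trees[target]
        for prev in reversed(context):
            node = node.children.get(prev)
            if node is None:
                return 0.0, k                          # context seen, never followed by target
        return node.prob, k
    return trees[target].prob, 0                       # no usable context: lambda(target)


# Hypothetical sessions of one typology, as state numbers S_1..S_6.
SESSIONS = [[1, 3, 6], [2, 4, 5], [1, 5, 6], [2, 3, 6], [1, 3, 5]]
trees, seen = build_model(SESSIONS, max_order=2)

print(predict(trees, seen, [1, 3], 6, max_order=2))  # order 2: (0.5, 2)
print(predict(trees, seen, [2, 5], 6, max_order=2))  # {S_2, S_5} unseen, order 1: (1.0, 1)
print(predict(trees, seen, [], 1, max_order=2))      # empty history, lambda: (0.6, 0)
```

The last call shows the level-0 fallback to λ when no context is available yet; the second shows the coverage problem of section 3 being handled by cutting the sequence of states to its last element.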

7 Architecture Overview
For the implementation of the system, a multiagent architecture based on the three-layer architecture proposed in [?] has been used. Figure 4 illustrates the agents involved and the interactions between them.

Fig. 4. Multiagent architecture

The new architecture we propose is composed of 4 layers:
Decision Layer. This layer includes agents that make decisions depending on the information supplied by the semantic layer. There are two main kinds of agents:
User Agents. Each user agent represents a navigation on the site. The interactions between the user and the Interface Agent, and between the Interface Agent and the User Agent, together with the data already stored, make it possible to calculate the user model.
Planning Agents (strategy agents). The main task of these agents is to determine the strategy to follow in order to obtain a better relationship with the user while improving goal achievement. They collaborate with the Interface Agents and the CRM Services Provider Layer agents to elaborate the best action plan, which depends on the problem to be solved and on the environment conditions.
Semantic Layer. This layer contains agents related to the logic of the algorithms and methods used. There are different agents, each specialized in applying one of the models needed for the decision-making process. Models are stored in a repository where they are updated, deleted or improved when needed; the latter task is performed by refining agents.

CRM Services Provider Layer. This layer offers an interface that is used by any agent requesting a service. Each agent offers only one particular service, so a particular action plan selected for a particular session at a particular moment involves several agents that act, collaborate and interact in order to reach the proposed goals.

8 Conclusions
A model for analyzing changes in user behaviour has been presented. The model combines Markov models of different orders and integrates different user typologies. Its main advantage is that not only the user's navigation can be predicted but also the behaviour he/she will show. An agent architecture to deploy the model has also been proposed. A prototype of the system is under evaluation, and the results obtained on one of the university's teaching sites are promising. The approach can be used as the basis for a personalized web site. Issues such as obtaining the breaking points by means of other, more complex methods, the evolution of typologies, and typology life-cycle analysis would improve the present method. These open issues, which can be addressed by multiple alternatives, motivate our current research on improving the proposed method and forthcoming work.

9 Acknowledgments
The research has been partially supported by Universidad Politécnica de Madrid under Project WEB-RT Doctorado con Cali.

References
1. D. J. Bertsimas, A. J. Mersereau, and N. R. Patel. Dynamic classification of online customers. In Proceedings of the SIAM International Conference on Data Mining, San Francisco, California, May.
2. C. Anderson, P. Domingos, and D. Weld. Relational Markov models and their applications to adaptive web navigation. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2002).
3. H. Dai and B. Mobasher. A road map to more effective web personalization: Integrating domain knowledge with web usage mining. In Proceedings of the International Conference on Internet Computing 2003 (IC'03), Las Vegas, Nevada, June.
4. M. Deshpande and G. Karypis. Selective Markov models for predicting web-page accesses.
5. E. Menasalvas, S. Millán, M. Pérez, E. Hochsztain, V. Robles, O. Marbán, J. Peña, and A. Tasistro. Beyond user clicks: an algorithm and an agent-based architecture to discover user behavior. 1st European Web Mining Forum, Workshop at ECML/PKDD-2003, 22 September 2003, Cavtat-Dubrovnik, Croatia.
6. O. Etzioni. The world-wide web: Quagmire or gold mine? Communications of the ACM, 39(11):65-68, 1996.
7. Y. Fu, K. Sandhu, and M. Shih. Clustering of web users based on access patterns.
8. M. Hadjimichael, O. Marbán, E. Menasalvas, S. Millán, and J. M. Peña. Subsessions: a granular approach to click path analysis. In Proceedings of the IEEE International Conference on Fuzzy Systems 2002 (WCCI 2002), Honolulu, U.S.A., May.
9. B. A. Huberman, P. L. T. Pirolli, J. E. Pitkow, and R. M. Lukose. Strong regularities in World Wide Web surfing. Science, 280(5360):95-97.
10. I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White. Visualization of navigation patterns on a web site using model-based clustering. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000).
11. A. Kraiss and G. Weikum. Integrated document caching and prefetching in storage hierarchies based on Markov-chain predictions. VLDB Journal: Very Large Data Bases, 7(3).
12. B. Mobasher, H. Dai, T. Luo, M. Nakagawa, and J. Witshire. Discovery of aggregate usage profiles for web personalization. In Proceedings of the WebKDD Workshop.
13. O. Nasraoui, R. Krishnapuram, and A. Joshi. Mining web access logs using a fuzzy relational clustering algorithm based on a robust estimator.
14. O. Nasraoui, H. Frigui, A. Joshi, and R. Krishnapuram. Mining web access logs using relational competitive fuzzy clustering.
15. M. Perkowitz and O. Etzioni. Adaptive web sites: Automatically synthesizing web pages. In AAAI/IAAI.
16. M. Perkowitz and O. Etzioni. Towards adaptive Web sites: conceptual framework and case study. Computer Networks (Amsterdam, Netherlands: 1999), 31(11-16).
17. J. E. Pitkow and P. Pirolli. Mining longest repeating subsequences to predict world wide web surfing. In USENIX Symposium on Internet Technologies and Systems.
18. L. R. Rabiner.
19. R. Sarukkai. Link prediction and path analysis using Markov chains. Ninth International World Wide Web Conference.
20. R. R. Sarukkai. Link prediction and path analysis using Markov chains. Computer Networks, 33(1-6).
21. C. Shahabi, A. M. Zarkesh, J. Adibi, and V. Shah. Knowledge discovery from user's webpage navigation. In Proceedings of the Seventh International Workshop on Research Issues in Data Engineering, High Performance Database Management for Large-Scale Applications (RIDE'97), Washington-Brussels-Tokyo, IEEE, pages 20-31.
22. C. G. Thomas and G. Fischer. Using agents to personalize the web. In Proceedings of WI'97, Orlando, Florida.
23. V. N. Padmanabhan and J. C. Mogul. Using predictive prefetching to improve world wide web latency. Computer Communication Review.
24. W. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal. From user access patterns to dynamic hypertext linking. In Fifth International World Wide Web Conference, May 1996.
