Prediction of user navigation patterns by mining the temporal web usage evolution


Vincent S. Tseng · Kawuu Weicheng Lin · Jeng-Chuan Chang

Soft Comput (2008) 12, FOCUS
Published online: 23 May 2007
Springer-Verlag 2007

Abstract Advances in data mining technologies have enabled intelligent Web capabilities in various applications by utilizing the hidden user behavior patterns discovered from Web logs. Intelligent methods for discovering and predicting users' patterns are important in supporting intelligent Web applications such as personalized services. Although numerous studies have been done on Web usage mining, few of them consider the temporal evolution characteristic in discovering Web users' patterns. In this paper, we propose a novel data mining algorithm named Temporal N-Gram (TN-Gram) for constructing prediction models of Web user navigation by considering the temporality property in Web usage evolution. Moreover, three kinds of new measures are proposed for evaluating the temporal evolution of navigation patterns under different time periods. Through experimental evaluation on both real-life and simulated datasets, the proposed TN-Gram model is shown to outperform other approaches such as N-gram modeling in terms of prediction precision, in particular when the Web users' navigating behavior changes significantly with temporal evolution.

Keywords Temporal patterns · Navigation patterns · Data mining · Personalized services

V. S. Tseng (corresponding author) · K. W. Lin · J.-C. Chang
Institute of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC
V. S. Tseng e-mail: tsengsm@mail.ncku.edu.tw
K. W. Lin e-mail: linwc@idb.csie.ncku.edu.tw
J.-C. Chang e-mail: invers@idb.csie.ncku.edu.tw

1 Introduction

Advances in data mining technologies have enabled intelligent Web capabilities in various applications such as page recommendation, page prefetching and personalized navigation by utilizing the hidden user behavior patterns discovered from Web logs (Borges and Levene 1999; Nanopoulos et al. 2003; Padmanabhan and Mogul 1996). The behavior patterns contain a lot of useful information because they directly reflect how users use the Web site, and thus form the basis of intelligent Web development. However, discovering the patterns from the large volume of Web logs is challenging, and this problem, known as Web Usage Mining, has recently become an important research topic in data mining.

In Web mining research, numerous studies have addressed the discovery of users' behavior patterns from various aspects. Tan and Kumar (2002) apply association rules to the discovery of associated page-views. An intuitive application, for example, is using the discovered associated pages to improve the Web site structure. For the linking characteristic of Web sites, several studies (Tan et al. 2000) discussed the indirect association relation. The sequential pattern (Agrawal and Srikant 1995), which reveals the sequential page-views of users, has been a widely discussed topic. Moreover, some studies focused on developing clustering methods (Frias-Martinez and Karamcheti 2002; Wang and Zaiane 2002) to cluster users with similar behavior or to cluster Web pages.

Most past studies assumed that Web usage patterns are invariant with time (Frias-Martinez and Karamcheti 2002; Gündüz and Özsu 2003a,b; Pitkow and Pirolli 1999; Srivastava et al. 2000; Su et al. 2000; Tan and Kumar 2002), and few of them took into account the temporal characteristic or temporal evolution of Web usage. In fact, a user's Web usage patterns may change with time, i.e., a Web visitor may behave differently on the same Web site at different times. For instance, students may usually search and browse professional literature in the daytime and visit auction Web sites at night. Obviously, the temporal evolution of Web usage patterns is an important feature that should be considered in order to construct effective prediction models for user navigation patterns.

In this paper, we propose a novel method named Temporal N-Gram (TN-Gram) for constructing prediction models of Web user navigation by considering the important factor of temporal evolution of users' navigation patterns. Moreover, three kinds of measures, namely Support-based Fundamental Rule Changes, Confidence-based Fundamental Rule Changes and Prediction Rule Changes, are proposed for evaluating the temporal evolution of navigation patterns under different time periods by utilizing the concept of fundamental rule changes proposed in Liu et al. (2001). Through experimental evaluation on both real-life and simulated datasets, our method is shown to outperform other existing algorithms such as N-gram modeling in terms of prediction precision when users' navigating behavior changes with time.

The rest of this paper is organized as follows. Section 2 briefly reviews the research motivation and the related work on prediction methods in Web usage mining. Section 3 presents the proposed prediction model. Section 4 describes three methods for measuring the change of temporality of the Web log. In Sect. 5, we give the experimental results. Finally, we conclude our work in Sect. 6.

2 Motivation and related work

Three main steps in constructing a prediction model for Web usage are presented in Srivastava et al. (2000). The first step is data preprocessing, which cleans the dataset and converts the Web log into a session file that contains a click-stream of page-views for each visitor. The second step is pattern discovery. A popular approach is the N-gram model (Su et al. 2000), which discovers user navigation patterns that hold the order, adjacency and recency information. The third step is pattern analysis, which predicts the user's next request.

Yang et al. (2004) systematically present different methods for building association rule-based prediction models from Web logs. They focus on three features of association rules, namely the order, adjacency, and recency. According to these features, five types of association rules are considered, namely subset rules, subsequence rules, latest subsequence rules, substring rules and latest substring rules, as shown in Table 1. The motivation of this research is to take into account the temporality, which is ignored in previous studies.

Table 1 Comparison of different prediction models

Model                Order  Adjacency  Recency  Temporality
Subset               -      -          -        -
Subsequence          Y      -          -        -
Latest subsequence   Y      -          Y        -
Substring            Y      Y          -        -
Latest substring     Y      Y          Y        -
Our work             Y      Y          Y        Y

Some studies have also considered other features of rules. Frias-Martinez and Karamcheti (2002) propose a prediction model for user access sequences, called sequential association rules, for capturing the sequential properties and temporality of the visited Web pages. Note that the temporality discussed in Frias-Martinez and Karamcheti (2002) is different from the one we target in this work. The temporality defined in Frias-Martinez and Karamcheti (2002) is the time distance between the antecedent and the consequent pages, while our temporality indicates the starting access time of a user session. Gündüz and Özsu (2003a,b) also consider the feature of time, but there the feature denotes the time spent on the sequences of visited pages. Since the start time of a session is the access time of the first request, it is interesting to consider the time of the latest request. Zukerman et al. propose the Time and Second-order Time Markov models that consider temporal information (Nicholson et al. 1998).
The Time Markov model depends on the latest request, while the Second-order Time Markov model focuses on the latest request and the referring request before it.

The Markov model is a well-known model for stochastic processes (Papoulis 1991) and it has become a well-suited approach for modeling and predicting users' behavior in Web mining. A kth-order Markov model for navigation patterns is equivalent by definition to a k-gram model. Nanopoulos et al. (2003) survey four traditional models, namely Dependency Graph (DG) (Padmanabhan and Mogul 1996), m-order Prediction-by-Partial-Match (PPM) (Palpanas and Mendelzon 1999), Markov models, and Markov models for ordering (WMo), where WMo is a generalization of DG and first-order PPM. In addition to various Markov models, the longest repeating subsequences model was proposed for prediction in Pitkow and Pirolli (1999). However, none of the existing studies took into consideration the temporal evolution of Web usage patterns in constructing the prediction models. This motivates our research, which aims to fill this gap in the literature.

3 Proposed prediction model

The proposed model for predicting temporal navigation patterns consists of two main components, namely the calendar schema for representing the time dimension and the Temporal N-gram model for predicting users' temporal navigation patterns. In the following, we describe the calendar schema and the Temporal N-gram model, respectively.

As discussed in the previous sections, the existing studies do not consider the temporality factor in constructing user navigation models. To support the dimension of temporality for navigation patterns, a calendar schema is needed. Here, we use the calendar schema for rule patterns proposed by Li et al. (2002) as the base. Specifically, a calendar schema C is defined as follows:

\[ C = (G_n : D_n,\; G_{n-1} : D_{n-1},\; \ldots,\; G_1 : D_1) \tag{1} \]

$G_i$ is a granularity name such as year or month, and $D_i$ is a finite subset of the valid values corresponding to $G_i$. For example, if we want to investigate the temporal navigation patterns in units of one hour, we may design a calendar schema of the form (hour: 0), (hour: 1), ..., (hour: 23) with 24 segments.

Before building the N-gram model for pattern discovery, it is necessary to perform data preprocessing, i.e., converting a Web log into a session file. In our work, we eliminate multimedia files such as gif or jpg and script files such as js, and a session is considered an abnormal access pattern if its length $L$ is such that $L < 3$ or $L > 5 \cdot L_{avg}$, where $L_{avg}$ is the average session length. An abnormal access pattern might result from a Web spider or from a visitor's accidental access through a search engine.

For the modeling and prediction of temporal navigation patterns, we propose a new approach named Temporal N-Gram (abbreviated as TN-Gram). The proposed approach is based on the N-Gram model (Su et al. 2000), and the main difference is that we utilize the calendar schema to determine the temporality of a session. Moreover, all i-gram models, which are also known as all-Kth-order Markov models, are discovered by TN-Gram.
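The calendar schema and the preprocessing rules described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`is_page_request`, `hour_segment`, `drop_abnormal_sessions`), the extension set and the representation of a session as a list of page identifiers are all hypothetical choices made for the example.

```python
from datetime import datetime
import posixpath

# Assumption: static resources are recognised by file extension only.
STATIC_EXTENSIONS = {".gif", ".jpg", ".js"}


def is_page_request(url):
    """Return True if the URL looks like a page-view rather than a media or script file."""
    return posixpath.splitext(url.lower())[1] not in STATIC_EXTENSIONS


def hour_segment(start):
    """Hourly calendar schema (hour: 0), ..., (hour: 23): map a session start time to its segment."""
    return start.hour


def drop_abnormal_sessions(sessions, min_len=3, factor=5):
    """Keep sessions whose length L satisfies min_len <= L <= factor * L_avg."""
    if not sessions:
        return []
    avg_len = sum(len(s) for s in sessions) / len(sessions)
    return [s for s in sessions if min_len <= len(s) <= factor * avg_len]
```

For instance, with ten sessions of length 3, one of length 2 and one of length 100, the average length is 11, so the short session and the very long one are both discarded. A different calendar schema only requires a different segment function.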
In the following, we list the two algorithms, namely TN-Gram_Building and TN-Gram_Predict, for building the models and performing predictions, respectively. Here, a hash table H[] is produced for each segment of the calendar schema (i.e., for each value of the granularity). For example, given a calendar schema with one hour as the granularity, twenty-four hash tables H[] will be produced. Besides, the term C(S) indicates the calendar schema segment that matches the starting time of the session S.

Algorithm TN-Gram_Building
Input:  L  // the session file
        C  // the calendar schema
Output: H_1[], H_2[], ..., H_n[] where n = |C|
        // H_k[] is the result of the N-gram model corresponding to C[k]
Begin
  T[] := empty      // hash tables for counting
  H[] := empty      // the result tables
  Max[] := empty    // records the maximum count
  For i := 1 to |L| Do
    S := L[i]
    For j := 1 to |S| - N Do             // N is the gram length
      P := substring(S, j, N)            // P is the antecedent
      R := substring(S, j + N, 1)        // R is the next click
      T_k[P, R] := T_k[P, R] + 1  where C(S) matches C_k
      If T_k[P, R] > Max_k[P] Then
        H_k[P] := R
        Max_k[P] := T_k[P, R]
      End If
    End For
  End For
  Return H[]
End

After the TN-Gram model is built, we use a support-based mechanism to prune the mined patterns with low support.

The following lists the algorithm for predicting a user's navigation pattern. The prediction algorithm takes as input the built TN-Gram model and the user's active session, and it returns the page that the user is predicted to request next.

Algorithm TN-Gram_Predict
Input:  H[]  // generated by Algorithm TN-Gram_Building
        S    // the user's active session
Output: R    // the predicted item (or Web page)
Begin
  For i := |S| downto 1 Do
    If S is an index in hash table H_C(S) Then
      R := H_C(S)[S]
      Return R
    End If
    S := S after removing the first element
  End For
  Return "No Matched Prediction"
End

4 Detection for changes of temporality

In this research, we also propose new methods for discovering the evolution of navigation patterns. Since our prediction model is based on N-gram, we detect the evolution of the mined navigation models.
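As a concrete companion to the two algorithms of Sect. 3, the following Python sketch builds one next-page table per calendar segment and predicts by backing off to shorter suffixes of the active session. It is a minimal sketch under assumptions of our own: sessions are given as hypothetical `(start_time, pages)` pairs, the default segment function is the three-part Hours division (0-7, 8-15, 16-23), all gram lengths from 1 to `max_n` are counted, and the support-based pruning step is omitted.

```python
from collections import defaultdict
from datetime import datetime


def hours_segment(start):
    """Three-part division of the day: hours 0-7 -> 0, 8-15 -> 1, 16-23 -> 2."""
    return start.hour // 8


def build_tn_gram(sessions, max_n, segment_of=hours_segment):
    """Return {segment: {antecedent tuple: most frequent next page}}."""
    # counts[segment][antecedent][next_page] = number of occurrences
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for start, pages in sessions:
        seg = segment_of(start)
        for n in range(1, max_n + 1):
            for j in range(len(pages) - n):
                antecedent = tuple(pages[j:j + n])
                counts[seg][antecedent][pages[j + n]] += 1
    return {
        seg: {ante: max(nxt, key=nxt.get) for ante, nxt in table.items()}
        for seg, table in counts.items()
    }


def predict_next(model, start, active_pages, max_n, segment_of=hours_segment):
    """Predict the next page, or None when no antecedent matches."""
    table = model.get(segment_of(start), {})
    suffix = tuple(active_pages[-max_n:])
    while suffix:
        if suffix in table:
            return table[suffix]
        suffix = suffix[1:]  # drop the oldest click and retry
    return None
```

A short usage example: if daytime sessions go from `home` to `papers` and night sessions go from `home` to `auction`, the same active session `["home"]` yields different predictions depending on the session start time, which is the behaviour the temporal model is meant to capture.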
Considering the nature of temporality, we separate the original session file by calendar schemas. To find the evolutional changes, we perform the following two basic steps:

1. Rule generation. We first mine navigation rules in each sub-dataset that is separated from the original session file according to the calendar schema. We define R, the set of navigation rules, as given in Liu et al. (2001):

\[ R = \{\, r \mid r \in (R_1 \cup R_2) \,\} \tag{2} \]

Each rule in R is associated with a support and a confidence value. For different datasets under varied time periods, different rule sets will be discovered.

2. Identification of rule changes. After R is generated, we can identify the changes of rules between rule sets under different time periods. The two fundamental rule changes for support and confidence addressed in Liu et al. (2001) are used as our base. To address the precision of the prediction, we propose another measure to evaluate the prediction rule changes.

4.1 Support-based fundamental rule changes

Since the support is the fraction of all sessions that contain the pattern formed by the antecedent and the consequent, we can use the same formula as for association rules to calculate the expected support of navigation rules. Consider the rule AB → y, where A and B are m-grams obtained by splitting the antecedent, and y is the consequent, i.e., the predicted page. Let ABy be the pattern. We can intuitively assume that the support of ABy increases with the support of both AB and By. The expected support is calculated by the following formulas:

\[ E_{r_{AB}}\bigl(\mathrm{sup}_{t_2}(r)\bigr) = \min\left( \frac{\mathrm{sup}_{t_1}(r)}{\mathrm{sup}_{t_1}(r_{AB})}\, \mathrm{sup}_{t_2}(r_{AB}),\; 1 \right) \tag{3} \]

\[ E_{r_{By}}\bigl(\mathrm{sup}_{t_2}(r)\bigr) = \min\left( \frac{\mathrm{sup}_{t_1}(r)}{\mathrm{sup}_{t_1}(r_{By})}\, \mathrm{sup}_{t_2}(r_{By}),\; 1 \right) \tag{4} \]

4.2 Confidence-based fundamental rule changes

For navigation rules, continuity is an important element although it is not considered in association rules. Hence, we have to modify the definition of expected confidences. As described above, considering the rule AB → y, we may investigate the relationship among A → B, B → y, and AB → y. Here, A → B means that A appears in front of B. Our assumption is that the confidence of AB → y increases if the confidence of B → y increases or the confidence of A → B decreases. It is proven as follows, using sup(AB) = sup(B) · conf(A → B) and sup(By) = sup(B) · conf(B → y):

\[ \mathrm{conf}(AB \to y) = \frac{\mathrm{sup}(ABy)}{\mathrm{sup}(AB)} = \frac{\mathrm{sup}(ABy)}{\mathrm{sup}(B)\,\mathrm{conf}(A \to B)} = \frac{\mathrm{sup}(ABy)}{\mathrm{sup}(By)} \cdot \frac{\mathrm{conf}(B \to y)}{\mathrm{conf}(A \to B)} \tag{5} \]

We have the result:

\[ \mathrm{conf}(AB \to y) \propto \frac{\mathrm{conf}(B \to y)}{\mathrm{conf}(A \to B)} \tag{6} \]

According to (6), the expected confidences are computed as follows:

\[ E_{r_{A \to B}}\bigl(\mathrm{conf}_{t_2}(r)\bigr) = \min\left( \mathrm{conf}_{t_1}(r)\, \frac{\mathrm{conf}_{t_1}(r_{A \to B})}{\mathrm{conf}_{t_2}(r_{A \to B})},\; 1 \right) \tag{7} \]

\[ E_{r_{B \to y}}\bigl(\mathrm{conf}_{t_2}(r)\bigr) = \min\left( \mathrm{conf}_{t_1}(r)\, \frac{\mathrm{conf}_{t_2}(r_{B \to y})}{\mathrm{conf}_{t_1}(r_{B \to y})},\; 1 \right) \tag{8} \]

4.3 Changes of prediction rule

For the two kinds of rule changes described above, a change is fundamental if it cannot be explained by other changes.
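To make the idea of a change that "can be explained by other changes" more concrete, the expected values of Sects. 4.1 and 4.2 can be computed as below. This is a sketch under one reading of the formulas, namely that each expectation rescales the period-1 value of the rule by the change observed in a sub-rule and caps the result at 1; the function and parameter names are hypothetical. An observed period-2 value close to such an expectation would be explained by the sub-rule's change, while the statistical comparison itself is not shown here.

```python
def expected_support(sup_t1_rule, sup_t1_sub, sup_t2_sub):
    """Expected period-2 support of a rule given the support change of a sub-pattern (AB or By)."""
    return min(sup_t1_rule * sup_t2_sub / sup_t1_sub, 1.0)


def expected_conf_from_prefix(conf_t1_rule, conf_t1_ab, conf_t2_ab):
    """Expected period-2 confidence of AB -> y given the change of conf(A -> B).

    The relation is inverse: a lower conf(A -> B) in period 2 raises the expectation.
    """
    return min(conf_t1_rule * conf_t1_ab / conf_t2_ab, 1.0)


def expected_conf_from_suffix(conf_t1_rule, conf_t1_by, conf_t2_by):
    """Expected period-2 confidence of AB -> y given the change of conf(B -> y)."""
    return min(conf_t1_rule * conf_t2_by / conf_t1_by, 1.0)
```

For example, if a rule has confidence 0.4 in period 1 and conf(A → B) halves from 0.5 to 0.25, the expected period-2 confidence doubles to 0.8, whereas the same halving applied to conf(B → y) lowers the expectation to 0.2.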
Even though we can determine the changes of rules between two time periods, we cannot conclude that they will influence the precision of prediction. For example, assume that we have a navigation rule a, b → y in one sub-dataset (time period 1) with support and confidence of 1 and 2%, respectively. For the same rule, assume that the support and confidence are 1 and 7%, respectively, in another sub-dataset (time period 2). In this case, we shall consider the changes significant. Notice that the fundamental rule changes significantly in both support and confidence, but the precision of prediction is not reduced if there exists no other rule with the same antecedent and a higher confidence. The key point here is that we look at the difference in the consequent instead of the support or confidence of the rules. In rule generation, we replace R by P as

\[ P = \{\, p \mid p \in (P_1 \cup P_2) \,\} \tag{9} \]

Here, P denotes the set of prediction rules, where a prediction rule is the navigation rule with the highest confidence for one antecedent. Therefore, unlike in R, each antecedent in P corresponds to exactly one consequent. If the consequents of the prediction rule in both sub-datasets are the same, the prediction rule is not a change. Otherwise, we determine the change with the Chi-square test as follows.

Consider a prediction rule in time period 1, A → x with confidence a1%, and the other rule in time period 2, A → y with confidence b2%. Meanwhile, consider the confidence of A → x in time period 2 and the confidence of A → y in time period 1. The former value is a2% and the latter value is b1%. By construction, we have a1 > b1 and b2 > a2, as shown in Table 2. We utilize the Chi-square test to test the homogeneity of (a1, b1) and (a2, b2). If the null hypothesis is rejected, the difference matters for the request following A in both time periods. For this reason, we report a change only when the homogeneity of either (a1, b1) or (a2, b2) is rejected.

Table 2 Example for changes of prediction rule

Confidence   Time period 1 (%)   Time period 2 (%)
A → x        a1                  a2
A → y        b1                  b2

5 Experimental evaluation

In our work, given an active session, the prediction model predicts one page as the next request of a user. We define two measures to evaluate the prediction model, namely precision and recall. The precision is defined as the ratio of requests predicted correctly to all recommendation requests. To determine the practicality, we use the recall measure as defined in Su et al. (2000) as the evaluation measure. Our definition of recall is the same as the applicability, which is the ratio of predicted requests to all actual requests. In the rest of this section, we describe the two real datasets and a simulated dataset, and then present the experimental results.

5.1 Datasets

The first Web log we tested is the NASA log from the NASA Kennedy Space Center server in Florida. It contains 1,569,898 requests and 72,198 IPs aggregated as 51,132 sessions involving 4,737 pages. The second log is the ClarkNet log. ClarkNet is a commercial Internet site provider for the Metro Baltimore-Washington DC area. The log contains 1,654,882 requests and 85,137 IPs from August 28th to September 3rd, 1995. The session file contains 3,78 sessions and 13,988 pages. For these two logs, we use 30 min of access interval as the threshold to identify a session.

The third log we tested is a simulated dataset. We assume a complete tree as the Web structure, with five branches for each node and a tree depth of seven. Considering the backtracking problem, we employ an exponential distribution on the depth for the probability of backtracking. After the tree is constructed, we set the property of each node to be one of normal node, temporal node, or strong node. A normal node is a random node such that a user visiting it will also visit its children randomly. A temporal node is a node that a user visits according to the temporality we define. A strong node carries a strong temporal property, so that a user will visit some particular child with high probability, reflecting the temporal behavior.
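The session identification step mentioned for the two real logs can be sketched as follows. The code is an illustration rather than the procedure actually used in the experiments: it assumes requests are available as hypothetical `(ip, timestamp, page)` tuples, groups them by IP address, and starts a new session whenever the gap between consecutive requests exceeds a threshold, here defaulting to 30 minutes.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(requests, gap=timedelta(minutes=30)):
    """Split (ip, timestamp, page) requests into (start_time, pages) sessions."""
    by_ip = defaultdict(list)
    for ip, ts, page in sorted(requests, key=lambda r: (r[0], r[1])):
        by_ip[ip].append((ts, page))

    sessions = []
    for hits in by_ip.values():
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > gap:
                # Inactivity longer than the threshold closes the current session.
                sessions.append((current[0][0], [p for _, p in current]))
                current = []
            current.append(hit)
        sessions.append((current[0][0], [p for _, p in current]))
    return sessions
```

Each resulting session keeps its start time, which is the value a calendar schema needs in order to assign the session to a time segment.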
Note that we construct the simulated user navigation model based on the node property. The simulated data contains 22,32 sessions with 19,531 pages. For all tested log data, a session file is divided into two parts for the experiments, namely the training part, with 80% of the whole session file chosen randomly, and the testing part for the rest.

5.2 Experimental results

We define two simple calendar schemas in our experiments, named Weekdays and Hours. The Weekdays schema sets each weekday as the unit of the schema, while Hours defines the schema (hour: 0-7), (hour: 8-15), (hour: 16-23) for three divisions of one day. Moreover, we use the term All to indicate the case in which no calendar schema is used (i.e., the traditional N-gram model). We denote Fundamental rule Changes by FC and Prediction rule Changes by PC.

[Fig. 1 Changes on Hours in NASA and ClarkNet logs. Horizontal axis: Hours transitions [0~7]-[8~15], [8~15]-[16~23], [16~23]-[0~7]; vertical axis: changes; series: FC (conf), FC (sup) and PC for each log]

[Fig. 2 Changes on Weekdays in NASA and ClarkNet logs. Horizontal axis: Sun.-Mon., Mon.-Tue., Tue.-Wed., Wed.-Thu., Thu.-Fri., Fri.-Sat., Sat.-Sun.; vertical axis: changes; series: FC (confidence), FC (support) and PC for each log]

Figures 1 and 2 show the results of changes for the NASA data (blue line) and the ClarkNet data (red line). Although the fundamental changes show the changes between each time slot in confidence and support, we focus on the prediction rule changes in the following discussion. Note that the prediction rule changes of Hours (solid line) are stable for both datasets. However, the changes are less stable for Weekdays. In addition, the changes of Weekdays in the ClarkNet data are more distinct than in the NASA data. Consequently, the precision of Weekdays increases more in the ClarkNet data than in the NASA data compared to the All model. We show the precision for both datasets in Figs. 3 and 4. In Fig. 3, the largest gap among the three lines is 1.5%, while it becomes 4% as shown in Fig. 4 with confidence < 0.5. However, the precision of Weekdays is lower than the others when the confidence > 0.5, as shown in Fig. 4.

[Fig. 3 The precision on the NASA log. Series: All, Hours, Weekdays]

[Fig. 4 The precision on the ClarkNet log. Series: All, Hours, Weekdays]
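The precision and recall values discussed above follow the definitions given at the start of Sect. 5. A minimal sketch of how they could be computed is shown below; it assumes a hypothetical representation in which each test request has one prediction, with `None` standing for "no matched prediction".

```python
def precision_recall(predictions, actual):
    """Return (precision, recall) for aligned lists of predicted and actual next pages.

    precision = correct predictions / predictions made
    recall    = predictions made / all actual requests (the applicability)
    """
    made = [(p, a) for p, a in zip(predictions, actual) if p is not None]
    correct = sum(1 for p, a in made if p == a)
    precision = correct / len(made) if made else 0.0
    recall = len(made) / len(actual) if actual else 0.0
    return precision, recall
```

With four test requests, three of which receive a prediction and two of those being correct, this gives a precision of 2/3 and a recall of 3/4. A model that predicts only when it is confident therefore tends to trade recall for precision.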

This is because the number of rules with high confidence in the Weekdays case is larger than in the All case. In other words, the Weekdays model can capture the inherent property that remains implicit when the temporality is ignored. The recall clarifies this feature, as illustrated in Fig. 5.

[Fig. 5 The recall on the ClarkNet log. Series: All, Hours, Weekdays]

Although we have investigated the precision for all page-views, some page-views may lack the property of temporality. For example, users may show temporal behavior when they visit a general homepage for various products. However, this may not be the case when users visit a detail page-view of a particular product. Hence, we are interested in the precision of the page-views with prediction rule changes. Figure 6 shows the precision, and it is observed that the precision difference is more distinct than that for all page-views.

[Fig. 6 The precision of the temporal model and All. Series: All, Temporal Model; groups: NASA-Weekdays, NASA-Hours, Clarknet-Weekdays, Clarknet-Hours]

Finally, we simulate a special dataset with an evident Hours property in order to test different kinds of temporal properties. Figure 7 shows the precision on the simulated dataset under different settings of the confidence threshold. Although the simulated dataset carries the Hours property, it is not clear from Fig. 7 whether the Hours model is a good model. This is because the page-views with the Hours property take up only two percent of the total data in our simulated data. However, as shown in Fig. 8, the proposed TN-Gram model substantially outperforms the traditional N-gram model in terms of precision if we consider only the temporal page-views. The experimental results show that the average value of prediction rule changes is 4 and 48% for Weekdays and Hours, respectively (the figures are not shown here due to space limitation).

[Fig. 7 The precision on simulated dataset by varying the confidence threshold. Series: all, 3-interval, weekdays]

[Fig. 8 The precision on simulated data for different models. Series: All, Temporal Model; groups: Sim.-Weekdays, Sim.-Hours]

6 Conclusions

Our work aims at exploring the temporality property to identify the time periods in which users' navigation patterns change significantly, so as to improve the prediction precision. This can provide useful insight for intelligent Web sites in strategy planning, such as personalized services and marketing promotion. In this paper, we have proposed a novel method named Temporal N-Gram (TN-Gram) for constructing prediction models of Web user navigation. After the prediction model is constructed, three kinds of new measures, namely Support-based Fundamental Rule Changes, Confidence-based Fundamental Rule Changes, and Changes of Prediction Rules, are used to evaluate the temporal evolution of navigation patterns. For empirical evaluation, we adopted two real datasets and also designed a simulator to generate a dataset that carries the temporal navigation characteristics of users. Through experimental evaluation on both the real-life and simulated datasets, the proposed TN-Gram method is shown to outperform other existing approaches such as N-gram modeling in terms of prediction precision.

In future work, we will apply the TN-Gram model to different kinds of Web sites, such as popular auction sites, so as to evaluate its performance and effectiveness in more detail. Moreover, we will also consider the user group issue and integrate it with TN-Gram to discover more interesting patterns. Besides, since the discovered temporal evolution can be exploited in a wide range of applications, we will apply the TN-Gram method to applications such as personalized services, with the aim of enhancing the richness and quality of applications in Web systems.

Acknowledgments This research was supported by Ministry of Economic Affairs, Taiwan, ROC, under grant no. 93-EC-17-A , and by National Science Council, Taiwan, ROC, under grant no. NSC H

References

Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the international conference on data engineering (ICDE), Taipei, Taiwan, March 1995
Borges J, Levene M (1999) Data mining of user navigation patterns. In: Proceedings of the workshop on web usage analysis and user profiling (WEBKDD 99), San Diego, CA, August 15, 1999, pp 31-36

Frias-Martinez E, Karamcheti V (2002) A prediction model for user access sequences. In: Proceedings of the WEBKDD workshop: web mining for usage patterns and user profiles, ACM SIGKDD international conference on knowledge discovery and data mining, July 2002
Gündüz Ş, Özsu MT (2003a) A user interest model for web page navigation. In: Proceedings of international workshop on data mining for actionable knowledge (DMAK), Seoul, Korea, April 2003
Gündüz Ş, Özsu MT (2003b) A web page prediction model based on click-stream tree representation of user behavior. In: Proceedings of the ninth ACM international conference on knowledge discovery and data mining (KDD), Washington, DC, August 2003
Li Y, Ning P, Wang XS, Jajodia S (2002) Discovering calendar-based temporal association rules. J Data Knowl Eng (DKE) 44(2)
Liu B, Hsu W, Ma Y (2001) Discovering the set of fundamental rule changes. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD-2001), San Francisco, CA, August 2001
Nanopoulos A, Katsaros D, Manolopoulos Y (2003) A data mining algorithm for generalized web prefetching. IEEE Transactions on Knowledge and Data Engineering
Nicholson AE, Zukerman I, Albrecht DW (1998) A decision-theoretic approach for pre-sending information on the WWW. In: Proceedings of the fifth Pacific Rim international conference on artificial intelligence, 1998
Padmanabhan V, Mogul J (1996) Using predictive prefetching to improve world wide web latency. ACM SIGCOMM Computer Comm Rev 26(3)
Palpanas T, Mendelzon A (1999) Web prefetching using partial match prediction. In: Proceedings of the fourth web caching workshop (WCW 99), March 1999
Papoulis A (1991) Probability, random variables, and stochastic processes. McGraw Hill, New York
Pitkow J, Pirolli P (1999) Mining longest repeating subsequences to predict world wide web surfing. In: Proceedings of the USENIX symposium on Internet technologies and systems (USITS 99), October 1999
Srivastava J, Cooley R, Deshpande M, Tan P (2000) Web usage mining: discovery and applications of usage patterns from web data. In: SIGKDD Explorations, ACM SIGKDD, January 2000
Su Z, Yang Q, Lu Y, Zhang H (2000) Whatnext: a prediction system for web requests using n-gram sequence models. In: Proceedings of the first international conference on web information systems and engineering conference, Hong Kong, June 2000
Tan P, Kumar V (2002) Mining association patterns in web usage data. In: Proceedings of the international conference on advances in infrastructure for e-business, e-education, e-science, and e-medicine on the Internet
Tan P, Kumar V, Srivastava J (2000) Indirect association: mining higher order dependencies. In: Proceedings of the fourth European conference on principles and practice of knowledge discovery in databases, Lyon, France
Wang W, Zaiane OR (2002) Clustering web sessions by sequence alignment. In: Proceedings of the third international workshop on management of information on the web in conjunction with 13th international conference on database and expert systems applications DEXA 2002, Aix en Provence, France, September 2-6
Yang Q, Li T, Wang K (2004) Building association rule based sequential classifiers for web document prediction. J Data Min Knowl Discov 8(3)
Zukerman I, Albrecht DW, Nicholson AE (1999) Predicting user's request on the WWW. In: Proceedings of the seventh international conference on user modeling, 1999
