Part 2. Mining Patterns in Sequential Data
P. Singer, F. Lemmerich: Analyzing Sequential User Behavior on the Web


Sequential Pattern Mining: Definition
"Given a set of sequences, where each sequence consists of a list of elements and each element consists of a set of items, and given a user-specified min_support threshold, sequential pattern mining is to find all of the frequent subsequences, i.e., the subsequences whose occurrence frequency in the set of sequences is no less than min_support." [Agrawal & Srikant, 1995; cited after Pei et al. 2001]
"Given a set of data sequences, the problem is to discover sub-sequences that are frequent, i.e., the percentage of data sequences containing them exceeds a user-specified minimum support." [Garofalakis, 1999]

Why Sequential Patterns?
Direct knowledge
Feature detection

Notation & Terminology
Dataset: a set of sequences
Sequence: an ordered list of itemsets (events) <e_1, ..., e_n>
Itemset: an (unordered) set of items, e_i = {i_1, ..., i_z}
S_sub = <s_1, ..., s_m> is a subsequence of S_ref = <r_1, ..., r_n> if there exist indices 1 <= i_1 < ... < i_m <= n such that s_k is a subset of r_{i_k} for every k
Example: <a, (b,c), c> is a subsequence of <a, (d,e), (b,c), (a,c)>
Length of a sequence: number of items in the sequence (counted with repetition), e.g. length(<a, (b,c), a>) = 4
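
To make the containment test concrete, here is a minimal Python sketch of the subsequence check defined above. The representation (a sequence as a list of frozensets) and the function name are my own choices, not part of the tutorial.

```python
def is_subsequence(sub, ref):
    """Check whether `sub` is a subsequence of `ref`: each itemset of `sub`
    must be contained in a distinct itemset of `ref`, in the same order."""
    pos = 0  # next position in `ref` that may still be matched
    for itemset in sub:
        # advance until an event of `ref` contains the current itemset
        while pos < len(ref) and not itemset <= ref[pos]:
            pos += 1
        if pos == len(ref):
            return False
        pos += 1  # this event is consumed; later itemsets must match later events
    return True

# Example from the slide: <a,(b,c),c> is a subsequence of <a,(d,e),(b,c),(a,c)>
s_sub = [frozenset('a'), frozenset('bc'), frozenset('c')]
s_ref = [frozenset('a'), frozenset('de'), frozenset('bc'), frozenset('ac')]
assert is_subsequence(s_sub, s_ref)
```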

Frequent Sequential Patterns
Support sup(S) of a (sub-)sequence S in a dataset: the number of sequences in the dataset that have S as a subsequence
Given a user-chosen constant minsupport: a sequence S is frequent in the dataset if sup(S) >= minsupport
Task: find all frequent sequences in the dataset
If all sequences contain exactly one event, this reduces to frequent itemset mining!
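
A straightforward (and deliberately naive) translation of these definitions, reusing the is_subsequence sketch from above; real miners avoid scanning the whole database for every candidate.

```python
def support(database, pattern):
    """Number of sequences in `database` that contain `pattern` as a subsequence."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))

def is_frequent(database, pattern, minsupport):
    """A pattern is frequent iff its support reaches the user-chosen minsupport."""
    return support(database, pattern) >= minsupport
```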

Pattern Space
General approach: enumerate candidates and count
Problem: combinatorial explosion, too many candidates
Candidates for only 3 items {a, b, c}:
Length 1: 3 candidates: <a>, <b>, <c>
Length 2: 12 candidates: <a,a>, <a,b>, <a,c>, <b,a>, <b,b>, <b,c>, <c,a>, <c,b>, <c,c>, <(a,b)>, <(a,c)>, <(b,c)>
Length 3: 46 candidates (e.g., <a,a,a>, <a,(a,b)>, <a,(a,c)>, <a,a,b>, <a,a,c>, <a,(b,c)>, <a,b,a>, ...)
Candidates for 100 items:
Length 1: 100
Length 2: 100·100 + (100·99)/2 = 14,950
Length 3 and beyond: the number of candidates keeps growing exponentially with the pattern length
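
The length-1 and length-2 counts on this slide can be reproduced with a small helper; this is my own illustration, and only these two levels are given in closed form above.

```python
def candidate_counts(n):
    """Candidate counts for an alphabet of n items, as on the slide:
    length 1: one candidate per item;
    length 2: n*n ordered pairs <x,y> plus n*(n-1)/2 itemset candidates <(x,y)>."""
    length1 = n
    length2 = n * n + n * (n - 1) // 2
    return length1, length2

print(candidate_counts(3))    # (3, 12)
print(candidate_counts(100))  # (100, 14950)
```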

Monotonicity and Pruning
If S is a subsequence of R, then sup(R) is at most as large as sup(S)
Monotonicity: if S is not frequent, then no supersequence R of S can be frequent!
E.g., if <a> occurs only 5 times, then <a, b> can occur at most 5 times
Pruning: if we know that S is not frequent, we do not have to evaluate any supersequence of S!
Example: assume b is not frequent; for the items {a, b, c} only 5 length-2 candidates remain (<a,a>, <a,c>, <c,a>, <c,c>, <(a,c)>), and only 20 length-3 candidates are left

Apriori Algorithm (for Sequential Patterns) [Agrawal & Srikant, 1995]
Evaluate patterns levelwise according to their length: find frequent patterns of length 1, use these to find frequent patterns of length 2, and so on
First find frequent single items; then at each level do:
Generate candidates from the frequent patterns of the last level: for each pair of frequent sequences (A, B), remove the first item of A and the last item of B; if the remaining sequences are equal, generate a new candidate by appending the last item of B to the end of A
E.g.: A = <a, (b,c), d>, B = <(b,c), (d,e)>  =>  new candidate <a, (b,c), (d,e)>
Prune the candidates (check that all their subsequences are frequent)
Check the remaining candidates by counting
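
The following is a simplified sketch of this level-wise scheme, not the full AprioriAll/GSP procedure: it assumes every event is a single item (no itemsets), so the join and prune steps stay short. Names and structure are my own.

```python
from itertools import product

def apriori_sequences(database, items, minsupport, max_length=4):
    """Level-wise mining of frequent sequences of single items.
    `database` is a list of item lists, `items` the alphabet."""
    def contains(seq, pat):
        pos = 0
        for item in pat:
            try:
                pos = seq.index(item, pos) + 1
            except ValueError:
                return False
        return True

    def sup(pat):
        return sum(1 for seq in database if contains(seq, pat))

    frequent, level = {}, []
    for item in items:                      # level 1: frequent single items
        s = sup((item,))
        if s >= minsupport:
            level.append((item,))
            frequent[(item,)] = s

    length = 1
    while level and length < max_length:
        # join step: A and B join if A without its first item equals B without its last
        candidates = {a + (b[-1],) for a, b in product(level, level) if a[1:] == b[:-1]}
        # prune step: every pattern obtained by dropping one item must itself be frequent
        candidates = {c for c in candidates
                      if all(c[:i] + c[i + 1:] in frequent for i in range(len(c)))}
        level = []
        for c in sorted(candidates):        # count the surviving candidates
            s = sup(c)
            if s >= minsupport:
                level.append(c)
                frequent[c] = s
        length += 1
    return frequent
```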

Extensions based on Apriori
Generalized Sequential Patterns (GSP) [Srikant & Agrawal 1996]: adds max/min gaps, taxonomies for items, efficiency improvements through hashing structures
PSP [Masseglia et al. 1998]: organizes candidates in a prefix tree
Maximal Sequential Patterns using Sampling (MSPS) [Luo & Chung 2005]: sampling
See Mooney & Roddick for more details [Mooney & Roddick 2013]

SPADE: Sequential PAttern Discovery using Equivalence classes [Zaki 2001]
Uses a vertical data representation.

(Original) horizontal database layout:
SID  Time  Items
1    10    a, b, d
1    15    b, d
1    20    c
2    15    a
2    20    b, c, d
3    10    b, d

Vertical database layout (id-lists per item):
a: (1,10), (2,15)
b: (1,10), (1,15), (2,20), (3,10)
c: (1,20), (2,20)
d: (1,10), (1,15), (2,20), (3,10)

Id-lists for longer candidates are constructed from the id-lists of shorter candidates
Exploits equivalence classes: in this example <b> and <d> are equivalent, so <b, x> and <d, x> have the same support
Can traverse the search space with depth-first or breadth-first search
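
A minimal sketch of the representation change only, i.e., building the vertical id-lists from the horizontal table above; the temporal id-list joins that SPADE then performs on these lists are not shown.

```python
from collections import defaultdict

def vertical_layout(horizontal):
    """Convert a horizontal database [(sid, time, itemset), ...] into
    vertical id-lists: item -> list of (sid, time) occurrences."""
    id_lists = defaultdict(list)
    for sid, time, items in horizontal:
        for item in items:
            id_lists[item].append((sid, time))
    return dict(id_lists)

horizontal = [
    (1, 10, {'a', 'b', 'd'}), (1, 15, {'b', 'd'}), (1, 20, {'c'}),
    (2, 15, {'a'}), (2, 20, {'b', 'c', 'd'}), (3, 10, {'b', 'd'}),
]
print(vertical_layout(horizontal)['a'])  # [(1, 10), (2, 15)]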

Extensions based on SPADE
SPAM: bitset representation [Ayres et al. 2002]
LAPIN [Yang et al. 2007]: uses the last position of items in a sequence to reduce the number of generated candidates
LAPIN-SPAM: combines both ideas [Yang & Kitsuregawa 2005]
IBM [Savary & Zeitouni 2005]: combines several data structures (bitsets, indices, additional tables)

PrefixSpan [Pei et al. 2001]
Similar idea to frequent pattern growth (FP-growth) in frequent itemset mining
Determine the frequent single items (e.g., a, b, c, d, e): first mine all frequent sequences starting with prefix <a>, then all frequent sequences starting with prefix <b>, and so on
Mining all frequent sequences starting with <a> does not require the complete dataset!
Build projected databases: use only the sequences containing a, and for each such sequence use only the part from the first occurrence of a onwards

Given sequence               Projection to <a>
<b, (c,d), a, (b,d), e>      <a, (b,d), e>
<c, (a,d), b, (d,e)>         <(a,d), b, (d,e)>
<b, (d,e), c>                [removed, contains no a]

PrefixSpan (continued)
Given prefix <a> and the projected database for a: mine recursively!
Mine the frequent single items in the projected database (e.g., b, c, d)
Then mine frequent sequences with prefix <a, b>, with prefix <a, c>, with prefix <(a,b)>, with prefix <(a,c)>, ...
This corresponds to a depth-first search of the candidate tree ({} -> <a>, <b>, <c> -> <a,a>, <a,b>, ..., <(b,c)> -> ...)
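
A compact sketch of the PrefixSpan idea under the same simplifying assumption as before (single-item events, no itemsets): projection and recursion follow the slides, everything else is illustrative.

```python
from collections import Counter

def prefixspan(database, minsupport, prefix=()):
    """Simplified PrefixSpan for sequences of single items:
    find frequent items in the (projected) database, extend the prefix,
    project again, and recurse.  `database` is a list of item lists."""
    results = {}
    item_support = Counter()
    for seq in database:                 # count each item once per sequence
        item_support.update(set(seq))
    for item, sup in sorted(item_support.items()):
        if sup < minsupport:
            continue
        new_prefix = prefix + (item,)
        results[new_prefix] = sup
        # projected database: suffix after the first occurrence of `item`
        projected = [seq[seq.index(item) + 1:] for seq in database if item in seq]
        projected = [s for s in projected if s]   # drop empty suffixes
        results.update(prefixspan(projected, minsupport, new_prefix))
    return results

db = [list('abcd'), list('acbd'), list('abd'), list('bcd')]
print(prefixspan(db, minsupport=3))      # e.g. ('a', 'b', 'd') has support 3
```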

Advantages of PrefixSpan
Advantages compared to Apriori: no explicit candidate generation, no checking of non-occurring candidates; the projected databases keep shrinking
Disadvantage: construction of the projected databases can be costly

So which algorithm should you use?
All algorithms give the same result; runtime and memory usage vary
Current studies are inconclusive
Depends on dataset characteristics: dense data tends to favor SPADE-like algorithms, sparse data tends to favor PrefixSpan and its variations
Also depends on the implementations

The Redundancy Problem
The result set often contains many, and many similar, sequences
Example: find frequent sequences with minsupport = 10, and assume <a, (b,c), d> is frequent
Then the following sequences also MUST be frequent: <a>, <b>, <c>, <a, b>, <a, c>, <a, d>, <b, d>, <c, d>, <(b,c)>, <a, (b,c)>, <a, b, d>, <a, c, d>, <(b,c), d>
Presenting all of these as frequent subsequences carries little additional information!

Closed and Maximal Patterns
Idea: do not report all frequent patterns, but only
frequent closed sequences: all super-sequences have a smaller support
frequent maximal sequences: all super-sequences are not frequent
Example dataset:
<a, b, c, d, e, f>
<a, c, d>
<c, b, a>
<b, a, (d,e)>
<b, a, c, d, e>
sup(<a,c>) = 3: frequent
sup(<a,c,d>) = 3: frequent, closed
sup(<a,c,d,e>) = 2: frequent, closed, maximal
sup(<a,c,d,e,f>) = 1: not frequent
The set of all frequent sequences can be derived from the maximal sequences; the support counts of all frequent sequences can be derived from the closed sequences
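
As a sketch, closed and maximal patterns can be obtained by naive post-filtering of all frequent patterns; the specialized algorithms on the next slide instead prune during the search. Patterns are again tuples of single items, as in the earlier sketches.

```python
def closed_and_maximal(frequent):
    """Post-filter a dict {pattern: support} of all frequent sequences:
    closed  = no proper super-pattern with the same support,
    maximal = no frequent proper super-pattern at all."""
    def is_subseq(q, p):
        pos = 0
        for item in q:
            try:
                pos = p.index(item, pos) + 1
            except ValueError:
                return False
        return True

    def is_super(p, q):   # is p a proper super-pattern of q?
        return len(p) > len(q) and is_subseq(q, p)

    closed, maximal = {}, {}
    for pat, sup in frequent.items():
        supers = [q for q in frequent if is_super(q, pat)]
        if not supers:
            maximal[pat] = sup
        if all(frequent[q] < sup for q in supers):
            closed[pat] = sup
    return closed, maximal
```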

Mining Closed & Maximal Patterns
In principle, one can simply filter the resulting frequent sequences
Specialized algorithms apply pruning during the search process and are much faster than mining all frequent sequences
Some examples
Closed:
CloSpan: prefix tree with additional pruning [Yan et al. 2003]
BIDE: memory-efficient forward/backward checking [Wang & Han 2007]
ClaSP: based on SPADE [Gomariz et al. 2013]
Maximal:
AprioriAdjust: based on Apriori [Lu & Li 2004]
VMSP: based on vertical data structures [Fournier-Viger et al. 2014]
MaxSP: inspired by PrefixSpan, maximal backward and forward extensions [Fournier-Viger et al. 2013]
MSPX: approximate algorithm using samples [Luo & Chung 2005]

Beyond Frequency
Frequent sequence ≠ interesting sequence
Example for text sequences: most frequent sequences in The Adventures of Tom Sawyer (according to [Petitjean et al. 2015]):
Sequence      Support (in %)
<and, and>    13%
<and, to>     9.8%
<to, and>     9.1%
<of, and>     8.6%
Two options: add constraints (filter), or use interestingness measures

Constraints
Item constraints, e.g., high-utility items: sum over all items in the sequence > $1000
Length constraints: minimum/maximum number of events/transactions
Model-based constraints: sub-/supersequences of a given sequence
Gap constraints: maximum gap between events of a sequence
Time constraints: given timestamps, maximum time between events of a sequence
Closed or maximal sequences only
Computation: either (1) mine all frequent patterns and then filter, or (2) push the constraints into the algorithm
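
As an illustration of one of these constraints, here is a sketch of a containment check under a maximum-gap constraint for single-item events; the exact gap semantics vary between papers, so the definition in the comment is an assumption of this sketch.

```python
def contains_with_max_gap(seq, pattern, max_gap):
    """Does `seq` (a list of single items) contain `pattern` such that each
    matched event occurs within `max_gap` positions of the previous match?
    Uses backtracking, since a greedy match may miss valid embeddings."""
    def match(pat_idx, start):
        if pat_idx == len(pattern):
            return True
        # the first item may match anywhere; later items only within the gap window
        end = len(seq) if pat_idx == 0 else min(len(seq), start + max_gap)
        for pos in range(start, end):
            if seq[pos] == pattern[pat_idx] and match(pat_idx + 1, pos + 1):
                return True
        return False
    return match(0, 0)

# In "abxc", a is at position 0 and c at position 3, so <a, c> is only
# found when max_gap is at least 3.
print(contains_with_max_gap(list('abxc'), ('a', 'c'), max_gap=1))  # False
print(contains_with_max_gap(list('abxc'), ('a', 'c'), max_gap=3))  # True
```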

Interestingness Measures and Top-k Search
Use interestingness measures: functions that assign a numeric value (score) to each sequence; the score should reflect the assumed interestingness for users
Desired properties: conciseness, generality, reliability, diversity, novelty, surprisingness, applicability
New goal: search for the k sequences that achieve the highest score
The interestingness measure also implies a ranking of the result
Simple mining approach: 1. compute all frequent patterns, 2. compute the score of each pattern

Confidence
Typical measure for association rule mining; can easily be adapted for sequential patterns
Split a sequence into a rule (e.g., with the last event as the rule head); confidence = accuracy of this rule
Example: sup(<A, B, D>) = 30 and sup(<A, B, D, F>) = 20, so Confidence(<A, B, D> → F) = 20/30
Can be used as a constraint or as an interestingness measure
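
A sketch of this adaptation for single-item sequences, mirroring the 20/30 example above; the representation is my own.

```python
def confidence(database, body, head_item):
    """Confidence of the sequential rule body -> head_item:
    among the sequences containing `body`, the fraction that also contain
    `body` followed by `head_item` (patterns are tuples of single items)."""
    def contains(seq, pat):
        pos = 0
        for item in pat:
            try:
                pos = seq.index(item, pos) + 1
            except ValueError:
                return False
        return True

    body_sup = sum(contains(s, body) for s in database)
    full_sup = sum(contains(s, body + (head_item,)) for s in database)
    return full_sup / body_sup if body_sup else 0.0
```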

Leverage [Petitjean et al. 2015]
Compare the support of a sequence with its expected support: Score(S) = sup(S) - expectedsupport(S)
Idea of expected support: if the sub-patterns of a sequence are frequent, the sequence itself is already more likely to be frequent than, say, the average 4-sequence; it should only be reported if its frequency exceeds this expectation
Formalization for 2-sequences: expectedsupport(<a, b>) = (sup(<a, b>) + sup(<b, a>)) / 2
The formalization for longer sequences generalizes this
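
The 2-sequence formalization translates directly into code; the generalization to longer sequences given by Petitjean et al. is not reproduced here. Single-item sequences are assumed, as in the other sketches.

```python
def leverage_2seq(database, a, b):
    """Leverage of the 2-sequence <a, b>, following the slide's formalization:
    observed support minus the expected support under the assumption that
    both orders of a and b are equally likely."""
    def sup(pat):
        def contains(seq):
            pos = 0
            for item in pat:
                try:
                    pos = seq.index(item, pos) + 1
                except ValueError:
                    return False
            return True
        return sum(1 for seq in database if contains(seq))

    sup_ab, sup_ba = sup((a, b)), sup((b, a))
    expected = (sup_ab + sup_ba) / 2
    return sup_ab - expected
```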

Other Interestingness Measures
Information-theoretic approaches [Tatti & Vreeken 2012], [Lam et al. 2014]: use minimum description length; find sequence (sets) that best explain/compress the dataset
Model-based approaches [Gwadera & Crestani 2010], [Lam et al. 2014]: build a reference model (e.g., learn a Markov chain model) and determine which sequences are most unlikely given that model (compute statistical significance)
Can also include time information

Efficient Top-k Sequential Pattern Mining [Petitjean et al. 2015]
Example algorithm SkOPUS: depth-first search
Pruning: interestingness measures like leverage/confidence are not directly monotone (unlike support), e.g., score(<a, b, c>) can be higher than score(<a, b>)
Use upper bounds ("optimistic estimates") oe(S): for each sequence S, a threshold such that no super-sequence of S has a higher score
Has to be determined for each interestingness measure separately; often easy to compute for a single interestingness measure
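
Below is a generic top-k branch-and-bound skeleton in the spirit of this slide, not the actual SkOPUS algorithm: the expansion operator, the score, and the optimistic estimate are passed in as functions and are assumptions of this sketch.

```python
import heapq
import itertools

def topk_search(root_patterns, expand, score, upper_bound, k):
    """Depth-first top-k search with optimistic-estimate pruning.
    `expand(p)` yields super-patterns of p, `score(p)` their interestingness,
    and `upper_bound(p)` bounds the score of every super-pattern of p."""
    best = []                       # min-heap of (score, tiebreak, pattern)
    counter = itertools.count()     # tie-breaker so patterns are never compared

    def visit(pattern):
        s = score(pattern)
        if len(best) < k:
            heapq.heappush(best, (s, next(counter), pattern))
        elif s > best[0][0]:
            heapq.heapreplace(best, (s, next(counter), pattern))
        # prune: if even the optimistic estimate cannot beat the current
        # k-th best score, no super-pattern of `pattern` needs to be visited
        if len(best) == k and upper_bound(pattern) <= best[0][0]:
            return
        for child in expand(pattern):
            visit(child)

    for p in root_patterns:
        visit(p)
    return [(s, p) for s, _, p in sorted(best, reverse=True)]
```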

Case Study: Web Log Mining [Soares et al. 2006]
Portuguese web portal for business executives
Data: 3,000 users; 70,000 sessions; 1.7M accesses
Navigation patterns found on the page level: too many, not very useful
On the page-type level ("news", "navigation", ...): more interesting findings

Mining Web Logs to Improve Website Organization [Srikant & Yang 2001]
Given the link structure of a website and a visitor log:
Build sequences for each visitor
Define a target page
Find frequent paths to the target page
Identify links that could shorten user paths

Available Software Libraries
Java: SPMF (most extensive library) http://www.philippe-fournier-viger.com/spmf/
Basic support in RapidMiner, KNIME
R: arulesSequences package, TraMineR package
Python: multiple basic implementations; the implementations for this tutorial (mainly educational, not efficient)
Spark: PrefixSpan available

What we did not talk about
Episode mining [Mannila et al. 1997]: given long sequences, find recurring patterns
Mining: candidate generation vs. pattern growth
Discriminative sequential patterns
Incremental mining / data streams
Patterns in time series

Questions?

References (1/2)
Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering (pp. 3-14). IEEE.
Ayres, J., Flannick, J., Gehrke, J., & Yiu, T. (2002). Sequential pattern mining using a bitmap representation. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 429-435). ACM.
Fournier-Viger, P., Wu, C. W., Gomariz, A., & Tseng, V. S. (2014). VMSP: Efficient vertical mining of maximal sequential patterns. In Advances in Artificial Intelligence (pp. 83-94). Springer International Publishing.
Fournier-Viger, P., Wu, C. W., & Tseng, V. S. (2013). Mining maximal sequential patterns without candidate maintenance. In Advanced Data Mining and Applications (pp. 169-180). Springer Berlin Heidelberg.
Garofalakis, M. N., Rastogi, R., & Shim, K. (1999). SPIRIT: Sequential pattern mining with regular expression constraints. In VLDB (Vol. 99, pp. 7-10).
Gomariz, A., Campos, M., Marin, R., & Goethals, B. (2013). ClaSP: An efficient algorithm for mining frequent closed sequences. In Advances in Knowledge Discovery and Data Mining (pp. 50-61). Springer Berlin Heidelberg.
Gwadera, R., & Crestani, F. (2010). Ranking sequential patterns with respect to significance. In Advances in Knowledge Discovery and Data Mining (pp. 286-299). Springer Berlin Heidelberg.
Lam, H. T., Mörchen, F., Fradkin, D., & Calders, T. (2014). Mining compressing sequential patterns. Statistical Analysis and Data Mining, 7(1), 34-52.
Luo, C., & Chung, S. M. (2005). Efficient mining of maximal sequential patterns using multiple samples. In SDM (pp. 415-426).
Mannila, H., Toivonen, H., & Verkamo, A. I. (1997). Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3), 259-289.
Masseglia, F., Cathala, F., & Poncelet, P. (1998). The PSP approach for mining sequential patterns. In Principles of Data Mining and Knowledge Discovery (pp. 176-184). Springer Berlin Heidelberg.
Mooney, C. H., & Roddick, J. F. (2013). Sequential pattern mining: Approaches and algorithms. ACM Computing Surveys (CSUR), 45(2), 19.

References (2/2)
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., & Hsu, M. C. (2001). PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the 17th International Conference on Data Engineering (ICDE) (pp. 215-224). IEEE.
Soares, C., de Graaf, E., Kok, J. N., & Kosters, W. A. (2006). Sequence mining on web access logs: A case study. In Belgian/Netherlands Artificial Intelligence Conference, Namur.
Srikant, R., & Agrawal, R. (1996). Mining sequential patterns: Generalizations and performance improvements (pp. 1-17). Springer Berlin Heidelberg.
Srikant, R., & Yang, Y. (2001). Mining web logs to improve website organization. In Proceedings of the 10th International Conference on World Wide Web (pp. 430-437). ACM.
Tatti, N., & Vreeken, J. (2012). The long and the short of it: Summarising event sequences with serial episodes. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 462-470). ACM.
Wang, J., Han, J., & Li, C. (2007). Frequent closed sequence mining without candidate maintenance. IEEE Transactions on Knowledge and Data Engineering, 19(8), 1042-1056.
Yan, X., Han, J., & Afshar, R. (2003). CloSpan: Mining closed sequential patterns in large datasets. In SDM (pp. 166-177).
Yang, Z., Wang, Y., & Kitsuregawa, M. (2007). LAPIN: Effective sequential pattern mining algorithms by last position induction for dense databases. In Advances in Databases: Concepts, Systems and Applications (pp. 1020-1023). Springer Berlin Heidelberg.
Yang, Z., & Kitsuregawa, M. (2005). LAPIN-SPAM: An improved algorithm for mining sequential pattern. In Data Engineering Workshops, 21st International Conference on (pp. 1222-1222). IEEE.
Zaki, M. J. (2001). SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 42(1-2), 31-60.
Icons in this slide set are CC0 Public Domain, taken from pixabay.com.