Mining more complex patterns: frequent subsequences and subgraphs. Department of Computers, Czech Technical University in Prague
|
|
- Merilyn Simpson
- 5 years ago
- Views:
Transcription
1 Mining more complex patterns: frequent subsequences and subgraphs Jiří Kléma Department of Computers, Czech Technical University in Prague
2 poutline Motivation for frequent subsequence/subgraph search applications, variance in tasks, what do we already know? connection to itemsets, what changed? larger state spaces, special approaches to special tasks make sense, complexity of canonical code words, algorithms and canonical forms GSP algorithm (Agrawal s APRIORI generalization), FreeSpan for depth-first search, graph canonical forms based on spanning trees, summary the issues covered and not covered. A4M33SAD
3 pfrequent subsequences DNA example motif discovery searches for short sequential patterns in a file of unaligned DNA or protein sequences, searches for discriminative patterns (characteristics) typical for one sequence class, unusual in the other classes, this pattern could relate with the biological/regulation function of the (protein) class, transcription factor interacts with DNA through a particular motif, frequent subsequence search is a subtask, event = nucleotide, string (no time), undirected DNA. A4M33SAD
4 pfrequent subgraphs molecular fragments example acceleration of drug development, ex.: protection of human CEM cells against an HIV infection (public data), high-throughput screening of chemical compounds (37,171 substances tested) 325 confirmed active (100% protection against infection), 877 moderately active (50-99% protection against infection), others confirmed inactive (<50% protection against infection), task: why some compounds active and others not?, where to aim future screening? Borgelt: Frequent Pattern Mining. A4M33SAD
5 pfrequent subgraphs molecular fragments example search for fragments common for the active substances find molecular substructures that frequently appear in active substances, frequent active patterns = subgraphs, search for discriminative patterns we add the requirement that patterns appear only rarely in the inactive molecules, where to aim future tests? what is the most promising pharmacophore, i.e., drug candidate? Borgelt: Frequent Pattern Mining. A4M33SAD
6 pmore complex patterns the global picture The course of presentation directed sequences undirected sequences generalized sequences connected graphs. A4M33SAD
7 pfrequent subsequences similarity to frequent itemsets first of all, similarity in task representation, the process can be intrinsically identical, but we ask different questions itemsets: which insurance contracts people arrange concurrently, sequences: how people arrange insurance contracts in their course of life, transaction representation still formally possible and helpful (universal) more factors must be concerned and stored. Transaction Items (insurance type) t 1 home, life t 2 car, home t 3 pension, life t 4 travel t 5 pension, life Customer Date (time) Items (insurance type) c home, life c travel c car, pension c car, home c pension A4M33SAD
8 pfrequent subsequences similarity to frequent itemsets secondly, similarity in terms of task solution, APRIORI property can easily be generalized for sequences: Each subsequence of a frequent sequence is frequent. the anti-monotone property can also be transformed to monotone one: No supersequence of an infrequent sequence can be frequent. this generalization further extends to general graphs, the model APRIORI-like algorithm for sequential data a direct analogy of the APRIORI algorithm for itemsets, the basic operations (informal a reminder only): 1. search for trivial frequent sequences (typically of the length 0 or 1), 2. generate candidate sequences with the length incremented by 1, 3. check for their actual support in the transaction database, 4. reduce the candidate sequence set a subset of frequent sequences of the given length is created, 5. until the frequent sequence set non-empty go to the step 2. A4M33SAD
9 pfrequent substrings a trivial APRIORI application a string a directed sequence, an equidistant step, exactly one item per transaction, events given by a symbol alphabet, a pattern is an ordered list of neighboring events, a 1... a m is a subsequence of b 1... b n iff i a 1 = b i a m = b i+m. Ex.: DNA sequence (n = 20, A = {a, g, t}) i C i L i 1 {a}, {g}, {t} {a}, {g}, {t} 2 (9 patterns) {aa}, {ga}, {gg}, {gt}, {tg}, {tt} 3 {aaa}, {gaa}, {gga},... (12 patterns) {gaa}, {ggg}, {gtt}, {tga}, {ttg} 4 {gggg}, {gttg}, {tgaa}, {ttga} {gggg}, {tgaa}, {ttga} 5 {ggggg}, {ttgaa} {ttgaa} how to check for support quickly, i.e. how to find all subsequence occurrences in a sequence? algorithms Knuth-Morris-Pratt or Boyer-Moore. A4M33SAD
10 pcanonical form for sequences a canonical (standard) code word a unique sequence representation, based on the symbol alphabet ordering, a usual (not necessary) choice: the lexicographical symbol alphabet ordering a < b < c <..., the lexicographically smallest (smaller) code word taken as canonical (bac < cab), a directed sequence the only interpretation (way of reading), each (sub)sequence is a canonical code word, an undirected sequence two possible ways of reading = two alternative code words, the routine application of lexicographical ordering is not possible, prefix property in a space of canonical code words does not hold: every prefix of a canonical word is a canonical word itself, sequence canonical form prefix canonical form bab bab ba ab cabd cabd cab bac we have to find a different way of forming code words. A4M33SAD
11 pcanonical form for undirected sequences The canonical code words with the prefix property will be formed as follows even and odd length words will be handled separately, code words are started in the middle of sequence, even length odd length sequence a m a m 1... a 2 a 1 b 1 b 2... b m 1 b m a m a m 1... a 2 a 1 a 0 b 1 b 2... b m 1 b m code word a 1 b 1 a 2 b 2... a m 1 b m 1 a m b m a 0 a 1 b 1 a 2 b 2... a m 1 b m 1 a m b m code word b 1 a 1 b 2 a 2... b m 1 a m 1 b m a m a 0 b 1 a 1 b 2 a 2... b m 1 a m 1 b m a m canonical is the lexicographically smaller code word in the table, the sequence is extended by adding a pair a m+1 b m+1 or b m+1 a m+1, one item at the front and one item at the end. an example even length odd length sequence code words sequence code words at at ta ule lue leu data atda taad rules luers leusr A4M33SAD
12 pcanonical form for undirected sequences efficiency two possible code words can be created and compared in O(m), an additional symmetry flag introduced for each sequence enables the same in O(1) m s m = (a i = b i ) i=1 the symmetry flag is maintained in constant time with s m+1 = s m (a m+1 = b m+1 ) sequence extension is permissible when the flag: if s m = true, it must be a m+1 b m+1, if s m = false, any relation between a m+1 and b m+1 is possible. sequences and symmetry flags at the beginning even length: an empty sequence, s 0 = 1, odd length: all frequent alphabet symbols, s 1 = 1, the procedure guarantees exclusively the canonical sequence extensions. A4M33SAD
13 pfrequent subsequences APRIORI application to undirected sequences consider undirected sequences, otherwise the formalization as yet a 1... a m is a subsequence of b 1... b n if: i a 1 = b i a m = b i+m, or i a 1 = b i+m a m = b i, ex.: DNA sequence (n = 20, A = {a, g, t}) i C i L i 0 {} {} 1 {a}, {g}, {t} {a}, {g}, {t} 2 {aa}, {ag}, {at}, {gg}, {gt}, {tt} {aa}, {gg}, {gt}, {tt} 3 {aaa}, {aag}, {aat}, {gag}, {gat}, {tat}, {aga}, {agg}, {agt}, {ggg}, {gtt} {ggg}, {ggt}, {tgt}, {ata}, {atg}, {att}, {gtg}, {gtt}, {ttt} 4 {aaaa}, {aaag}, {aaat}, {gaag}, {gaat}, {taat}, {gggg} {agta}, {agtg}, {ggta}, {agtt}, {tgta}, {ggtg}, {ggtt}, {tgtg},... in total 27 (1) patterns A4M33SAD
14 pa generalized subsequence definition in transactional representation Items: I = {i 1, i 2,..., i m }, itemsets: (x 1, x 2,..., x k ) I, k 1, x i I, sequences: s 1,..., s n, s i = (x 1, x 2,..., x k ) I, s i, x 1 < x 2 <... < x k, an ordered list of elements, elements = itemsets, the canonical representation: lexicographical ordering of items in each itemset, ex.: a(abc)(ac)d(cf), a simplification of the form: (x i ) x i, the sequence length l (l-sequence) given by the number of item instances (occurrences) in sequence, ex.: a(abc)(ac)d(cf) is a 9-sequence, α is a subsequence of β, β is a supersequence of α: α β α = a 1,..., a n, β = b 1,..., b m, 1 j 1 <... < j n m, i = 1... n : a i b ji, ex.: a(bc)df a(abc)(ac)d(cf), d(ab) a(abc)(ac)d(cf), non-contiguous sequential patterns, a sequence database: S = { sid 1, s 1..., sid k, s k } a set of ordered pairs a sequence identifier and a sequence. A4M33SAD
15 psubsequence search in transaction representation Support of α sequence in the database S the number of sequences s S satisfying: α s, Subsequence search in transaction representation, task definition input: S a s min minimum support, output: the complete set of frequent sequential patterns all the subsequences with or above the threshold frequency. Id Sequence 10 a(abc)(ac)d(cf) 20 (ad)c(bc)(ae) 30 (ef)(ab)(df)cb 40 eg(af)cbc Id Time Items 10 t 1 a 10 t 2 a, b, c 10 t 3 a, c 10 t 4 d 10 t 5 c, f l sequential pattern (s min =2) 3 a(bc), aba, abc, (ab)c, (ab)d, (ab)f, aca, acb, acc, adc,... 4 a(bc)a, (ab)dc,... Pei, Han et al.: PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth. A4M33SAD
16 pgsp: Generalized Sequential Patterns [Agrawal, Srikant, 1996] applies the core idea of APRIORI to sequential data, the key issue is generation of the candidate sequential patterns divided into two steps 1. join l-sequence is created by joining of two (l-1)-sequences, (l-1)-sequences can be joined when identical after removal of the first item in one and the last one in second, 2. prune skip each l-sequence which contains an infrequent (l-1)-subsequence. C L 4 3 after join after prune (ab)c, (ab)d, (ab)(cd) (ab)(cd) a(cd), (ac)e, (ab)ce b(cd), bce Agrawal, Srikant: Mining Sequential Patterns: Generalizations and Performance. A4M33SAD
17 pexample: GSP, s min =2 Id Sequence 10 (bd)cb(ac) 20 (bf)(ce)b(fg) 30 (ah)(bf)abf 40 (be)(ce)d 50 a(bd)bcb(ade) s( g ) = s( h ) = 1 < s min (skips a large portion of 92 available 2-candidates), (bd)cba s 10 (bd)cba s 50 (created from (bd)cb a dcba ), (the patterns (bd)ba, (bd)ca and bcba must also be frequent). A4M33SAD
18 papriori algorithm for sequences disadvantages the generate (join step) and test (prune step) method, the problems discussed in terms of frequent itemsets persist and intensify 1. generates a large amount of candidate patterns obvious even for 2-sequences: m m + m(m 1) 2 O(m 2 ) (for itemsets it was just the second fraction, one third or so of candidates), 2. requires a lot of database scans one scan per sequence length, the number of scans given by the max pattern length the length max( s, s S) (typically >> m), (max itemset length is m and thus m scans at most), 3. search for long sequential patterns is difficult the total amount of candidate patterns is exponential with the pattern length, (the same growth as for itemsets, however the problem max( s, s S) >> m). the disadvantages addressed by alternative methods FreeSpan and PrefixSpan algorithms as examples. A4M33SAD
19 pepisode rules association rule analogy for sequences, predict the further development of sequence with the aid of patterns, S is a sequence database, β a frequent subsequence and α is its prefix, episode rule is a probabilistic implication α postfix(β, α), if conf(α postfix(β, α)) = s(β,s) s(α,s) conf min. Id Sequence 10 (bd)cb(ac) 20 (bf)(ce)b(fg) 30 (ah)(bf)abf 40 (be)(ce)d 50 a(bd)bcb(ade) let s min = 2, conf min = 0.7 β = (bd)cba, α = (bd) conf( (bd) cba ) = 1 conf min, β = bcb, α = bc conf( bc b ) = 3 4 conf min, when dealing with a single sequence S the support definition works with constraints (adjoining) sequence elements must not be too far (MaxGap), only the minimal occurences of a sequence are considered. A4M33SAD
20 pfrequent subgraph mining: definition given: graphs G = {G 1,..., G n } with labels A = {a 1,..., a m } and minimum support s min output: the set of frequent (sub)graphs with support meeting the minimum threshold F G (s min ) = {S s G (S) s min }, common constraint connected subgraphs only, main problem to avoid redundancy when searching canonical representation of (sub)graphs, partial order of (sub)graph space, efficient pruning of the searched subgraph space, fragment repository for processed graphs. subgraph S G informally: omit some vertices and (their incident) edges, proper subgraph S G, cannot be identical with the original graph, NP-complete subgraph isomorphism problem needs to be solved to count support! A4M33SAD
21 pfrequent subgraphs: example G contains three molecules, minimum support s min = 2, 15 frequent subgraphs exist, empty graph is properly contained in all graphs by definition. Borgelt: Frequent Pattern Mining. The same for the rest of the presentation. A4M33SAD
22 ptypes of frequent subgraphs closed and maximal maximal subgraph is frequent but none of its proper supergraphs is frequent, the set of maximal (sub)graphs: M G (s min ) = {S s G (S) s min R S : s G (R) < s min }, every frequent (sub)graph has a maximal supergraph, no supergraph of a maximal (sub)graph is frequent, M G (s min ) does not preserve knowledge of support values of all frequent subgraphs, closed subgraph is frequent but none of its proper supergraphs has the same support, the set of closed (sub)graphs: C G (s min ) = {S s G (S) s min R S : s G (R) < s G (S)}, every frequent (sub)graph has a closed supergraph (with the identical support), C G (s min ) preserves knowledge of support values of all frequent subgraphs, every maximal subgraph is also closed. A4M33SAD
23 pclosed and maximal subgraphs: example G contains three molecules, s min = 2, 4 closed subgraphs exist, 2 of them are maximal too. A4M33SAD
24 ppartially ordered set of subgraphs and its search subgraph (isomorphism) relationship defines a partial order on subgraphs Hasse diagram exists, the empty graph makes its infimum, no natural supremum exists, diagram can be completely searched top-down from the empty graph, branching factor is large, the depth-first search is usually preferable. the main problem a (sub)graph can be grown in several different ways, diagram must be turned into a tree each subgraph has a unique parent. A4M33SAD
25 ppartially ordered set of subgraphs and its search Searching for frequent (sub)graphs in the subgraph tree, base loop traverse all possible vertex attributes (their unique parent is the empty graph), recursively process all vertex attributes that are frequent, Recursive processing for a given frequent (sub)graph S generate all extensions R of S by an edge or by an edge and a vertex edge addition (u, v) E S, u V S v V S, if u V S v V S, the missing node is added too, S must be the unique parent of R, if R is frequent, further extend it, otherwise STOP. A4M33SAD
26 passigning unique parents How can we formally define the set of parents of subgraph S? subgraphs that contain exactly one edge less than the subgraph S, in other words, all the maximal proper subgraphs, canonical (unique) parent p c (S) of subgraph S an order on the edges of the (sub)graph S must be given before, let e be the last edge in the order whose removal does not diconnect S then p c (S) is the graph S without the edge e, if e leads to a leaf, we can remove it along with the created isolated node, if e is the only edge of S, we also need an order of the nodes, in order to define an order of the edges we will rely on a canonical form of (sub)graphs each (sub)graph is described by a code word, it unambiguously identifies the (sub)graph (up to automorphism = symmetries), having multiple code words per graph one of them is (lexicographically) singled out as the canonical code word. A4M33SAD
27 pcanonical graph representation Basic idea the characters of the code word describe the edges of the graph, vertex labels need not be unique, they must be endowed with unique labels (numbers), usual requirement on canonical form prefix property, which means that... when the last edge e is removed, the canonical word of the canonical parent originates, assuming the prefix property holds, search algorithm takes the canonical word of a parent and generates all possible extensions by an edge (and maybe a vertex), checks whether the extended code words are the canonical code words, consequence: easy and non-redundant access to children, the most common canonical forms spanning tree, adjacency matrix. A4M33SAD
28 pcanonical forms based on spanning trees Graph code word is created when constructing a spanning tree of the graph numbering the vertices in the order in which they are visited, describing each edge by the numbers of incident vertices, the edge and vertex labels, listing the edge descriptions in the order in which the edges are visited (edges closing cycles may need special treatment), the most common ways of constructing a spanning tree are search: depth-first breadth-first, both approaches ask for their own way of code word construction, one graph may be described by a large number of code words a graph has multiple spanning trees and its traversals (initial vertex, branching options), the main problem is to find the lexicographically smallest = canonical word quickly. A4M33SAD
29 pcanonical forms based on spanning trees A precedence order of labels is introduced due to efficiency, frequency of labels shall be concerned, vertex labels are recommended to be in ascending order, Regular expressions for code words depth-first: a (i d i s b a) m, (exception: indices in decreasing order) meaning of symbols: n the number of vertices of the graph, m the number of edges of the graph, i s index of the source vertex of an edge, i s {0,..., n 1}, i d index of the destination vertex of an edge, i d {0,..., n 1}, a the attribute of a vertex, b the attribute of an edge. A4M33SAD
30 pcanonical spanning tree: example Order of labels elements (vertices): S N O C, bonds (edges): =, A code word A: S 10-N 21-O 31-C 43-C 54-O 64=O 73-C 87-C 80-C A4M33SAD
31 precursive checking for canonical form traverse all vertices with a label no less than the current spanning tree root vertex, recursively add edges, compare the code word with the checked one (potentially canonical) if the new edge description is larger, the edge can be skipped (backtrack), if the new edge description is smaller, the checked code word is not canonical, if the new edge description is equal, the rest of the code word is processed recursively. A: S 10-N 21-O 31-C 43-C 54-O 64=O 73-C 87-C 80-C A4M33SAD
32 prestricted extensions of canonical words Principle of recursive search of subgraph tree generate all possible extensions of a given canonical code word (of a frequent parent), extension = add the description of an edge at the end of the code word, canonical form is checked, if met then proceed recursively, otherwise the word is discarded, how to verify efficiently whether a word is canonical? in general, a lex. smaller word with the same root vertex needs to be found, simple local rules exist that reject extensions locally = immediately only certain vertices are extendable, certain cycles cannot be closed, they represent necessary canonicity conditions, not sufficient. A4M33SAD
33 prestricted extensions of canonical words Depth-first: rightmost path extension extendable vertices must be on the rightmost path of the spanning tree (other vertices cannot be extended in the given search-tree branch), if the source vertex of the new edge is not a leaf, the edge description must not precede the description of the downward edge on the path (the edge attribute must be no less than the edge attribute of the downward edge, if it is equal, the attribute of its destination vertex must be no less than the attribute of the downward edge s destination vertex), edges closing cycles must start at an extendable vertex, must lead to the rightmost leaf (a subgraph has only one vertex meeting the condition), the index of the source vertex must precede the index of the source vertex of any edge already incident to the rightmost leaf. A4M33SAD
34 prestricted extensions: examples Extendability vertices: 0, 1, 3, 7, 8, edges closing cycles: none, Extension: attach a single bond carbon atom at the leftmost oxygen atom A: S 10-N 21-O 31-C 43-C 54-O 64=O 73-C 87-C 80-C 92-C S 10-N 21-O 32-C A4M33SAD
35 pfrequent subgraphs: use of restricted extensions Crossed-arc extension gets immediately rejected, bottom-right graph is visited from its upper-right canonical parent (prefix) only, see the canonical code words: upper left: S 10-N 21-O 31-C 43-C 54-O 64=O 73-C 87-C 80-C bottom left: S 10-N 21-O 31-C 43-C 54-O 64=O 73-C 87-C 80-C 90-O upper right: S 10-N 21-O 32-C 41-C 54-C 65-O 75=O 84-C 98-C bottom right: S 10-N 21-O 32-C 41-C 54-C 65-O 75=O 84-C 98-C 90-C A4M33SAD
36 pfrequent subgraphs search with canonical form Start with a single seed vertex, add an edge (and maybe a vertex) in each step (restricted extensions), determine the support and prune infrequent (sub)graphs (outside the code word space), check for canonical form and prune (sub)graphs with non-canonical code words. A4M33SAD
37 padditional issues Fragment repository canonical code words represent the dominant approach to redundancy reduction, an alternative is to store already processed subgraphs, they are not processed again, key efficiency issues: memory, fast access (hash), extensions for molecules frequent molecular fragments processed en bloc, ring mining, carbon chains and wildcard vertices, single graph only distinct definition of support (more complex), trees ordered unordered, rooted unrooted, in general easier than unrestricted graphs. A4M33SAD
38 pfrequent subsequence/subgraph mining summary Problem closely related to frequent itemset mining APRIORI property, however, to avoid redundancy during search gets more difficult larger branching factor, unlike complex patterns, itemsets have no internal structure, non-trivial canonical sequence/graph representation guarantees that support is counted at most once (additional necessary condition is parental support), choice of representation related with choice of searching algorithm, prefix property allows for early rejection of non-canonical candidates, the canonical form based on spanning trees introduced, support checking gets more time consuming e.g., subgraph isomorphism test for two graphs is NP-complete, special pattern types get dedicated treatment e.g., generalized sequences with gap constraints. A4M33SAD
39 precommended reading, lecture resources :: Reading Agrawal, Srikant: Mining Sequential Patterns. Agrawal, Srikant: MSPs: Generalizations and Performance. from APRIORI towards its sequential versions AprioriAll and GSP, Mannila et al.: Discovery of Frequent Episodes in Event Sequences. episodal rules, Borgelt: Frequent Pattern Mining. both frequent subsequence and subgraph mining, detailed slides, A4M33SAD
Frequent Pattern Mining
Frequent Pattern Mining...3 Frequent Pattern Mining Frequent Patterns The Apriori Algorithm The FP-growth Algorithm Sequential Pattern Mining Summary 44 / 193 Netflix Prize Frequent Pattern Mining Frequent
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Sequence Data: Sequential Pattern Mining Instructor: Yizhou Sun yzsun@cs.ucla.edu November 27, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Sequence Data Sequence Database: Timeline 10 15 20 25 30 35 Object Timestamp Events A 10 2, 3, 5 A 20 6, 1 A 23 1 B 11 4, 5, 6 B
More informationData Mining in Bioinformatics Day 5: Frequent Subgraph Mining
Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institutes
More informationData Mining: Concepts and Techniques. Chapter Mining sequence patterns in transactional databases
Data Mining: Concepts and Techniques Chapter 8 8.3 Mining sequence patterns in transactional databases Jiawei Han and Micheline Kamber Department of Computer Science University of Illinois at Urbana-Champaign
More informationAdvance Association Analysis
Advance Association Analysis 1 Minimum Support Threshold 3 Effect of Support Distribution Many real data sets have skewed support distribution Support distribution of a retail data set 4 Effect of Support
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Sequence Data Instructor: Yizhou Sun yzsun@ccs.neu.edu November 22, 2015 Announcement TRACE faculty survey myneu->self service tab Homeworks HW5 will be the last homework
More informationLecture 10 Sequential Pattern Mining
Lecture 10 Sequential Pattern Mining Zhou Shuigeng June 3, 2007 Outline Sequence data Sequential patterns Basic algorithm for sequential pattern mining Advanced algorithms for sequential pattern mining
More informationMining Frequent Patterns without Candidate Generation
Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview
More informationChapter 13, Sequence Data Mining
CSI 4352, Introduction to Data Mining Chapter 13, Sequence Data Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University Topics Single Sequence Mining Frequent sequence
More informationOn Canonical Forms for Frequent Graph Mining
n anonical Forms for Frequent Graph Mining hristian Borgelt School of omputer Science tto-von-guericke-university of Magdeburg Universitätsplatz 2, D-39106 Magdeburg, Germany Email: borgelt@iws.cs.uni-magdeburg.de
More informationPattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42
Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth
More informationData Mining Part 3. Associations Rules
Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets
More informationKnowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey
Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya
More informationLecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics
More informationDATA MINING II - 1DL460
DATA MINING II - 1DL460 Spring 2013 " An second class in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
More informationCS570 Introduction to Data Mining
CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,
More informationApriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke
Apriori Algorithm For a given set of transactions, the main aim of Association Rule Mining is to find rules that will predict the occurrence of an item based on the occurrences of the other items in the
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 16: Association Rules Jan-Willem van de Meent (credit: Yijun Zhao, Yi Wang, Tan et al., Leskovec et al.) Apriori: Summary All items Count
More informationAssociation Pattern Mining. Lijun Zhang
Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms
More informationgspan: Graph-Based Substructure Pattern Mining
University of Illinois at Urbana-Champaign February 3, 2017 Agenda What motivated the development of gspan? Technical Preliminaries Exploring the gspan algorithm Experimental Performance Evaluation Introduction
More informationData mining, 4 cu Lecture 8:
582364 Data mining, 4 cu Lecture 8: Graph mining Spring 2010 Lecturer: Juho Rousu Teaching assistant: Taru Itäpelto Frequent Subgraph Mining Extend association rule mining to finding frequent subgraphs
More informationChapter 4: Association analysis:
Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily
More informationGenome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner
Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner Outline I. Problem II. Two Historical Detours III.Example IV.The Mathematics of DNA Sequencing V.Complications
More informationData Mining for Knowledge Management. Association Rules
1 Data Mining for Knowledge Management Association Rules Themis Palpanas University of Trento http://disi.unitn.eu/~themis 1 Thanks for slides to: Jiawei Han George Kollios Zhenyu Lu Osmar R. Zaïane Mohammad
More informationSupplementary Table 1. Data collection and refinement statistics
Supplementary Table 1. Data collection and refinement statistics APY-EphA4 APY-βAla8.am-EphA4 Crystal Space group P2 1 P2 1 Cell dimensions a, b, c (Å) 36.27, 127.7, 84.57 37.22, 127.2, 84.6 α, β, γ (
More informationChapter 6: Association Rules
Chapter 6: Association Rules Association rule mining Proposed by Agrawal et al in 1993. It is an important data mining model. Transaction data (no time-dependent) Assume all data are categorical. No good
More informationFrequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar
Frequent Pattern Mining Based on: Introduction to Data Mining by Tan, Steinbach, Kumar Item sets A New Type of Data Some notation: All possible items: Database: T is a bag of transactions Transaction transaction
More informationChapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the
Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule
More informationAn Algorithm for Mining Large Sequences in Databases
149 An Algorithm for Mining Large Sequences in Databases Bharat Bhasker, Indian Institute of Management, Lucknow, India, bhasker@iiml.ac.in ABSTRACT Frequent sequence mining is a fundamental and essential
More informationA Graph-Based Approach for Mining Closed Large Itemsets
A Graph-Based Approach for Mining Closed Large Itemsets Lee-Wen Huang Dept. of Computer Science and Engineering National Sun Yat-Sen University huanglw@gmail.com Ye-In Chang Dept. of Computer Science and
More informationInduction of Association Rules: Apriori Implementation
1 Induction of Association Rules: Apriori Implementation Christian Borgelt and Rudolf Kruse Department of Knowledge Processing and Language Engineering School of Computer Science Otto-von-Guericke-University
More informationChapter 4: Mining Frequent Patterns, Associations and Correlations
Chapter 4: Mining Frequent Patterns, Associations and Correlations 4.1 Basic Concepts 4.2 Frequent Itemset Mining Methods 4.3 Which Patterns Are Interesting? Pattern Evaluation Methods 4.4 Summary Frequent
More informationMining Association Rules in Large Databases
Mining Association Rules in Large Databases Association rules Given a set of transactions D, find rules that will predict the occurrence of an item (or a set of items) based on the occurrences of other
More informationCanonical Forms for Frequent Graph Mining
Canonical Forms for Frequent Graph Mining Christian Borgelt Dept. of Knowledge Processing and Language Engineering Otto-von-Guericke-University of Magdeburg borgelt@iws.cs.uni-magdeburg.de Summary. A core
More informationPart 2. Mining Patterns in Sequential Data
Part 2 Mining Patterns in Sequential Data Sequential Pattern Mining: Definition Given a set of sequences, where each sequence consists of a list of elements and each element consists of a set of items,
More informationMINING FREQUENT MAX AND CLOSED SEQUENTIAL PATTERNS
MINING FREQUENT MAX AND CLOSED SEQUENTIAL PATTERNS by Ramin Afshar B.Sc., University of Alberta, Alberta, 2000 THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
More informationChapter 7: Frequent Itemsets and Association Rules
Chapter 7: Frequent Itemsets and Association Rules Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 VII.1&2 1 Motivational Example Assume you run an on-line
More informationFrequent Pattern Mining
Frequent Pattern Mining How Many Words Is a Picture Worth? E. Aiden and J-B Michel: Uncharted. Reverhead Books, 2013 Jian Pei: CMPT 741/459 Frequent Pattern Mining (1) 2 Burnt or Burned? E. Aiden and J-B
More informationKnowledge Discovery in Databases II Winter Term 2015/2016. Optional Lecture: Pattern Mining & High-D Data Mining
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases II Winter Term 2015/2016 Optional Lecture: Pattern Mining
More informationData Mining in Bioinformatics Day 3: Graph Mining
Graph Mining and Graph Kernels Data Mining in Bioinformatics Day 3: Graph Mining Karsten Borgwardt & Chloé-Agathe Azencott February 6 to February 17, 2012 Machine Learning and Computational Biology Research
More informationCombining Ring Extensions and Canonical Form Pruning
Combining Ring Extensions and Canonical Form Pruning Christian Borgelt European Center for Soft Computing c/ Gonzalo Gutiérrez Quirós s/n, 00 Mieres, Spain christian.borgelt@softcomputing.es Abstract.
More informationSequential Pattern Mining: Concepts and Primitives. Mining Sequence Patterns in Transactional Databases
32 Chapter 8 Mining Stream, Time-Series, and Sequence Data 8.3 Mining Sequence Patterns in Transactional Databases A sequence database consists of sequences of ordered elements or events, recorded with
More informationData warehouse and Data Mining
Data warehouse and Data Mining Lecture No. 14 Data Mining and its techniques Naeem A. Mahoto Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationAssociation Rule Mining. Introduction 46. Study core 46
Learning Unit 7 Association Rule Mining Introduction 46 Study core 46 1 Association Rule Mining: Motivation and Main Concepts 46 2 Apriori Algorithm 47 3 FP-Growth Algorithm 47 4 Assignment Bundle: Frequent
More informationby the Genevestigator program (www.genevestigator.com). Darker blue color indicates higher gene expression.
Figure S1. Tissue-specific expression profile of the genes that were screened through the RHEPatmatch and root-specific microarray filters. The gene expression profile (heat map) was drawn by the Genevestigator
More informationPerformance and Scalability: Apriori Implementa6on
Performance and Scalability: Apriori Implementa6on Apriori R. Agrawal and R. Srikant. Fast algorithms for mining associa6on rules. VLDB, 487 499, 1994 Reducing Number of Comparisons Candidate coun6ng:
More informationTutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory
Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining Gyozo Gidofalvi Uppsala Database Laboratory Announcements Updated material for assignment 3 on the lab course home
More informationAssociation rule mining
Association rule mining Association rule induction: Originally designed for market basket analysis. Aims at finding patterns in the shopping behavior of customers of supermarkets, mail-order companies,
More informationBCB 713 Module Spring 2011
Association Rule Mining COMP 790-90 Seminar BCB 713 Module Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline What is association rule mining? Methods for association rule mining Extensions
More informationA NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS
A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS ABSTRACT V. Purushothama Raju 1 and G.P. Saradhi Varma 2 1 Research Scholar, Dept. of CSE, Acharya Nagarjuna University, Guntur, A.P., India 2 Department
More informationData Warehousing and Data Mining
Data Warehousing and Data Mining Lecture 3 Efficient Cube Computation CITS3401 CITS5504 Wei Liu School of Computer Science and Software Engineering Faculty of Engineering, Computing and Mathematics Acknowledgement:
More informationGraph and Digraph Glossary
1 of 15 31.1.2004 14:45 Graph and Digraph Glossary A B C D E F G H I-J K L M N O P-Q R S T U V W-Z Acyclic Graph A graph is acyclic if it contains no cycles. Adjacency Matrix A 0-1 square matrix whose
More informationAssociation Rules. A. Bellaachia Page: 1
Association Rules 1. Objectives... 2 2. Definitions... 2 3. Type of Association Rules... 7 4. Frequent Itemset generation... 9 5. Apriori Algorithm: Mining Single-Dimension Boolean AR 13 5.1. Join Step:...
More informationGraph Mining: Repository vs. Canonical Form
Graph Mining: Repository vs. Canonical Form Christian Borgelt and Mathias Fiedler European Center for Soft Computing c/ Gonzalo Gutiérrez Quirós s/n, 336 Mieres, Spain christian.borgelt@softcomputing.es,
More informationAssociation Rule Mining
Association Rule Mining Generating assoc. rules from frequent itemsets Assume that we have discovered the frequent itemsets and their support How do we generate association rules? Frequent itemsets: {1}
More informationANU MLSS 2010: Data Mining. Part 2: Association rule mining
ANU MLSS 2010: Data Mining Part 2: Association rule mining Lecture outline What is association mining? Market basket analysis and association rule examples Basic concepts and formalism Basic rule measurements
More informationAssociation rules. Marco Saerens (UCL), with Christine Decaestecker (ULB)
Association rules Marco Saerens (UCL), with Christine Decaestecker (ULB) 1 Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004),
More informationSequential Pattern Mining A Study
Sequential Pattern Mining A Study S.Vijayarani Assistant professor Department of computer science Bharathiar University S.Deepa M.Phil Research Scholar Department of Computer Science Bharathiar University
More informationCSE 5243 INTRO. TO DATA MINING
CSE 543 INTRO. TO DATA MINING Advanced Frequent Pattern Mining & Locality Sensitive Hashing Huan Sun, CSE@The Ohio State University Slides adapted from Prof. Jiawei Han @UIUC, Prof. Srinivasan Parthasarathy
More informationFundamental Data Mining Algorithms
2018 EE448, Big Data Mining, Lecture 3 Fundamental Data Mining Algorithms Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html REVIEW What is Data
More informationMonotone Constraints in Frequent Tree Mining
Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance
More informationData Structure. IBPS SO (IT- Officer) Exam 2017
Data Structure IBPS SO (IT- Officer) Exam 2017 Data Structure: In computer science, a data structure is a way of storing and organizing data in a computer s memory so that it can be used efficiently. Data
More informationMining Minimal Contrast Subgraph Patterns
Mining Minimal Contrast Subgraph Patterns Roger Ming Hieng Ting James Bailey Abstract In this paper, we introduce a new type of contrast pattern, the minimal contrast subgraph. It is able to capture structural
More informationLecture 2 Wednesday, August 22, 2007
CS 6604: Data Mining Fall 2007 Lecture 2 Wednesday, August 22, 2007 Lecture: Naren Ramakrishnan Scribe: Clifford Owens 1 Searching for Sets The canonical data mining problem is to search for frequent subsets
More informationRoadmap. PCY Algorithm
1 Roadmap Frequent Patterns A-Priori Algorithm Improvements to A-Priori Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms Compacting Results Data Mining for Knowledge Management 50 PCY
More informationData Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application
Data Structures Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali 2009-2010 Association Rules: Basic Concepts and Application 1. Association rules: Given a set of transactions, find
More informationMining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
Data Mining and Knowledge Discovery, 8, 53 87, 2004 c 2004 Kluwer Academic Publishers. Manufactured in The Netherlands. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
More informationFinding Frequent Patterns Using Length-Decreasing Support Constraints
Finding Frequent Patterns Using Length-Decreasing Support Constraints Masakazu Seno and George Karypis Department of Computer Science and Engineering University of Minnesota, Minneapolis, MN 55455 Technical
More information2. Discovery of Association Rules
2. Discovery of Association Rules Part I Motivation: market basket data Basic notions: association rule, frequency and confidence Problem of association rule mining (Sub)problem of frequent set mining
More informationSubgroup Mining (with a brief intro to Data Mining)
Subgroup Mining (with a brief intro to Data Mining) Paulo J Azevedo INESC-Tec, HASLab Departamento de Informática 1 Expectations 2 Knowledge Discovery (non trivial relation between data elements) in DataBases
More informationEffectiveness of Freq Pat Mining
Effectiveness of Freq Pat Mining Too many patterns! A pattern a 1 a 2 a n contains 2 n -1 subpatterns Understanding many patterns is difficult or even impossible for human users Non-focused mining A manager
More informationGraph Algorithms in Bioinformatics
Graph Algorithms in Bioinformatics Computational Biology IST Ana Teresa Freitas 2015/2016 Sequencing Clone-by-clone shotgun sequencing Human Genome Project Whole-genome shotgun sequencing Celera Genomics
More informationAssociation Analysis: Basic Concepts and Algorithms
5 Association Analysis: Basic Concepts and Algorithms Many business enterprises accumulate large quantities of data from their dayto-day operations. For example, huge amounts of customer purchase data
More informationAssociation Rule Mining. Entscheidungsunterstützungssysteme
Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
More informationA Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition
A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition S.Vigneswaran 1, M.Yashothai 2 1 Research Scholar (SRF), Anna University, Chennai.
More informationPyramidal and Chiral Groupings of Gold Nanocrystals Assembled Using DNA Scaffolds
Pyramidal and Chiral Groupings of Gold Nanocrystals Assembled Using DNA Scaffolds February 27, 2009 Alexander Mastroianni, Shelley Claridge, A. Paul Alivisatos Department of Chemistry, University of California,
More informationThe Relation of Closed Itemset Mining, Complete Pruning Strategies and Item Ordering in Apriori-based FIM algorithms (Extended version)
The Relation of Closed Itemset Mining, Complete Pruning Strategies and Item Ordering in Apriori-based FIM algorithms (Extended version) Ferenc Bodon 1 and Lars Schmidt-Thieme 2 1 Department of Computer
More informationUSING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS
INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2017) Vol. 6 (3) 213 222 USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS PIOTR OŻDŻYŃSKI, DANUTA ZAKRZEWSKA Institute of Information
More informationData Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1
Data Mining: Concepts and Techniques Chapter 5 SS Chung April 5, 2013 Data Mining: Concepts and Techniques 1 Chapter 5: Mining Frequent Patterns, Association and Correlations Basic concepts and a road
More informationAN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE
AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3
More informationCSE 417 Dynamic Programming (pt 5) Multiple Inputs
CSE 417 Dynamic Programming (pt 5) Multiple Inputs Reminders > HW5 due Wednesday Dynamic Programming Review > Apply the steps... optimal substructure: (small) set of solutions, constructed from solutions
More informationMULTI-DIMENSIONAL SEQUENTIAL PATTERN MINING
MULTI-DIMENSIONAL SEQUENTIAL PATTERN MINING by Helen Pinto B.Sc. B.Mgmt., University of Lethbridge, Alberta, 1998 THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF
More informationData Mining in Bioinformatics Day 5: Graph Mining
Data Mining in Bioinformatics Day 5: Graph Mining Karsten Borgwardt February 25 to March 10 Bioinformatics Group MPIs Tübingen from Borgwardt and Yan, KDD 2008 tutorial Graph Mining and Graph Kernels,
More informationChapter 2. Related Work
Chapter 2 Related Work There are three areas of research highly related to our exploration in this dissertation, namely sequential pattern mining, multiple alignment, and approximate frequent pattern mining.
More informationTCGR: A Novel DNA/RNA Visualization Technique
TCGR: A Novel DNA/RNA Visualization Technique Donya Quick and Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275 dquick@mail.smu.edu, mhd@engr.smu.edu
More informationSequential PAttern Mining using A Bitmap Representation
Sequential PAttern Mining using A Bitmap Representation Jay Ayres, Jason Flannick, Johannes Gehrke, and Tomi Yiu Dept. of Computer Science Cornell University ABSTRACT We introduce a new algorithm for mining
More informationAssociation Rule Mining: FP-Growth
Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong We have already learned the Apriori algorithm for association rule mining. In this lecture, we will discuss a faster
More informationFast incremental mining of web sequential patterns with PLWAP tree
Data Min Knowl Disc DOI 10.1007/s10618-009-0133-6 Fast incremental mining of web sequential patterns with PLWAP tree C. I. Ezeife Yi Liu Received: 1 September 2007 / Accepted: 13 May 2009 Springer Science+Business
More informationSequencing. Computational Biology IST Ana Teresa Freitas 2011/2012. (BACs) Whole-genome shotgun sequencing Celera Genomics
Computational Biology IST Ana Teresa Freitas 2011/2012 Sequencing Clone-by-clone shotgun sequencing Human Genome Project Whole-genome shotgun sequencing Celera Genomics (BACs) 1 Must take the fragments
More informationCSE 417 Branch & Bound (pt 4) Branch & Bound
CSE 417 Branch & Bound (pt 4) Branch & Bound Reminders > HW8 due today > HW9 will be posted tomorrow start early program will be slow, so debugging will be slow... Review of previous lectures > Complexity
More informationCMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees
MTreeMiner: Mining oth losed and Maximal Frequent Subtrees Yun hi, Yirong Yang, Yi Xia, and Richard R. Muntz University of alifornia, Los ngeles, 90095, US {ychi,yyr,xiayi,muntz}@cs.ucla.edu bstract. Tree
More informationDiscrete mathematics
Discrete mathematics Petr Kovář petr.kovar@vsb.cz VŠB Technical University of Ostrava DiM 470-2301/02, Winter term 2018/2019 About this file This file is meant to be a guideline for the lecturer. Many
More informationAssociation Rule Discovery
Association Rule Discovery Association Rules describe frequent co-occurences in sets an itemset is a subset A of all possible items I Example Problems: Which products are frequently bought together by
More informationTrees. Arash Rafiey. 20 October, 2015
20 October, 2015 Definition Let G = (V, E) be a loop-free undirected graph. G is called a tree if G is connected and contains no cycle. Definition Let G = (V, E) be a loop-free undirected graph. G is called
More informationInternational Journal of Scientific Research and Reviews
Research article Available online www.ijsrr.org ISSN: 2279 0543 International Journal of Scientific Research and Reviews A Survey of Sequential Rule Mining Algorithms Sachdev Neetu and Tapaswi Namrata
More informationAPPLYING BIT-VECTOR PROJECTION APPROACH FOR EFFICIENT MINING OF N-MOST INTERESTING FREQUENT ITEMSETS
APPLYIG BIT-VECTOR PROJECTIO APPROACH FOR EFFICIET MIIG OF -MOST ITERESTIG FREQUET ITEMSETS Zahoor Jan, Shariq Bashir, A. Rauf Baig FAST-ational University of Computer and Emerging Sciences, Islamabad
More informationInterestingness Measurements
Interestingness Measurements Objective measures Two popular measurements: support and confidence Subjective measures [Silberschatz & Tuzhilin, KDD95] A rule (pattern) is interesting if it is unexpected
More informationAssociation Rule Discovery
Association Rule Discovery Association Rules describe frequent co-occurences in sets an item set is a subset A of all possible items I Example Problems: Which products are frequently bought together by
More informationCSE 373 MAY 10 TH SPANNING TREES AND UNION FIND
CSE 373 MAY 0 TH SPANNING TREES AND UNION FIND COURSE LOGISTICS HW4 due tonight, if you want feedback by the weekend COURSE LOGISTICS HW4 due tonight, if you want feedback by the weekend HW5 out tomorrow
More information