An Introduction to Data Mining
- Patience Hudson
Hossein Hakimzadeh
Computer and Information Sciences
Data Mining (B561)

What Is Data Mining?
Original definition: "data mining" was a statistician's term for overusing data to draw invalid inferences.
Bonferroni's theorem: if there are too many possible conclusions to draw, some will be true for purely statistical reasons, with no physical validity.

What Is Data Mining?
J. B. Rhine, a "parapsychologist" at Duke in the 1950s, tested students for "extrasensory perception" (ESP) by asking them to guess 10 cards as red or black.
He found that about 1 in 1000 of them guessed all 10 correctly, and he declared them to have ESP.
When he retested them, he found they did no better than average.
His conclusion: telling people they have ESP causes them to lose it!
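The arithmetic behind the Rhine story can be checked with a short sketch. It assumes each guess is an independent 50/50 choice; the group size of 1000 students is a hypothetical number chosen only for illustration.

```python
from fractions import Fraction

def p_all_correct(num_cards: int) -> Fraction:
    """Chance of guessing every card right when each guess is a fair coin flip."""
    return Fraction(1, 2) ** num_cards

p = p_all_correct(10)      # probability of 10 correct red/black guesses
students = 1000            # hypothetical number of students tested
expected = students * p    # expected number of "perfect" guessers by luck alone

print(p)                   # 1/1024
print(float(expected))     # a little under 1 student per 1000
```

Under these assumptions, roughly one student in a thousand is expected to guess all 10 cards by chance, which matches the fraction Rhine observed.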

What Is Data Mining?
Definition-1: "Discovery of useful summaries of data." (Data Mining Course at Stanford University)
Definition-2: The mining or discovery of new information in terms of patterns or rules from vast amounts of data. (Fundamentals of Database Systems, Elmasri and Navathe, 4th Edition, Addison Wesley)

Data Mining vs. Data Retrieval
Existing query tools can be likened to a flashlight used to locate interesting information in data: the user is left to point the flashlight wherever he or she expects to find useful trends and patterns.
Data mining discovers patterns that direct the user toward the right questions to ask with traditional query tools.
A data mining tool does not require the user to state a hypothesis in advance; it tries to discover relationships and hidden patterns that may not always be obvious.

Applications of Data Mining:
Some examples of "successes":
1. Decision trees constructed from bank-loan histories to produce algorithms that decide whether to grant a loan.
2. Patterns of traveler behavior mined to manage the sale of discounted seats on planes, rooms in hotels, etc.
3. "Diapers and beer." Customers who buy diapers are more likely to buy beer than average customers. This observation allowed supermarkets to place beer and diapers nearby, knowing many customers would walk between them. Placing potato chips in between increased the sales of all three items.

Applications of Data Mining:
Some examples of "successes":
4. Skycat and Sloan Sky Survey: clustering sky objects by their radiation levels in different bands allowed astronomers to distinguish between galaxies, nearby stars, and many other kinds of celestial objects.
5. Comparison of the genotype of people with and without a condition allowed the discovery of a set of genes that together account for many cases of diabetes. This sort of mining will become much more important as the human genome is constructed.
(Data Mining Course at Stanford University)

Data-Mining Communities:
Data mining has been claimed by a number of research communities:
- Statistics.
- Artificial Intelligence, where it is called "machine learning." Neural networks and genetic algorithms are also used.
- Researchers in clustering algorithms.
- Visualization researchers.
- Databases, where data mining can be thought of as algorithms for executing very complex queries on non-main-memory data.
(Data Mining Course at Stanford University)

Stages of the Data-Mining Process:
- Data gathering
- Data cleansing
- Feature extraction
- Pattern extraction and discovery
- Visualization of the data
- Evaluation of results

Stages of the Data-Mining Process:
- Data gathering: data warehousing, Web crawling.
- Data cleansing: eliminate errors and/or bogus data, e.g., patient fever = 125.
- Feature extraction: obtain only the interesting attributes of the data and remove useless data; e.g., "date acquired" is probably not useful for clustering celestial objects, as in Skycat.
- Pattern extraction and discovery: this is the stage that is often thought of as "data mining".
- Visualization of the data.
- Evaluation of results: not every discovered fact is useful, or even true! Judgment is necessary before following your software's conclusions.
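The cleansing and feature-extraction stages can be illustrated with a minimal sketch. The plausible temperature range, the record layout, and the attribute names (patient_id, fever_f, date_acquired) are assumptions made for this example and are not part of the course material.

```python
# Assumed plausible range for a body temperature in degrees Fahrenheit.
PLAUSIBLE_FEVER_F = (90.0, 115.0)
# Assumed set of attributes worth keeping for later mining.
USEFUL_ATTRIBUTES = ("patient_id", "fever_f")

def cleanse(records, low, high):
    """Split records into clean ones and ones with an out-of-range fever."""
    clean, rejected = [], []
    for record in records:
        target = clean if low <= record["fever_f"] <= high else rejected
        target.append(record)
    return clean, rejected

def extract_features(records, attributes):
    """Keep only the listed attributes of each record."""
    return [{name: record[name] for name in attributes} for record in records]

# Hypothetical patient records; the second one carries the bogus fever of 125.
patients = [
    {"patient_id": 1, "fever_f": 98.6, "date_acquired": "2003-01-05"},
    {"patient_id": 2, "fever_f": 125.0, "date_acquired": "2003-01-06"},
    {"patient_id": 3, "fever_f": 101.2, "date_acquired": "2003-01-07"},
]

clean, rejected = cleanse(patients, *PLAUSIBLE_FEVER_F)
features = extract_features(clean, USEFUL_ATTRIBUTES)
print(features)
print(rejected)
```

Rejected records are returned rather than silently dropped, so that a person can still judge whether the value was a typing error or a real measurement.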

How is Knowledge Discovered?
- Deductive Knowledge
- Inductive Knowledge

How is Knowledge Discovered?
Deductive Knowledge: New facts are deduced by applying pre-specified logical rules of deduction to the given data (e.g., deductive databases).
A Prolog program to build a simple family knowledge base:

    % Facts: sister(A, B) reads "A is a sister of B"; brother and father likewise.
    sister(mary, jack).
    sister(mary, jim).
    brother(jack, mary).
    brother(jack, jim).
    father(john, jack).
    father(john, jim).

    % Rules: Prolog variables must start with an uppercase letter.
    % X \= Y keeps a person from being derived as their own sibling.
    sibling(X, Y) :- father(Z, X), father(Z, Y), X \= Y.
    sibling(X, Y) :- brother(X, Y).
    sibling(X, Y) :- brother(Z, X), brother(Z, Y), X \= Y.
    sibling(X, Y) :- sister(X, Y).
    sibling(X, Y) :- sister(Z, X), sister(Z, Y), X \= Y.

How is Knowledge Discovered?
Inductive Knowledge: Discovers new rules and patterns from the supplied data (i.e., data mining).
Inductive reasoning works by moving from specific observations to broader generalizations and theories. We begin with specific observations and measures, begin to detect patterns and regularities, formulate some tentative hypotheses that we can explore, and finally end up developing some general conclusions or theories.

Typical Results of Data Mining
1. Association Rules:
   - Whenever a customer buys video equipment, she also buys another electronic gadget.
2. Sequential Patterns:
   - When a customer buys a camera and within three months buys photographic supplies, then within six months he is likely to buy an accessory item.
   - If the customer buys more than twice in the lean periods, he may be more likely to buy at least once during the Christmas period.

Typical Results of Data Mining
3. Classification Trees/Hierarchies:
   - Customers may be classified by frequency of visits, by type of financing used, by amount of purchase, or by affinity for types of items, and then some revealing statistics may be generated for such classes.
   - Customers may be divided into five categories of creditworthiness, based on prior credit transactions.
4. Patterns within time series:
   - Stocks of utility companies X, Y and Z showed the same pattern during 2003, in terms of closing stock price.
   - The retail sales index improves in the months immediately following the tax refund/rebate period.
   - Two products show the same sales pattern during summer but not winter.

The Goals of Data Mining:
1. Prediction
2. Identification
3. Classification and Clustering
4. Optimization

The Goals of Data Mining:
1. Prediction: Predict future behavior.
   - e.g., predicting that certain discount levels will cause specific customers to purchase an item.
   - e.g., predicting sales in a given period.
   - e.g., certain seismic wave patterns may predict an earthquake.

The Goals of Data Mining:
2. Identification: Identifying the existence of an item, event or activity.
   - e.g., system intruders may be identified by the type of programs being executed, files accessed, CPU utilization, network activities, and the time at which such events occur.

The Goals of Data Mining:
3. Classification and Clustering: Partitioning the data into categories or classes.
   - e.g., discount-seeking shopper, shopper in a rush, loyal and regular shopper, name-brand shopper, infrequent shopper, etc.

Classification or Supervised Learning
An analyst for a telecommunications company wants to understand why some customers remain loyal while others leave. Ultimately, the analyst wants to predict which customers are most likely to leave and join competitors.
The analyst can construct a model derived from historical data of loyal and disloyal customers. Building a model for this business problem requires knowledge of which customers have remained loyal and which have not.
This type of mining is called classification or supervised learning, because the training examples are labeled with the actual class they belong to (loyal or lost).

Clustering or Unsupervised Learning
Retailers want to know where similarities exist in their customer base so that they can create and understand the different groups to which they sell and market.
The analyst will use a database with rows of customer information and attempt to create customer segments. The data set may contain many attributes, such as whether customers have children, whether they are single parents, and their income level. During the discovery process, differences in these attributes can be used to separate the data into natural groupings.
This approach is referred to as clustering or unsupervised learning. Clustering can be based on historical patterns, but unlike the classification approach, the outcome is not supplied with the training data.

The Goals of Data Mining:
4. Optimization: Optimize the use of limited resources (e.g., time, money, space, material, personnel, etc.).

In the real world such results can be used to:
- Plan store locations based on demographics.
- Run targeted promotions.
- Combine items in advertising.
- Predict what admission criteria will lead to academic success, better retention, and better graduation rates.

Association Rules and Frequent Item-sets
The market-basket problem assumes we have some large number of items, e.g., "bread", "milk", etc.
Customers fill their market baskets with some subset of the items, and we get to know what items people buy together, even if we don't know who they are.
Marketers use this information to position items and to control the way a typical customer traverses the store.

Association Rules and Frequent Item-sets
In addition to the marketing application, the same sort of question has the following uses:
1. Baskets = documents; items = words. Words appearing frequently together in documents may represent phrases or linked concepts. Can be used for intelligence gathering.
2. Baskets = sentences; items = documents. Two documents with many of the same sentences could represent plagiarism or mirror sites on the Web.
3. Baskets = semester schedules; items = courses. Courses appearing together in students' schedules may have a synergistic effect for current or future semester schedules.

Goals for Market-Basket Mining
1. Association rules are statements of the form {X1, X2, ..., Xn} ⇒ Y, meaning that if we find all of X1, X2, ..., Xn in the market basket, then we have a good chance of finding Y.
The probability of finding Y in a basket that contains all of X1, X2, ..., Xn is called the confidence of the rule.
We would normally search only for rules that have confidence above a certain threshold (significantly higher than it would be if items were placed at random into baskets).

Goals for Market-Basket Mining
Example-1: (Low confidence)
{milk, butter} ⇒ bread may appear to hold simply because a lot of people buy bread.
Consider the following examples:
{shoe polish} ⇒ bread
{wine} ⇒ bread
{flower} ⇒ bread

Goals for Market-Basket Mining
Example-2: (High confidence)
{diapers} ⇒ beer
The beer/diapers story asserts that the rule {diapers} ⇒ beer holds with confidence significantly greater than the fraction of baskets that contain beer.
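The comparison in the two examples above can be sketched as follows. The baskets are invented for illustration, and the ratio of confidence to the baseline fraction is commonly called "lift", a name the slides do not use.

```python
def confidence(baskets, lhs, rhs):
    """Fraction of the baskets containing every item of lhs that also contain rhs."""
    lhs = set(lhs)
    with_lhs = [b for b in baskets if lhs <= b]
    if not with_lhs:
        return 0.0
    return sum(1 for b in with_lhs if rhs in b) / len(with_lhs)

def baseline(baskets, rhs):
    """Fraction of all baskets that contain rhs, regardless of anything else."""
    return sum(1 for b in baskets if rhs in b) / len(baskets)

# Hypothetical baskets: bread is in most of them, beer in few.
baskets = [
    {"diapers", "beer", "bread"},
    {"diapers", "beer"},
    {"diapers", "bread", "milk"},
    {"milk", "butter", "bread"},
    {"milk", "butter", "bread"},
    {"milk", "butter"},
    {"bread", "eggs"},
    {"bread", "beer"},
    {"bread"},
    {"eggs", "bread"},
]

for lhs, rhs in [({"milk", "butter"}, "bread"), ({"diapers"}, "beer")]:
    conf = confidence(baskets, lhs, rhs)
    base = baseline(baskets, rhs)
    print(sorted(lhs), "=>", rhs, round(conf, 2), round(base, 2), round(conf / base, 2))
```

In this invented data both rules have the same confidence (2/3), but bread already appears in 80% of all baskets, so {milk, butter} ⇒ bread says nothing new, while beer appears in only 30% of baskets, so {diapers} ⇒ beer is well above its baseline.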

Causality:
Ideally, we would like to know that in an association rule the presence of X1, ..., Xn actually "causes" Y to be bought. However, "causality" is an elusive concept.
Nevertheless, for market-basket data, the following test suggests what causality means. If we lower the price of diapers and raise the price of beer, we can lure diaper buyers, who are more likely to pick up beer while in the store, thus covering our losses on the diapers.
That strategy works because "diapers causes beer." However, working it the other way round, running a sale on beer and raising the price of diapers, will not result in beer buyers buying diapers in any great numbers, and we lose money.

Frequent Item-sets:
In many (but not all) situations, we only care about association rules or causalities involving sets of items that appear frequently in baskets. For example, we cannot run a good marketing strategy involving items that no one buys anyway.
Thus, much data mining starts with the assumption that we only care about sets of items with high support; i.e., they appear together in many baskets.
We then find association rules or causalities only involving a high-support set of items; i.e., {X1, ..., Xn, Y} must appear in at least a certain percent of the baskets, called the support threshold.
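A minimal sketch of finding high-support item-sets is given below. It simply counts every item-set of a given size in every basket, which is a brute-force approach and not an optimized mining algorithm; the baskets and the 50% threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(baskets, size, support_threshold):
    """Return {item-set: support} for item-sets of the given size that
    appear in at least support_threshold (a fraction) of the baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each item-set one canonical tuple form.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    n = len(baskets)
    return {s: c / n for s, c in counts.items() if c / n >= support_threshold}

# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "beer"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

print(frequent_itemsets(baskets, 2, 0.5))   # {('bread', 'milk'): 0.6}
```

Only {bread, milk} appears in at least half of these baskets, so any rule search that follows would be restricted to that pair.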

Implementing Association Rules:
An association rule is of the form X ⇒ Y, where X = {x1, x2, ..., xn} and Y = {y1, y2, ..., ym} are sets of items, with xi and yj being distinct items for all i and all j.
X ⇒ Y states that if a customer buys X, then she is likely to buy Y.
In general, LHS ⇒ RHS, where LHS and RHS are sets of items.
The set LHS ∪ RHS is called an item-set (e.g., a set of items purchased by customers).
For an association rule to be considered interesting, the rule must satisfy support and confidence measures.

Support or Prevalence for Association Rule:
Support for the rule LHS ⇒ RHS refers to how frequently a specific item-set occurs in the database: the percentage of transactions that contain all the items in the item-set (LHS ∪ RHS).
If the support is low, the item-set occurs in only a small fraction of transactions, and therefore the association rule is not as reliable.

Confidence or Strength for Association Rule:
Confidence for the rule LHS ⇒ RHS refers to how strong the association is.
Confidence is calculated as: Support(LHS ∪ RHS) / Support(LHS)
In other words, it is the probability that the items in RHS will be purchased, given that the items in LHS are purchased.

Example of Association Rules:

T-ID   Time   Items Bought
101    6:35   milk, bread, cookies, juice
792    7:38   milk, juice
       8:05   milk, eggs
       8:45   bread, cookies, coffee

Suppose the following association rules have been observed: milk ⇒ juice and bread ⇒ juice.

What is the support for {milk, juice}? 50% (2 of the 4 transactions)
What is the support for {bread, juice}? 25% (1 of the 4 transactions)

What is the confidence for milk ⇒ juice? 50% / 75% = 66.7%
What is the confidence for bread ⇒ juice? 25% / 50% = 50%
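The support and confidence figures for this example can be reproduced with a short sketch. The four transactions are taken from the table; the function names are chosen for this illustration only.

```python
# The four transactions from the example table.
transactions = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs):
    """Support(LHS ∪ RHS) / Support(LHS)."""
    lhs, rhs = set(lhs), set(rhs)
    return support(lhs | rhs) / support(lhs)

print(support({"milk", "juice"}))                 # 0.5
print(support({"bread", "juice"}))                # 0.25
print(round(confidence({"milk"}, {"juice"}), 3))  # 0.667
print(confidence({"bread"}, {"juice"}))           # 0.5
```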

What is a good Association Rule?
The goal of mining association rules is to generate all possible rules that exceed some minimum user-specified support and confidence thresholds.
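As a minimal sketch of that goal, the code below enumerates every candidate rule over the four example transactions and keeps those that meet both thresholds. Exhaustive enumeration is an assumption made for brevity; it grows exponentially with the number of items, and practical algorithms such as Apriori prune the search. The threshold values are also illustrative.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule meeting both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue  # item-set too rare to yield a usable rule
            # Try every non-empty proper subset as the left-hand side.
            for k in range(1, size):
                for lhs_items in combinations(combo, k):
                    lhs = frozenset(lhs_items)
                    conf = s / support(lhs)
                    if conf >= min_confidence:
                        rules.append((sorted(lhs), sorted(itemset - lhs), s, conf))
    return rules

transactions = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]

rules = mine_rules(transactions, min_support=0.5, min_confidence=0.6)
for lhs, rhs, s, conf in rules:
    print(lhs, "=>", rhs, "support", s, "confidence", round(conf, 3))
```

With these thresholds the rule milk ⇒ juice survives (support 50%, confidence 66.7%), while bread ⇒ juice is dropped because its support is only 25%.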

Data mining algorithms
At the heart of data mining is the process of building a model to represent a data set.
Vendors and researchers often discuss the differences between models built using different algorithms and approaches.
There are hundreds of derivative approaches under generic data mining model names such as neural networks, agent networks, decision trees, concept hierarchies, genetic algorithms, fuzzy logic, and belief networks.
For example, NeuralWare offers a neural network product set with over 25 different neural network approaches.

How does "Data Mining" compare with other statistical techniques?
Data analysis has been in existence for decades, and the advent of computers accelerated the statistical manipulation of very large data sets for discovering knowledge.
Statistical approaches to data analysis involve a process called regression analysis, which has been used to model data. Various regression models rely on the assumptions that the underlying data is well-behaved and that the relationship structures are of a form that can be linearly transformed for ease of estimation.
This restricts the modeler, because in the real world things do not function according to some predictable linear function.
The new "machine learning" or data mining techniques impose no such prior restraints on the model and can seek out relationships that would otherwise go undetected by traditional methods.

Why should you consider using "Data Mining"?
Data mining automates the process of discovering useful trends and patterns. It can be designed to automate the process of learning about evolving relationships with the aid of an expert, the model builder.
When dealing with large databases, data mining is a computationally intensive process and requires a fair amount of disk space as well.
Decreases in hardware costs have made data mining available to a much wider audience. The increase in the power of PCs and the decrease in their cost have made data mining feasible for all types of businesses, large and small.
More informationYunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace
[Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 20 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(20), 2014 [12526-12531] Exploration on the data mining system construction
More informationChapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the
Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule
More informationChapter 4: Association analysis:
Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily
More informationFinal Exam DATA MINING I - 1DL360
Uppsala University Department of Information Technology Kjell Orsborn Final Exam 2012-10-17 DATA MINING I - 1DL360 Date... Wednesday, October 17, 2012 Time... 08:00-13:00 Teacher on duty... Kjell Orsborn,
More informationAssociation Pattern Mining. Lijun Zhang
Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms
More informationWe will be releasing HW1 today It is due in 2 weeks (1/25 at 23:59pm) The homework is long
1/21/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 1 We will be releasing HW1 today It is due in 2 weeks (1/25 at 23:59pm) The homework is long Requires proving theorems
More informationApplication of Data Mining in Library and Information Services
168 Application of Data Mining in Library and Information Services K Prakash Prem Chand Umesh Gohel Abstract Knowledge Discovery or Data Mining is the partially automated process of extracting patterns,
More informationA Systems Approach to Dimensional Modeling in Data Marts. Joseph M. Firestone, Ph.D. White Paper No. One. March 12, 1997
1 of 8 5/24/02 4:43 PM A Systems Approach to Dimensional Modeling in Data Marts By Joseph M. Firestone, Ph.D. White Paper No. One March 12, 1997 OLAP s Purposes And Dimensional Data Modeling Dimensional
More informationCSE4334/5334 DATA MINING
CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy
More informationThe k-means Algorithm and Genetic Algorithm
The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective
More informationCS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University
CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,
More informationThe Use of Fuzzy Logic at Support of Manager Decision Making
The Use of Fuzzy Logic at Support of Manager Decision Making The use of fuzzy logic is the advantage especially at decision making processes where the description by algorithms is very difficult and criteria
More informationData Mining Concept. References. Why Mine Data? Commercial Viewpoint. Why Mine Data? Scientific Viewpoint
References Discovering Knowledge in Data Daniel T Larose, 2005 Data Mining Concept Data Mining: Concepts and Techniques, 2nd Edition, 2005 Micheline Kamber, Jiawei Han Data Mining: Practical Machine Learning
More informationLesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)
From this diagram, you can see that the aggregated mining model preserves the overall range and trends in values while minimizing the fluctuations in the individual data series. Conclusion You have learned
More informationMachine Learning: Symbolische Ansätze
Machine Learning: Symbolische Ansätze Unsupervised Learning Clustering Association Rules V2.0 WS 10/11 J. Fürnkranz Different Learning Scenarios Supervised Learning A teacher provides the value for the
More informationInternational Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16
The Survey Of Data Mining And Warehousing Architha.S, A.Kishore Kumar Department of Computer Engineering Department of computer engineering city engineering college VTU Bangalore, India ABSTRACT: Data
More informationData warehouse and Data Mining
Data warehouse and Data Mining Lecture No. 14 Data Mining and its techniques Naeem A. Mahoto Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationSoftware Engineering Prof.N.L.Sarda IIT Bombay. Lecture-11 Data Modelling- ER diagrams, Mapping to relational model (Part -II)
Software Engineering Prof.N.L.Sarda IIT Bombay Lecture-11 Data Modelling- ER diagrams, Mapping to relational model (Part -II) We will continue our discussion on process modeling. In the previous lecture
More informationNow, Data Mining Is Within Your Reach
Clementine Desktop Specifications Now, Data Mining Is Within Your Reach Data mining delivers significant, measurable value. By uncovering previously unknown patterns and connections in data, data mining
More informationISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationOn-Line Application Processing
On-Line Application Processing WAREHOUSING DATA CUBES DATA MINING 1 Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming,
More informationEfficient Frequent Itemset Mining Mechanism Using Support Count
Efficient Frequent Itemset Mining Mechanism Using Support Count 1 Neelesh Kumar Kori, 2 Ramratan Ahirwal, 3 Dr. Yogendra Kumar Jain 1 Department of C.S.E, Samrat Ashok Technological Institute, Vidisha,
More informationINTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...
INTRODUCTION... 2 WHAT IS DATA MINING?... 2 HOW TO ACHIEVE DATA MINING... 2 THE ROLE OF DARWIN... 3 FEATURES OF DARWIN... 4 USER FRIENDLY... 4 SCALABILITY... 6 VISUALIZATION... 8 FUNCTIONALITY... 10 Data
More informationAssociation Rule Discovery
Association Rule Discovery Association Rules describe frequent co-occurences in sets an item set is a subset A of all possible items I Example Problems: Which products are frequently bought together by
More informationThe Fuzzy Search for Association Rules with Interestingness Measure
The Fuzzy Search for Association Rules with Interestingness Measure Phaichayon Kongchai, Nittaya Kerdprasop, and Kittisak Kerdprasop Abstract Association rule are important to retailers as a source of
More informationAssociation mining rules
Association mining rules Given a data set, find the items in data that are associated with each other. Association is measured as frequency of occurrence in the same context. Purchasing one product when
More informationQuestion Bank. 4) It is the source of information later delivered to data marts.
Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
More informationCase Study: SAP BW Data Mining (Association Analysis)
Case Study: SAP BW Data Mining (Association Analysis) Product SAP Netweaver Release 2004s Level Undergraduate Focus BW Data Mining Author Paul Hawking Robert Jovanovic Version 1.0 MOTIVATION The management
More informationData Mining: Approach Towards The Accuracy Using Teradata!
Data Mining: Approach Towards The Accuracy Using Teradata! Shubhangi Pharande Department of MCA NBNSSOCS,Sinhgad Institute Simantini Nalawade Department of MCA NBNSSOCS,Sinhgad Institute Ajay Nalawade
More informationAssignment 3 User Research Report Document
Assignment 3 User Research Report Document Online Clothing Store By Chris Kazanjian, Loren Smith, Jess Hartig, and Jeremiah Lyons DESCRIPTION OF USERS User Audience Male and Female Ages typically ranging
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More informationResearch on Data Mining and Statistical Analysis Xiaoyao Lu1, a
6th International Conference on Machinery, Materials, Environment, Biotechnology and Computer (MMEBC 2016) Research on Data Mining and Statistical Analysis Xiaoyao Lu1, a 1 School of Statistics and Mathematics
More informationCOMS 4721: Machine Learning for Data Science Lecture 23, 4/20/2017
COMS 4721: Machine Learning for Data Science Lecture 23, 4/20/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University ASSOCIATION ANALYSIS SETUP Many businesses
More information1 Machine Learning System Design
Machine Learning System Design Prioritizing what to work on: Spam classification example Say you want to build a spam classifier Spam messages often have misspelled words We ll have a labeled training
More informationTIM 50 - Business Information Systems
TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz Nov 10, 2016 Class Announcements n Database Assignment 2 posted n Due 11/22 The Database Approach to Data Management The Final Database Design
More informationData Mining & Machine Learning F2.4DN1/F2.9DM1
Data Mining & Machine Learning F2.4DN1/F2.9DM1 Nick Taylor N.K.Taylor@hw.ac.uk Room EM1.62 Data Data Mining - Content Introduction to Data Mining What it is, Who does it and Why Data Warehousing Virtuous
More informationElena Marchiori Free University Amsterdam, Faculty of Science, Department of Mathematics and Computer Science, Amsterdam, The Netherlands
DATA MINING Elena Marchiori Free University Amsterdam, Faculty of Science, Department of Mathematics and Computer Science, Amsterdam, The Netherlands Keywords: Data mining, knowledge discovery in databases,
More informationData Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University
Data Mining Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce
More informationApriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke
Apriori Algorithm For a given set of transactions, the main aim of Association Rule Mining is to find rules that will predict the occurrence of an item based on the occurrences of the other items in the
More informationOverview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?
Introduction to Data Warehousing and Business Intelligence Overview Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction A tour of the coming DW lectures DW Applications Loosely
More informationMULTI-CLIENT 2017 US INTERCHANGEABLE LENS CAMERA MARKET STUDY. Consumer Imaging Behaviors and Industry Trends SERVICE AREAS:
SERVICE AREAS: Consumer and Professional Imaging MULTI-CLIENT 2017 US INTERCHANGEABLE LENS CAMERA MARKET STUDY Consumer Imaging Behaviors and Industry Trends SEPTEMBER 2017 contents Table of Contents Executive
More informationStudy on the Application Analysis and Future Development of Data Mining Technology
Study on the Application Analysis and Future Development of Data Mining Technology Ge ZHU 1, Feng LIN 2,* 1 Department of Information Science and Technology, Heilongjiang University, Harbin 150080, China
More informationManagement Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT
MANAGING THE DIGITAL FIRM, 12 TH EDITION Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT VIDEO CASES Case 1: Maruti Suzuki Business Intelligence and Enterprise Databases
More informationPincer-Search: An Efficient Algorithm. for Discovering the Maximum Frequent Set
Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set Dao-I Lin Telcordia Technologies, Inc. Zvi M. Kedem New York University July 15, 1999 Abstract Discovering frequent itemsets
More information