Defining a Data Mining Task. CSE3212 Data Mining. What Is to Be Mined? The Approaches. Task-relevant Data. Estimation.
- Tracey Parks
- 5 years ago
CSE3212 Data Mining

Data Mining Approaches

Defining a Data Mining Task

To define a data mining task, one needs to answer the following questions:
1. What data set do I want to mine?
2. What kind of knowledge do I want to mine?
3. What background knowledge could be useful?
4. How do I measure whether the results are interesting?
5. How do I display what I have discovered?

Task-relevant Data

Generally we wish to mine only a subset of a database, not the whole database. We may want to study something specific, e.g. trends in postgraduate students: the countries they come from, the degree programs they are doing, their age, the time (duration) they take to finish the degree, and whether they have been awarded a scholarship. Building the database subset may be a subtask that must be completed before data mining can be done.

What Is to Be Mined? The Approaches

What kind of knowledge are we after?
- Classification
- Estimation
- Prediction
- Clustering
- Description
- Affinity Grouping
- Outliers

Classification

Classification involves considering the features of some object and then assigning it to some pre-defined class, for example:
- Spotting fraudulent insurance claims
- Identifying which phone numbers are fax numbers
- Identifying which customers are high-value

The features that are considered are known as the independent attributes or variables, while the attribute that constitutes the pre-defined classes is called the dependent attribute or variable. We first build a model from data whose class labels are known, and then use the model to classify other data for which the class label is not known. This is known as supervised learning.

Estimation

Estimation deals with numerically valued outcomes rather than the discrete categories that occur in classification, for example:
- Estimating the number of children in a family
- Estimating family income
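The slides describe classification and estimation without naming a particular algorithm. As a minimal sketch, assuming a k-nearest-neighbour model (one simple technique among many, swapped in here for illustration), small hypothetical datasets, and Python 3.8+ for `math.dist`, the same supervised idea of learning from records with known outcomes yields a discrete class for classification and a numeric value for estimation:

```python
from collections import Counter
from math import dist

# Hypothetical training records: (features, known outcome).
# Features are (age, annual spend in $1000s); outcomes are pre-defined classes.
CUSTOMERS = [
    ((25, 1.0), "low"), ((30, 1.5), "low"), ((28, 1.2), "low"),
    ((45, 8.0), "high"), ((50, 9.5), "high"), ((48, 7.5), "high"),
]
# Hypothetical records for estimation: (age, years of education) -> income in $1000s.
FAMILIES = [
    ((30, 12), 40.0), ((35, 16), 60.0), ((40, 16), 70.0),
    ((45, 18), 90.0), ((50, 12), 50.0),
]


def nearest(train, query, k):
    """Return the k training records whose features are closest to `query`."""
    return sorted(train, key=lambda record: dist(record[0], query))[:k]


def classify(train, query, k=3):
    """Classification: predict a pre-defined class by majority vote of the k neighbours."""
    votes = Counter(label for _, label in nearest(train, query, k))
    return votes.most_common(1)[0][0]


def estimate(train, query, k=3):
    """Estimation: predict a numeric outcome as the mean of the k neighbours' values."""
    values = [value for _, value in nearest(train, query, k)]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(classify(CUSTOMERS, (47, 8.2)))  # -> high (a label, i.e. a discrete category)
    print(estimate(FAMILIES, (42, 17)))    # about 73.3 (a number, not a category)
```

Because the model is built from records whose outcomes are already known, this is supervised learning in the slides' sense. A real application would also scale the features first, since raw Euclidean distance is dominated by whichever attribute has the largest range.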
Prediction

Prediction is essentially the same as classification and estimation, but involves future behaviour:
- Historical data is used to build a model explaining behaviour (outputs) for known inputs.
- The model developed is then applied to current inputs to predict future outputs.

Examples:
- Predicting which customers will respond to a promotion
- Classifying loan applications

Clustering

Clustering is also sometimes referred to as segmentation (though this term has other meanings in other fields).
- In clustering there are no pre-defined classes; self-similarity is used to group records.
- The user must attach meaning to the clusters formed.
- Clustering often precedes some other data mining task. For example, once customers are separated into clusters, a promotion might be carried out based on market basket analysis of the resulting cluster.
- Clustering is known as unsupervised learning.

Description

- A good description of data can provide understanding of behaviour.
- The description of the behaviour can also suggest an explanation for it.
- Statistical measures can be useful in describing data, as can techniques that generate rules.

Deviation Detection

- Records whose attributes deviate from the norm by significant amounts are also called outliers.
- Application areas include fraud detection, quality control and tracing defects.
- Visualization techniques and statistical techniques are useful in finding outliers.
- A cluster that contains only a few records may in fact represent outliers.

Affinity Grouping

- Affinity grouping is also referred to as market basket analysis.
- A common example is the discovery of which items are frequently sold together at a supermarket. If this is known, decisions can be made about:
  - arranging items on shelves
  - which items should be promoted together
  - which items should not simultaneously be discounted

Market Basket Analysis

When a customer buys a shirt, in 70% of cases he or she will also buy a tie! We find this happens in 13.5% of all purchases.
- Rule body: buys a shirt
- Rule head: buys a tie
- Confidence: 70%
- Support: 13.5%
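For the clustering approach described above, where there are no pre-defined classes, one widely used algorithm is k-means (not named in the slides). The sketch below assumes numeric records, Euclidean distance, a user-chosen number of clusters k, and a naive initialisation from the first k records; all of these are simplifying assumptions, and the data is hypothetical:

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Group points into k clusters by self-similarity (Lloyd's k-means algorithm)."""
    # Naive, deterministic initialisation: use the first k records as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical customer records: (visits per month, average basket value).
    customers = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
    labels, centroids = kmeans(customers, k=2)
    print(labels)     # cluster numbers only; the user must attach meaning to them
    print(centroids)
```

The result depends on the initial centroids, so practical implementations usually try several random initialisations (or a smarter seeding scheme) and keep the best clustering.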
The Usefulness of Market Basket Analysis

- Some rules are useful: unknown, unexpected and indicative of some action to take.
- Some rules are trivial: known by anyone familiar with the business.
- Some rules are inexplicable: they seem to have no explanation and do not suggest a course of action.

"The key to success in business is to know something that nobody else knows." (Aristotle Onassis)

Co-Occurrence Table

Customer   Items
1          orange juice (OJ), cola
2          milk, orange juice, window cleaner
3          orange juice, detergent
4          orange juice, detergent, cola
5          window cleaner, cola

Each cell below counts the customers who bought both items; the diagonal counts the customers who bought that item.

             OJ   Cleaner   Milk   Cola   Detergent
OJ            4      1        1      2        2
Cleaner       1      2        1      1        0
Milk          1      1        1      0        0
Cola          2      1        0      3        1
Detergent     2      0        0      1        2

From the co-occurrence table we can say that people who buy orange juice also buy cola (or detergent):

orange juice ⇒ cola

This association rule is satisfied by 2 out of 5 customers (1 and 4), hence its support is 2/5 = 40%. However, four customers (1, 2, 3 and 4) have purchased orange juice, so the confidence of the rule is only 2/4 = 50%.

Question: Are support and confidence measures good enough? The rule above has one item (or attribute) on the left-hand side and one on the right-hand side. How do you find rules that have more than one item on the left-hand side (multi-attribute rules)?

Support and Confidence

Support: the percentage of transactions in a transaction database that satisfy the given rule. This can be taken as the probability $P(X \cup Y)$, where $X \cup Y$ indicates that a transaction contains both X and Y, that is, the union of item sets X and Y.

Confidence: assesses the degree of certainty of the detected association. This can be taken as the conditional probability $P(Y \mid X)$, that is, the probability that a transaction containing X also contains Y.

More formally:

$$\text{support}(X \Rightarrow Y) = P(X \cup Y)$$
$$\text{confidence}(X \Rightarrow Y) = P(Y \mid X)$$

What is a Rule?
If condition then result

Note: "If nappies and Thursday then beer" is usually better (in the sense that it is more actionable) than "If Thursday then nappies and beer", because it has just one item in the result.

Is the Rule a Useful Predictor? - 1

Confidence is the ratio of the number of transactions containing all the items in the rule to the number of transactions containing the items in the condition. Consider: if B and C then A. If this rule has a confidence of 0.33, it means that when B and C occur in a transaction, there is a 33% chance that A also occurs.

If a three-way combination is the most common, then consider rules with just one item in the result, e.g.
- If A and B, then C
- If A and C, then B
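The support and confidence figures for the orange juice and cola rule, and the advice to prefer rules with a single item in the result, can both be checked mechanically. The sketch below reuses the five example baskets and computes support and confidence as exact ratios of transaction counts; the function names are illustrative, not taken from any library:

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# The five customers' baskets from the co-occurrence example.
BASKETS = [
    {"orange juice", "cola"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "cola"},
    {"window cleaner", "cola"},
]


def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def support(baskets, condition, result):
    """support(X => Y) = P(X ∪ Y): share of baskets containing X and Y together."""
    return Fraction(count(baskets, set(condition) | set(result)), len(baskets))


def confidence(baskets, condition, result):
    """confidence(X => Y) = P(Y | X): share of X-baskets that also contain Y."""
    both = count(baskets, set(condition) | set(result))
    return Fraction(both, count(baskets, condition))


def single_result_rules(itemset):
    """Every rule 'if <other items> then <item>' with exactly one item in the result."""
    items = sorted(itemset)
    return [(tuple(i for i in items if i != result), result) for result in items]


def co_occurrence(baskets):
    """Pairwise co-occurrence counts; the diagonal (item, item) counts single items."""
    table = Counter()
    for basket in baskets:
        for item in basket:
            table[item, item] += 1
        for a, b in combinations(sorted(basket), 2):
            table[a, b] += 1
            table[b, a] += 1
    return table


if __name__ == "__main__":
    oj, cola = {"orange juice"}, {"cola"}
    print("support", support(BASKETS, oj, cola))        # 2/5
    print("confidence", confidence(BASKETS, oj, cola))  # 1/2, i.e. 2/4
    print("OJ with cola:", co_occurrence(BASKETS)["orange juice", "cola"])  # 2
    # Multi-attribute rules from customer 4's three-item basket.
    for condition, result in single_result_rules({"orange juice", "detergent", "cola"}):
        print(f"if {' and '.join(condition)} then {result}:",
              confidence(BASKETS, condition, {result}))
```

Storing the co-occurrence table as a sparse counter rather than a full matrix is a design choice: only item pairs that actually occur take up space.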
Is the Rule a Useful Predictor? - 2

Consider the following table of probabilities of items and their combinations:

Combination      Probability
A                0.45
B                0.42
C                0.40
A and B          0.25
A and C          0.20
B and C          0.15
A and B and C    0.05

Is the Rule a Useful Predictor? - 3

Now consider the following rules:

Rule                 p(condition)   p(condition and result)   confidence
If A and B then C    0.25           0.05                      0.20
If A and C then B    0.20           0.05                      0.25
If B and C then A    0.15           0.05                      0.33

It is tempting to choose "If B and C then A", because it is the most confident (33%), but there is a problem.

Is the Rule a Useful Predictor? - 4

This rule is actually worse than just saying that A randomly occurs in the transaction, which happens 45% of the time. A measure called improvement indicates whether the rule predicts the result better than just assuming the result in the first place.

Is the Rule a Useful Predictor? - 5

Improvement measures how much better a rule is at predicting a result than just assuming the result in the first place:

$$\text{improvement} = \frac{p(\text{condition and result})}{p(\text{condition})\,p(\text{result})}$$

When improvement > 1, the rule is better at predicting the result than random chance.

Is the Rule a Useful Predictor? - 6

Consider the improvement for our rules:

Rule                 support   confidence   improvement
If A and B then C    0.05      0.20         0.50
If A and C then B    0.05      0.25         0.60
If B and C then A    0.05      0.33         0.74
If A then B          0.25      0.56         1.32

None of the rules with three items shows any improvement; the best rule in the data actually has only two items: if A then B. A predicts the occurrence of B about 1.3 times better than chance.

Is the Rule a Useful Predictor? - 7

When improvement < 1, negating the result produces a better rule. For example, "if B and C then not A" has a confidence of 0.67 and thus an improvement of 0.67/0.55 = 1.22. Negated rules may not be as useful as the original association rules when it comes to acting on the results.
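The support, confidence and improvement values for these rules follow directly from the probability table. A minimal sketch, assuming the probabilities are stored in a dictionary keyed by item sets (a representation chosen here for illustration). With p(B) = 0.42 as tabulated, "if A then B" comes out at 0.25 / (0.45 × 0.42) ≈ 1.32, and the negated rule comes out at about 1.21 when its confidence is not rounded to 0.67 before dividing by 0.55:

```python
# Probabilities of items and their combinations, from the probability table.
# frozenset("AB") is the set {"A", "B"}.
P = {
    frozenset("A"): 0.45,
    frozenset("B"): 0.42,
    frozenset("C"): 0.40,
    frozenset("AB"): 0.25,
    frozenset("AC"): 0.20,
    frozenset("BC"): 0.15,
    frozenset("ABC"): 0.05,
}


def measures(condition, result):
    """Return (support, confidence, improvement) for 'if <condition> then <result>'."""
    cond = frozenset(condition)
    support = P[cond | frozenset(result)]  # p(condition and result)
    confidence = support / P[cond]         # p(result | condition)
    improvement = confidence / P[frozenset(result)]
    return support, confidence, improvement


def negated_improvement(condition, result):
    """Improvement of 'if <condition> then NOT <result>'."""
    _, confidence, _ = measures(condition, result)
    return (1 - confidence) / (1 - P[frozenset(result)])


if __name__ == "__main__":
    for condition, result in [("AB", "C"), ("AC", "B"), ("BC", "A"), ("A", "B")]:
        s, c, i = measures(condition, result)
        print(f"if {' and '.join(condition)} then {result}: "
              f"support {s:.2f}, confidence {c:.2f}, improvement {i:.2f}")
    print(f"if B and C then not A: improvement {negated_improvement('BC', 'A'):.2f}")
```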
Choosing the Right Set of Items

- Choosing the right level of detail (the creation of classes and a taxonomy)
- Virtual items may be added to take advantage of information that goes beyond the taxonomy
- Anonymous versus signed transactions

Multi-attribute Rule

A rule with two items on the left-hand side and one item on the right-hand side (e.g. If A and B then C) would require the co-occurrence matrix to be three-dimensional. How do you visualise a three-dimensional co-occurrence matrix? What happens for higher dimensions?

The Process for Market Basket Analysis

A co-occurrence cube would show associations in three dimensions; it is hard to visualize more. We must:
- Choose the right set of items
- Generate rules by deciphering the counts in the co-occurrence matrix
- Overcome the practical limits imposed by many items in large numbers of transactions

An Example

Consider the following database:
Student(sid, name1, dob, country, degree, startsem, address1, telephone, address2, , scholarship,..)
Enrolment(sid, subject-id, mark, tutegroup, tutor,..)
Subject(sub-id, name, school-id, whenstarted, lecturer,..)
School(name, id,..)

Not all of this data is needed for decision making. Let us extract some data from this database.

Example

We could look at the information as yob × country × degree × startsem × numsubjects × scholarship. In fact, it is natural to think of enterprise data as multidimensional.

yob, country, degree, startsem, numsubjects, scholarship
1965, Thailand, MIT, 991, 5, 25%
1970, Canada, BIT, 992, 4,
, Australia, LLB, 993, 3, 30%
1966, Australia, LLB, 983, 4, 40%
1972, Australia, Bcom, 973, 5, 10%
1972, India, BIT/Bcom, 991, 5, 10%
1982, Sweden, MSc(IT), 991, 3, 10%

Is this information useful for decision making? Not really!
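Returning to the multi-attribute rule question above: a dense co-occurrence "cube" for n items and k-item combinations needs n^k cells, which is why higher dimensions become impractical. One standard way to overcome this limit is the Apriori algorithm (not described in these slides), which only counts candidate itemsets whose every smaller subset is already frequent. A minimal sketch, assuming the minimum support is given as a transaction count, applied to the five example baskets from the co-occurrence table:

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count only the current candidates instead of a full k-dimensional cube.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


if __name__ == "__main__":
    baskets = [
        {"orange juice", "cola"},
        {"milk", "orange juice", "window cleaner"},
        {"orange juice", "detergent"},
        {"orange juice", "detergent", "cola"},
        {"window cleaner", "cola"},
    ]
    found = apriori(baskets, min_count=2)
    for itemset, n in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
```

With `min_count=1` the same function also returns the two three-item combinations (from customers 2 and 4), which are exactly the non-empty cells of the three-dimensional co-occurrence cube.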
Example

The university management may be interested in retrieving information such as:

Queries involving only one variable:
- How many students are doing BIT?
- How many students are from Thailand?
- How many students started in 1998?

Queries involving two variables:
- How many students doing BIT are from Thailand?
- How many MIT students started in 981?
- How many students from Thailand started in 993?

Query involving three variables:
- How many students doing MIT from Thailand started in 981?

Special types of database systems, called data cube systems, are often used for answering such queries.

Example

The example queries discussed earlier may be represented by a three-dimensional data cube, with each edge representing one of the variables, viz. startsem, country and degree. A point inside the cube is an intersection of the coordinates defined by the edges of the cube. The coordinates of the point define the meaning of the data at that point.

Let us look at a simple two-dimensional situation: country × degree. For decision making this may be useful information. If we had a two-dimensional matrix, then we could find out the number of students for any country (x) and any degree (y).

But in the two-dimensional situation, we don't just want to find out the number of students for any country (x) and any degree (y). We may have many other queries, e.g.
1. How many students are doing MIT?
2. How many students are from Thailand?
3. How many Asian students are doing Law degrees?

Thus there is a kind of hierarchy that we wish to use, for example the world, the continents, the regions, the countries, etc. For degrees, we may want a hierarchy of university, schools, UG and PG, and individual degrees.

Consider a slightly more complex situation in which we have three dimensions, country × degree × startsem, for any country (x), any degree (y) and any start semester (z). We may now look at this information as a three-dimensional cube, as shown on the following slide.
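Each of the university queries above is a count over some subset of the dimensions, so answering all of them amounts to pre-computing every "cuboid" (group-by) of the data cube. A minimal sketch over the seven sample students from the earlier example (country, degree and startsem only), with a hypothetical country-to-continent mapping standing in for the geographic hierarchy used for roll-up:

```python
from collections import Counter
from itertools import combinations

# (country, degree, startsem) for the seven sample students in the earlier example.
STUDENTS = [
    ("Thailand", "MIT", 991), ("Canada", "BIT", 992), ("Australia", "LLB", 993),
    ("Australia", "LLB", 983), ("Australia", "Bcom", 973),
    ("India", "BIT/Bcom", 991), ("Sweden", "MSc(IT)", 991),
]

# Hypothetical concept hierarchy used for roll-up: country -> continent.
CONTINENT = {"Thailand": "Asia", "India": "Asia", "Canada": "North America",
             "Australia": "Oceania", "Sweden": "Europe"}

# Each dimension is a function that extracts one coordinate from a record.
DIMENSIONS = {
    "country": lambda r: r[0],
    "degree": lambda r: r[1],
    "startsem": lambda r: r[2],
    "continent": lambda r: CONTINENT[r[0]],  # rolled-up level of "country"
}


def cuboid(rows, *dims):
    """Count records for every combination of values of the named dimensions."""
    return Counter(tuple(DIMENSIONS[d](r) for d in dims) for r in rows)


def all_cuboids(rows, dims=("country", "degree", "startsem")):
    """Pre-compute the 2**len(dims) cuboids of the cube, from the apex () to the base."""
    return {names: cuboid(rows, *names)
            for k in range(len(dims) + 1) for names in combinations(dims, k)}


if __name__ == "__main__":
    print(cuboid(STUDENTS, "degree")[("BIT",)])                         # one variable
    print(cuboid(STUDENTS, "country", "degree")[("Australia", "LLB")])  # two variables
    print(cuboid(STUDENTS, "continent", "startsem")[("Asia", 991)])     # after roll-up
    print(all_cuboids(STUDENTS)[()][()])                                # total students
```

Pre-computing all 2^n cuboids trades storage for fast answers; with many dimensions this grows quickly, so data cube systems typically materialise only some of them.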
A Sample Data Cube

[Figure: number of students as a function of country, degree and semester. Dimensions: country (e.g. U.S.A, Malaysia, Australia), degree (e.g. LLB, BComp, MIT) and semester, with Sum cells giving total enrolments. Hierarchical summarization paths: country → region → continent; degree → ug/pg → school; semester → year.]

Each edge of the cube is called a dimension. A user normally has a number of different dimensions from which the given data may be analyzed, and therefore has a multidimensional conceptual view of the data, which is represented by the cube. The points inside a cube provide aggregations. For example, a point may provide the number of students from Malaysia admitted to BComp in a particular year.

Strengths and Weaknesses

Strengths:
- Clear, understandable results
- Supports undirected data mining
- Works on variable-length data
- Is simple to understand

Weaknesses:
- Requires exponentially more computational effort as the problem size grows
- Suits items in transactions, but not all problems fit this description
- It can be difficult to determine the right set of items to analyse
- It does not handle rare items well; simply considering the level of support will exclude these items

We need an algorithm to find the association rules.

Outlier Analysis

Outlier analysis identifies data objects that do not comply with the general behaviour or model of the data. Outliers are often ignored, but in applications like fraud detection the outliers are the objects of interest.
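As one example of the statistical techniques mentioned for finding outliers, the sketch below flags values whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. The claim amounts and the threshold of 2.5 are hypothetical choices for illustration:

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.5):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates from the norm
    return [v for v in values if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical insurance claim amounts; one is far from the rest.
    claims = [100, 102, 98, 101, 99, 100, 103, 97, 100, 250]
    print(zscore_outliers(claims))  # [250]
```

A single extreme value inflates both the mean and the standard deviation, which can hide ("mask") other outliers in small samples; median-based measures are a common, more robust alternative.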
More informationData mining, 4 cu Lecture 6:
582364 Data mining, 4 cu Lecture 6: Quantitative association rules Multi-level association rules Spring 2010 Lecturer: Juho Rousu Teaching assistant: Taru Itäpelto Data mining, Spring 2010 (Slides adapted
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationLectures for the course: Data Warehousing and Data Mining (IT 60107)
Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline
More informationAssociation Rule Mining. Entscheidungsunterstützungssysteme
Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
More informationCluster Analysis. CSE634 Data Mining
Cluster Analysis CSE634 Data Mining Agenda Introduction Clustering Requirements Data Representation Partitioning Methods K-Means Clustering K-Medoids Clustering Constrained K-Means clustering Introduction
More informationInfrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset
Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset M.Hamsathvani 1, D.Rajeswari 2 M.E, R.Kalaiselvi 3 1 PG Scholar(M.E), Angel College of Engineering and Technology, Tiruppur,
More informationA case study to introduce Microsoft Data Mining in the database course
A case study to introduce Microsoft Data Mining in the database course ABSTRACT Mohammad Dadashzadeh Oakland University The content of the database management systems course in the business curriculum
More informationAssociation rule mining
Association rule mining Association rule induction: Originally designed for market basket analysis. Aims at finding patterns in the shopping behavior of customers of supermarkets, mail-order companies,
More informationPSS718 - Data Mining
Lecture 5 - Hacettepe University October 23, 2016 Data Issues Improving the performance of a model To improve the performance of a model, we mostly improve the data Source additional data Clean up the
More informationCase Study: SAP BW Data Mining (Association Analysis)
Case Study: SAP BW Data Mining (Association Analysis) Product SAP Netweaver Release 2004s Level Undergraduate Focus BW Data Mining Author Paul Hawking Robert Jovanovic Version 1.0 MOTIVATION The management
More informationComputational Systems COMP1209
Computational Systems COMP1209 Testing Yvonne Howard ymh@ecs.soton.ac.uk A Problem A café wants to build an automated system to provide breakfasts. The robot waiter greets people before taking their order
More informationLesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)
From this diagram, you can see that the aggregated mining model preserves the overall range and trends in values while minimizing the fluctuations in the individual data series. Conclusion You have learned
More informationData Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality
Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing
More informationIntro to Artificial Intelligence
Intro to Artificial Intelligence Ahmed Sallam { Lecture 5: Machine Learning ://. } ://.. 2 Review Probabilistic inference Enumeration Approximate inference 3 Today What is machine learning? Supervised
More informationCHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)
CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination
More informationOptimization of Query Processing in XML Document Using Association and Path Based Indexing
Optimization of Query Processing in XML Document Using Association and Path Based Indexing D.Karthiga 1, S.Gunasekaran 2 Student,Dept. of CSE, V.S.B Engineering College, TamilNadu, India 1 Assistant Professor,Dept.
More informationHierarchical Clustering 4/5/17
Hierarchical Clustering 4/5/17 Hypothesis Space Continuous inputs Output is a binary tree with data points as leaves. Useful for explaining the training data. Not useful for making new predictions. Direction
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining
More informationStudy on the Application Analysis and Future Development of Data Mining Technology
Study on the Application Analysis and Future Development of Data Mining Technology Ge ZHU 1, Feng LIN 2,* 1 Department of Information Science and Technology, Heilongjiang University, Harbin 150080, China
More informationOLAP Introduction and Overview
1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationKnowledge Modelling and Management. Part B (9)
Knowledge Modelling and Management Part B (9) Yun-Heh Chen-Burger http://www.aiai.ed.ac.uk/~jessicac/project/kmm 1 A Brief Introduction to Business Intelligence 2 What is Business Intelligence? Business
More informationCSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection
CSE 158 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationCSE 6242/CX Ensemble Methods. Or, Model Combination. Based on lecture by Parikshit Ram
CSE 6242/CX 4242 Ensemble Methods Or, Model Combination Based on lecture by Parikshit Ram Numerous Possible Classifiers! Classifier Training time Cross validation Testing time Accuracy knn classifier None
More informationWeka ( )
Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised
More informationINFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM
INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM G.Amlu #1 S.Chandralekha #2 and PraveenKumar *1 # B.Tech, Information Technology, Anand Institute of Higher Technology, Chennai, India
More informationAn Experiment in Visual Clustering Using Star Glyph Displays
An Experiment in Visual Clustering Using Star Glyph Displays by Hanna Kazhamiaka A Research Paper presented to the University of Waterloo in partial fulfillment of the requirements for the degree of Master
More informationMeaning & Concepts of Databases
27 th August 2015 Unit 1 Objective Meaning & Concepts of Databases Learning outcome Students will appreciate conceptual development of Databases Section 1: What is a Database & Applications Section 2:
More information