Data Mining. Sharma Chakravarthy, IT Laboratory and CSE Department


1 Data Mining. Sharma Chakravarthy, IT Laboratory and CSE Department, The University of Texas at Arlington.

Outline: Overview. Association rules. Will try to discuss 3 papers: database mining; association rule and graph mining. Graph mining: overview; one or two approaches; classification; application.

Data Mining: "The key in business is to know something that nobody else knows." (Aristotle Onassis)

Motivation: Fraud division, some large telephone company: "How do we find these guys? There are 10 billion records on 10 million customers in the main database. With all this information we have about our customers and all the calls they make, can't you just ask the database to figure out which lines have been set up temporarily and exhibited similar calling patterns in the same time periods? The information is in there, I just know it."

2 Problem: The find-similar problem just described is hard. Other examples: What products need to be improved? Which books won't be checked out and can be taken off the shelves? Why is it hard? Massive amounts of data. More and more online data stores (e.g., Web, corporate databases, etc.). No easy way to describe what to look for. Traditional, interactive approaches fail because of the size of the data and its different purposes.

Another Example: Marketing cellular phones. Churn is too high: turnover after the initial contract is too high. What is a good strategy? Giving a new phone to everyone is too expensive (and wasteful). Bringing back customers after they leave is very difficult.

What to do: A few months before the contract expires, if one can predict which customers are likely to quit, give an incentive to those who are likely to quit and don't do anything for those who are NOT likely to quit. How do I predict future behavior? Corporate palm reading! Human intuition!! Data mining (DM) or knowledge discovery (KDD).

Data Mining: Data mining (DM) is part of the knowledge discovery process carried out to extract valid patterns and relationships in very large data sets. We usually don't know what to look for; it is like a voyage into the unknown. Regarded as unsupervised learning from basic facts (axioms) and data. Roots in AI and statistics. Uses techniques from machine learning, pattern recognition, statistics, databases, visualization, etc.

3 Data Mining, Another Definition: Data mining is the iterative and interactive process of discovering valid, novel, useful, previously unknown, and understandable patterns or models in massive data sets. Constituents of data mining: There is an element of discovery; what is discovered may be counter-intuitive even to the expert. Exhaustive scan/processing of the available data. Verification of conjecture or hypothesis. "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [Frawley et al., 92].

Characteristics: Automated extraction of predictive information from large data sets. Key words: automated, extraction, predictive, large data sets. A methodology is assumed (typically statistical).

Data Mining Enablers: Reduced cost of storage. Reduced cost of processing. Ability to store, process, and manage large volumes of data (e.g., DW, Internet). New techniques such as association rules, sequence data processing, text mining. However, scalability, visualization of results, and filtering of very large outputs are new issues!

4 Data Mining has come about due to the convergence of multiple technologies: increase in computing power; application of statistical/machine learning algorithms; improved data management.

Data Mining: Causality and correlation. The two are different! Which one does mining try to identify?

Drivers: Today, business information (or BI) systems are as important to corporations as transaction systems were earlier. Mass personalization and better utilization of data. Identify new and profitable markets, and channels to enter them. Increase customer loyalty, profitability, lifetime value. Decrease risk.

AI and Statistics: If DM is rooted in AI and statistics, what is the need for DM? AI traditionally dealt with small samples; the emphasis was on learning, extrapolation, and generalization. The emphasis in DM is on processing the actual data, not just samples! DM tries to leverage the data collected and accumulated, and to derive tangible rules/conclusions (generalization is also possible).

5 Tower of Babel: Statistics, pattern recognition, machine learning, AI, databases, visualization.

Machine Learning (figure): a cycle of observation, analysis, theory, and prediction. Either the predictions are correct, in which case the theory is corroborated, or the predictions are wrong: new theory or exceptions!

DM vs. Machine learning: ML methods form the core of DM. The amount of data makes a (big) difference: accessing examples can be a problem; missing values and incomplete data. DM has more modest goals: automating the tedious discovery tasks.

DM vs. Statistics: Similar goals; different methods. Amount of data. DM as a preliminary stage for statistical analysis. Challenge to DM: better ties with statistics.

6 Data Mining is NOT: Data warehousing. Ad hoc query/reporting. Online Analytical Processing (OLAP). Data visualization. Agents/mediators, pervasive computing, ...

What DM is not likely to do! Substitute for human intuition and discovery. I don't think a DM system will (ever?) discover E = mc^2. I don't think DM will (ever?) discover PV = RT. I don't think DM will (ever?) discover gravity, Newton's laws of motion, ... It may discover new black holes! The value of pi is data-driven but its intuition is not!

Applications: Customer profiling (find new customers, ...). Market basket analysis (manage inventory). Risk analysis (insurance, loan, stock, ...). Text analysis (library, search, ...). Fraud detection. CRM, scientific discovery, forecasting, ...

DM Applications vs. DM: Problem, goal and task definition (10%). Data warehousing: data collection and organization (50%). Data mining: data analysis and knowledge discovery (30%). Decision support / optimization: assess pros and cons, take actions (10%).

7 DM vs. DW: DW makes DM a lot cheaper. DM is one of the reasons for DW. OLAP is verification-driven (e.g., sales in CA vs. FL in Q1 of 2003). DM is discovery-driven (e.g., why is Microsoft making so much money? Will Google be a successful IPO?). OLTP and OLAP.

OLAP vs. Data Mining: OLAP is user driven: the analyst generates a hypothesis and uses OLAP to verify it (e.g., people with high debt are bad credit risks). In data mining, the tool generates the hypothesis and performs the exploration (e.g., find risk factors for granting credit), discovering new patterns that analysts didn't think of (e.g., debt-to-income ratio). OLAP and DM complement each other.

How is Data Mining Used? Use data to build a model of the real world (domain of interest) describing patterns and relationships. Models are used in two ways: to guide business decisions (e.g., determine the layout of shelves in a grocery store) and to make predictions (e.g., which recipients to include on a mailing list). Not magic: one still needs to understand the data, its semantics, and statistics!

Things to keep in mind: Misinterpretation of results. Statistical significance. Dirty data. Too much information generated. Legality. Privacy/ethics.

8 Traditional Data Analysis (figure): a DB feeding query, graphics, statistics, reporting, ...

Data Mining Process: Identify necessary data. Granularity of each field. Choose preprocessing and mining techniques. Use tools to complement mining. Interpret results. Note: this is an iterative process.

DM Process (figure): a cycle with the labels DB, select, preprocess, transform, mine, analysis, and rethink.

Data Mining Cycle: Assess and transform (DW). Select: reduces cost, increases speed. Explore: summarize, segment, visualize. Modify: data filtering, variable selection. Model: regression, neural nets, decision trees, associations, sequences. Assess (BI).

9 A Word About Data Quality: Data mining can be tolerant of some noise, but noise may lead to poor or even erroneous results. Some common problems: missing fields; outliers or incorrect data; statistical significance. Data warehouse integration and cleaning is a prerequisite for data mining. Recall the integration process with its cleansing steps...

Data Pyramid (figure, top to bottom): Visualization / Analysis. Data Mining. Data Exploration (querying, statistics, ...). Data Warehouse / Data Marts. Data Sources.

Types of data analysis: Supervised: driven by business problems; optimize existing solutions/markets. Unsupervised: exploration; relevance; find new markets.

DM Approaches: Classification, prediction. Clustering. Correlation. Rules (association rules). Time-series analysis. Text classification/filtering. Graph mining.

10 Predictive Modeling: A black box that makes predictions about the future based on information from the past and present. Usually a large number of inputs is available. (Figure labels: last few years' sales data and last month's sales data go into the model; this month's projection and this year's prediction come out.)

Models: Some models are better than others in understandability and accuracy. They range from easy to understand to "How do I interpret the results?"

Model details (figure): a scale from easier to harder: decision trees, rules, kNN, regression analysis, neural networks.

Using a Model (figure labels): 1999 data, data mining system, model, Sep 2000 data, Nov 2000 prediction.

Using a Model, qualitative: Gives the analyst an understanding of the rules/classification, e.g., "If 35 < age < 50 then buy expensive cars". Now, with the recession, the above rule may change to "If 25 < age < 35 then trade your expensive car for an average car". Interaction with the model and visualization.

11 Using a Model, quantitative: Automated process. Classification/scoring is done periodically (every month, when mailing is done, ...). Classification into a finite set. Estimate a continuous numerical value (e.g., total worth of a customer). Scoring (a probability value).

Model Quality (figure): error on new data plotted against the amount/representativeness of training data.

Model Testing, Cross Validation: Divide the data into n sets (of equal size). Use set i for validating and build the model using sets 1, 2, ..., i-1, i+1, ..., n. Repeat the above process for i from 1 through n.
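The cross-validation procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names `n_fold_splits` and `cross_validate`, the interleaved way folds are formed, and the toy "model" (the mean of the training values) are all assumptions made for the example.

```python
from statistics import mean

def n_fold_splits(records, n):
    """Yield (training, validation) pairs; fold i is held out on round i."""
    # Assumption: folds are formed by interleaving, giving near-equal sizes.
    folds = [records[i::n] for i in range(n)]
    for i in range(n):
        validation = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, validation

def cross_validate(records, n, build_model, error):
    """Average the validation error over the n rounds."""
    return mean(error(build_model(train), valid)
                for train, valid in n_fold_splits(records, n))

def build_mean_model(train):
    # Toy "model": predict the mean of the training values.
    return mean(train)

def mean_abs_error(model, valid):
    # Mean absolute deviation of held-out values from the prediction.
    return mean(abs(v - model) for v in valid)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(cross_validate(data, 3, build_mean_model, mean_abs_error))
```

Every record is used for validation exactly once and for training n-1 times, which is what makes the averaged error a fairer estimate than a single train/test split.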

12 Application of Statistics: Techniques have been waiting for technology to catch up. Statisticians have been doing small-scale data mining for decades. Good data mining is the intelligent application of statistical processes (+ some new ones). Emphasis on scalability, handling large data sets, interactive capability, visualization, integration with databases.

Discussion:
Classification: neural networks, decision trees, kNN, Bayes, SVM.
Clustering: k-means, non-hierarchical and hierarchical.
Prediction: linear regression, multivariate regression.
Association rules (market-basket analysis): Apriori algorithm, FP-tree, use of taxonomies.
Time series analysis, sequence detection: clustering, significant interval discovery, event patterns.
Text filtering: topic identification, classification, filtering.
Choosing the right approach to the domain of interest is an important and difficult task.

Data Mining Problems: Classification. Multiple category (large/medium/small). Value prediction. Scoring. Clustering/segmentation. Association rule extraction (market basket analysis). Sequence detection (ordered data, temporal). Graph mining (for applications where structure is important and needs to be taken into account).

Data Mining Models (contd.):
Classification (predicting): classifies a data item into one of the predefined classes.
Regression and time series analysis (forecasting): uses a series of existing values to forecast what continuous values will be.
Clustering (description of patterns): finds clusters that consist of similar records.
Association analysis and sequence discovery (description of behavior): discovers rules for describing items that occur together in a given event or record.

13 Classification: Classify the input records based on the attribute values. Training set; class attribute. Decision tree classifiers / neural-net classifiers. Tree generation: SLIQ, SPRINT, CLOUDS, C4.5. Tree pruning: MDL.

Classification: A process of building a model from a training set that classifies new data, based upon the attribute values. Popular classification models are neural networks, decision trees, and kNN. Classification models are widely used to solve business problems such as the creation of mailing lists for marketing purposes.

Approach: Examine a collection of cases for which the group they belong to is already known. Inductively determine the pattern of attributes or characteristics that identifies the group to which each case belongs. The pattern can be used to understand the data as well as to predict how new instances will be classified.

Neural Network Model: Very loosely based on biology. Input is transformed via a network of processors. Each processor combines weighted inputs and produces an output value. (Figure: a network with inputs such as I1 and output O1.)

14 Neural Network (four figures, each with inputs I1, I2 and output O1): linear combination of inputs (simple linear regression); linear combination of inputs (classic perceptron); non-linear combination of inputs; multi-layer neural networks (fully connected hidden layer and output layer).
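To make the "linear combination of inputs" in these figures concrete, here is a minimal Python sketch of a single processing unit. The function names, the threshold at zero, and the hand-picked weights are assumptions for illustration; the weights below are chosen by hand, not learned.

```python
def linear_unit(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (the regression-style unit)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def perceptron(inputs, weights, bias):
    """Classic perceptron: threshold the weighted sum to get a 0/1 output."""
    return 1 if linear_unit(inputs, weights, bias) > 0 else 0

# Hand-picked weights that make the unit behave like logical AND on I1, I2.
weights, bias = [1.0, 1.0], -1.5
for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, perceptron([i1, i2], weights, bias))
```

A multi-layer network stacks such units, with a non-linear function applied to each weighted sum, so that the overall mapping from inputs to output is no longer linear.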

15 Learning: Weights are adjusted by observing errors on the output and propagating adjustments back through the network (back propagation). (Figure: inputs I1 and I2, a fully connected hidden layer, and an output layer producing O1, with the error fed back.)

Neural Network Issues: Difficult to understand: the relationship between weights and variables is complicated, and there is no intuitive understanding of results. Training time: error depends on the sample size and the amount of effort in fine-tuning. Pre-processing of data is often required.

Decision Trees: A major data mining approach. Given one attribute (e.g., wealth), try to predict the value of new people's wealth by means of some of the other available attributes. Applies to categorical outputs. Categorical attribute: an attribute which takes on two or more discrete values; also known as a symbolic attribute. Real attribute: a column of real numbers.

Decision Tables: 1-d table, 2-d table, 3-d table or cube. But the number of tables grows rapidly as the number of attributes increases. For 16 attributes, the number of 3-d tables is 16 choose 3, or 16*15*14/6 = 560. For 100 attributes, it is 100 choose 3 = 161,700.
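The number of d-dimensional tables over n attributes is the binomial coefficient "n choose d", which is easy to check directly. The helper name `table_count` below is hypothetical; `math.comb` is the standard-library function (Python 3.8 or later).

```python
from math import comb

def table_count(n_attributes, d):
    """Number of distinct d-dimensional tables over n_attributes attributes."""
    return comb(n_attributes, d)

print(table_count(16, 3))   # 560
print(table_count(100, 3))  # 161700
```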

16 Classification Example (table "Training Data Set" with columns Tid, Job, Age, Salary, Class; the rows did not survive extraction intact, apart from Tid 6: Self, 35, 60K, Class A and Tid 7: Self, 30, 70K, Class A).

Sample Decision Tree (figure): tests on Sal (<=50K vs. >50K), Job ((Univ., Industry) vs. (Self)), and Age (<=40 vs. >40), with leaves Class A, Class B, and Class C.

Centralized Decision Tree Induction algorithm:
Select a random subset of the given instances (the window).
Repeat:
Build the decision tree to explain the current window.
Find the exceptions of this decision tree for the remaining instances.
Form a new window with the exceptions to the decision tree generated from it.
Until there are no exceptions.

Selection criteria: Entropy/information gain (Quinlan 1993). Gain ratio (used in C4.5). Gini index (used in CART). MDL (minimum description length).

Types of Decision Trees: A decision tree can also be seen as nested if/then rules.
CHAID: Chi-Square Automatic Interaction Detection, Kass (1980). n-way splits. Categorical variables.
CART: Classification and Regression Trees, Breiman, Friedman, Olshen, and Stone (1984). Binary splits. Continuous variables.
C4.5: Quinlan (1993). Also used for rule induction.
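Two of the selection criteria named above, entropy/information gain and the Gini index, can be computed from class labels alone. The sketch below is illustrative: the function names and the tiny label lists are assumptions, and the formulas are the standard textbook definitions rather than anything specific to these slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    total = len(parent)
    remainder = sum(len(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - remainder

parent = ["A", "A", "B", "B"]
perfect_split = [["A", "A"], ["B", "B"]]
useless_split = [["A", "B"], ["A", "B"]]
print(entropy(parent), gini(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A tree builder would evaluate each candidate split this way and pick the one with the highest gain (or the lowest weighted Gini index).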

17 Nearest Neighbor classification (1-NN Rule): We are concerned with the following problem: we wish to label some observed pattern x with some class category θ. Two possible situations with respect to x and θ may occur. First, we may have complete statistical knowledge of the distribution of observation x and category θ; in this case, a standard Bayes analysis yields an optimal decision procedure. Second, we may have no knowledge of the distribution of observation x and category θ aside from that provided by pre-classified samples; in this case, a decision to classify x into category θ will depend only on a collection of correctly classified samples. The nearest neighbor rule is concerned with the latter case. Such problems fall in the domain of non-parametric statistics. No optimal classification procedure exists with respect to all underlying statistics under such conditions.

The k-NN Rule: If the number of pre-classified points is large, it makes good sense to use, instead of the single nearest neighbor, the majority vote of the nearest k neighbors. This method is referred to as the k-NN rule. The k-NN rule only requires: an integer k; a set of labeled examples (training data); a metric to measure closeness. The number k should be large, to minimize the probability of misclassifying x, and small (with respect to the number of samples), so that the points are close enough to x to give an accurate estimate of the true class of x. Disadvantages: large storage requirements; computationally intensive; highly susceptible to the curse of dimensionality.
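The three ingredients listed above (an integer k, labeled examples, and a closeness metric) map directly onto a short Python sketch. The function name `knn_classify`, the toy points, and the choice of Euclidean distance via `math.dist` (Python 3.8 or later) are assumptions for illustration.

```python
from collections import Counter
from math import dist

def knn_classify(x, examples, k):
    """Label x by majority vote among its k nearest labeled examples.

    examples is a list of (point, label) pairs; closeness is Euclidean.
    """
    nearest = sorted(examples, key=lambda e: dist(x, e[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

examples = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
print(knn_classify((0.5, 0.5), examples, 3))
print(knn_classify((5.5, 5.5), examples, 3))
```

Note that every query sorts the whole training set, which illustrates the storage and computation costs listed under disadvantages; setting k = 1 gives the 1-NN rule.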

18 The k-NN Rule, 1-NN versus k-NN: The use of large values of k has two main advantages: it yields smoother decision regions, and it provides probabilistic information (the ratio of examples for each class gives information about the ambiguity of the decision). However, values of k that are too large are detrimental: they destroy the locality of the estimation, since farther examples are taken into account, and they increase the computational burden.

Classification: Genetic algorithms. Rough set approach. Fuzzy set approach.

Prediction: Linear regression is used to make predictions about a single value. Simple linear regression involves discovering the equation for a line that most nearly fits the given data. That linear equation is then used to predict values for new data. Example 1: A cost modeler wants to find the prospective cost for a new contract based on the data collected from previous contracts. Example 2: If the university authorities want to predict a student's grade on a freshman college calculus midterm based on his/her SAT score, then they may apply linear regression.

Prediction: Linear regression assumes that the expected value of the output for a given input, E[y | x], is linear. Simplest case: y = c + a*x, where a and c can be computed from the data set and can then be applied to any new value of x.
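For the simplest case y = c + a*x, the coefficients can be computed from the data with ordinary least squares. The sketch below assumes that fitting criterion (the slides do not name one); the function names and the sample data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares estimates (c, a) for the line y = c + a*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx            # slope
    c = y_bar - a * x_bar    # intercept
    return c, a

def predict(c, a, x):
    """Apply the fitted line to a new value of x."""
    return c + a * x

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 1 + 2x
c, a = fit_line(xs, ys)
print(c, a, predict(c, a, 5.0))
```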

19 Clustering: Unsupervised mining. Segments a database into different groups. The goal is to find groups whose members have two characteristics (notion of similarity): members in each cluster are as similar as possible, and members in different clusters have as few commonalities as possible. Unlike classification, we don't know what the clusters will be at the start, or around which attributes the data will cluster. Business analysts will need to analyze the clusters.

Very Simple Clustering Algorithm:
1. Choose k objects at random (or max separation) and make them clusters.
2. For each object in data set D, find the nearest cluster and assign the object to the cluster.
3. Find the new cluster center for each cluster.
4. Repeat steps 2 and 3 until the input is exhausted.
Problems: How to choose k? A huge problem! Meta approaches to choose k (computationally intensive). How to define nearness/similarity?

Clustering algorithm: Typically, distance or Euclidean distance (in multidimensional space) is used as the nearness or similarity measure, but it can be different (based on the domain). Domain knowledge is important. Typically, the square-error criterion is used for convergence of the iterative algorithm.

Clustering algorithm: Example 1: Setting up ATM machines. Euclidean distance is not useful; driving distance between the population and the ATM is more important. Example 2: Battlefield movement. Obstacles in the path (hills, terrain) need to be taken into account. Computing the distance may not be easy! Domain knowledge is important.
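The four steps of the very simple clustering algorithm can be sketched as follows. Several choices here are assumptions rather than part of the slides: the first k points serve as initial centers (instead of a random choice) so the run is repeatable, Euclidean distance is the nearness measure, and the loop stops when the centers no longer move or after `max_iter` rounds.

```python
from math import dist

def k_means(points, k, max_iter=100):
    """Assign points to the nearest center, recompute centers, and repeat."""
    centers = list(points[:k])  # assumption: deterministic initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            clusters[nearest].append(p)
        # Step 3: the new center is the coordinate-wise mean of each cluster.
        new_centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
        # Step 4 (assumed stopping rule): stop once the centers are stable.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, clusters = k_means(points, 2)
print(centers)
print(clusters)
```

Swapping `dist` for a domain-specific measure (driving distance, for instance) is the only change needed to reflect the point that Euclidean distance is not always the right notion of nearness.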

20 Clustering: A process of maximizing intra-cluster similarity while minimizing inter-cluster similarity. Requirements on clustering: minimal requirements of domain knowledge; discovery of clusters with arbitrary shape; good efficiency on large databases. Partitioning algorithms (k-means): in the k-means method, clusters are represented by their gravity center; in the k-medoid method, clusters are represented by a central object. Hierarchical algorithms (minimal spanning tree algorithm): agglomerative approach (bottom-up); divisive approach (top-down).

More on Clustering: Clustering has been studied in many fields, including sociology, statistics, machine learning, and biology. Scalability was not a design goal; data was assumed to fit in main memory. The focus was on improving cluster quality, so these methods do not scale to large data sets. Recently, a new set of algorithms has appeared with greater emphasis on scalability.

Clustering: Density-based methods (for noisy data): DBSCAN, OPTICS, DENCLUE. Grid-based methods (use a multiresolution grid data structure): STING, CLIQUE.

DM Summary: Technology (computation speed; fast, cheap, and large storage) has moved many of these approaches into mainstream usage. They can now be applied to actual data sets instead of samples. Click-stream analysis, recommendation systems, and web search are all very large data size problems. We will also address the data filtering/aggregation problem in stream data processing, where we deal with large amounts of continuous data.

21 Questions???


More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 1 1 Acknowledgement Several Slides in this presentation are taken from course slides provided by Han and Kimber (Data Mining Concepts and Techniques) and Tan,

More information

Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3

Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 January 25, 2007 CSE-4412: Data Mining 1 Chapter 6 Classification and Prediction 1. What is classification? What is prediction?

More information

Data mining fundamentals

Data mining fundamentals Data mining fundamentals Elena Baralis Politecnico di Torino Data analysis Most companies own huge bases containing operational textual documents experiment results These bases are a potential source of

More information

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino DataBase and Data Mining Group of Data mining fundamentals Data Base and Data Mining Group of Data analysis Most companies own huge databases containing operational data textual documents experiment results

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Data Warehousing and Machine Learning

Data Warehousing and Machine Learning Data Warehousing and Machine Learning Preprocessing Thomas D. Nielsen Aalborg University Department of Computer Science Spring 2008 DWML Spring 2008 1 / 35 Preprocessing Before you can start on the actual

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery Javier Béjar cbea URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

Jarek Szlichta

Jarek Szlichta Jarek Szlichta http://data.science.uoit.ca/ Approximate terminology, though there is some overlap: Data(base) operations Executing specific operations or queries over data Data mining Looking for patterns

More information

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

More information

Data mining techniques for actuaries: an overview

Data mining techniques for actuaries: an overview Data mining techniques for actuaries: an overview Emiliano A. Valdez joint work with Banghee So and Guojun Gan University of Connecticut Advances in Predictive Analytics (APA) Conference University of

More information

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

Preprocessing DWML, /33

Preprocessing DWML, /33 Preprocessing DWML, 2007 1/33 Preprocessing Before you can start on the actual data mining, the data may require some preprocessing: Attributes may be redundant. Values may be missing. The data contains

More information

Clustering in Data Mining

Clustering in Data Mining Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Based on Raymond J. Mooney s slides

Based on Raymond J. Mooney s slides Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit

More information

Intro to Artificial Intelligence

Intro to Artificial Intelligence Intro to Artificial Intelligence Ahmed Sallam { Lecture 5: Machine Learning ://. } ://.. 2 Review Probabilistic inference Enumeration Approximate inference 3 Today What is machine learning? Supervised

More information

Unsupervised Learning : Clustering

Unsupervised Learning : Clustering Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex

More information

CSE 573: Artificial Intelligence Autumn 2010

CSE 573: Artificial Intelligence Autumn 2010 CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine

More information

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis. www..com www..com Set No.1 1. a) What is data mining? Briefly explain the Knowledge discovery process. b) Explain the three-tier data warehouse architecture. 2. a) With an example, describe any two schema

More information

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples.

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples. Supervised Learning with Neural Networks We now look at how an agent might learn to solve a general problem by seeing examples. Aims: to present an outline of supervised learning as part of AI; to introduce

More information

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection

More information

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems Management Information Systems Management Information Systems B10. Data Management: Warehousing, Analyzing, Mining, and Visualization Code: 166137-01+02 Course: Management Information Systems Period: Spring

More information

Overview of Web Mining Techniques and its Application towards Web

Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year / Semester: IV/VII CS1011-DATA

More information

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore Data Warehousing Data Mining (17MCA442) 1. GENERAL INFORMATION: PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore 560 100 Department of MCA COURSE INFORMATION SHEET Academic

More information

Classification Algorithms in Data Mining

Classification Algorithms in Data Mining August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms

More information

Classification: Feature Vectors

Classification: Feature Vectors Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12

More information

Basic Data Mining Technique

Basic Data Mining Technique Basic Data Mining Technique What is classification? What is prediction? Supervised and Unsupervised Learning Decision trees Association rule K-nearest neighbor classifier Case-based reasoning Genetic algorithm

More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

CISC 4631 Data Mining

CISC 4631 Data Mining CISC 4631 Data Mining Lecture 03: Nearest Neighbor Learning Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Prof. R. Mooney (UT Austin) Prof E. Keogh (UCR), Prof. F.

More information

SOCIAL MEDIA MINING. Data Mining Essentials

SOCIAL MEDIA MINING. Data Mining Essentials SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

More information

Supervised and Unsupervised Learning (II)

Supervised and Unsupervised Learning (II) Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised

More information

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018 MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

CT75 (ALCCS) DATA WAREHOUSING AND DATA MINING JUN

CT75 (ALCCS) DATA WAREHOUSING AND DATA MINING JUN Q.1 a. Define a Data warehouse. Compare OLTP and OLAP systems. Data Warehouse: A data warehouse is a subject-oriented, integrated, time-variant, and 2 Non volatile collection of data in support of management

More information

Random Forest A. Fornaser

Random Forest A. Fornaser Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University

More information

6.034 Quiz 2, Spring 2005

6.034 Quiz 2, Spring 2005 6.034 Quiz 2, Spring 2005 Open Book, Open Notes Name: Problem 1 (13 pts) 2 (8 pts) 3 (7 pts) 4 (9 pts) 5 (8 pts) 6 (16 pts) 7 (15 pts) 8 (12 pts) 9 (12 pts) Total (100 pts) Score 1 1 Decision Trees (13

More information

Clustering Part 4 DBSCAN

Clustering Part 4 DBSCAN Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which

More information

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Voronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013

Voronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013 Voronoi Region K-means method for Signal Compression: Vector Quantization Blocks of signals: A sequence of audio. A block of image pixels. Formally: vector example: (0.2, 0.3, 0.5, 0.1) A vector quantizer

More information

Lecture-17: Clustering with K-Means (Contd: DT + Random Forest)

Lecture-17: Clustering with K-Means (Contd: DT + Random Forest) Lecture-17: Clustering with K-Means (Contd: DT + Random Forest) Medha Vidyotma April 24, 2018 1 Contd. Random Forest For Example, if there are 50 scholars who take the measurement of the length of the

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

Data Mining and Warehousing

Data Mining and Warehousing Data Mining and Warehousing Sangeetha K V I st MCA Adhiyamaan College of Engineering, Hosur-635109. E-mail:veerasangee1989@gmail.com Rajeshwari P I st MCA Adhiyamaan College of Engineering, Hosur-635109.

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

Data Science. Data Analyst. Data Scientist. Data Architect

Data Science. Data Analyst. Data Scientist. Data Architect Data Science Data Analyst Data Analysis in Excel Programming in R Introduction to Python/SQL/Tableau Data Visualization in R / Tableau Exploratory Data Analysis Data Scientist Inferential Statistics &

More information

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)? Introduction to Data Warehousing and Business Intelligence Overview Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction A tour of the coming DW lectures DW Applications Loosely

More information

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Data Mining Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce

More information

CHAPTER 4: CLUSTER ANALYSIS

CHAPTER 4: CLUSTER ANALYSIS CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis

More information

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand

More information

Data Mining Concepts

Data Mining Concepts Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

COMP 465: Data Mining Classification Basics

COMP 465: Data Mining Classification Basics Supervised vs. Unsupervised Learning COMP 465: Data Mining Classification Basics Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Supervised

More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

Chapter 3: Supervised Learning

Chapter 3: Supervised Learning Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information