Basic Data Mining Technique
|
|
- Millicent Pearson
- 5 years ago
- Views:
Transcription
1 Basic Data Mining Technique
2 What is classification? What is prediction? Supervised and Unsupervised Learning Decision trees Association rule K-nearest neighbor classifier Case-based reasoning Genetic algorithm Rough set approach Fuzzy set approaches Chapter 4 2
3 Chapter 4 3
4 Chapter 4 4
5 Classification: predicts categorical class labels classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and...uses it in classifying new data Prediction: models continuous-valued functions, i.e., predicts unknown or missing values Chapter 4 5
6 Typical Applications credit approval target marketing medical diagnosis treatment effectiveness analysis Chapter 4 6
7 1. Model construction: 2. Model usage: Chapter 4 7
8 1. Model construction: describing a set of predetermined classes Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute The set of tuples used for model construction: training set The model is represented as classification rules, decision trees, or mathematical formulae Chapter 4 8
9 Training Data Classification Algorithms N A M E RANK Y E A R S TENURED M ike A ssistant P rof 3 no M ary A ssistant P rof 7 yes B ill P rofessor 2 yes Jim A ssociate P rof 7 yes D ave A ssistant P rof 6 no A nne A ssociate P rof 3 no Classifier (Model) IF rank = professor OR years > 6 THEN tenured = yes Chapter 4 9
10 Classification Process 2. Model usage: for classifying future or unknown objects Estimate accuracy of the model The known label of test sample is compared with the classified result from the model Accuracy rate is the percentage of test set samples that are correctly classified by the model Test set is independent of training set Chapter 4 10
11 Classifier Testing Data Unseen Data (Jeff, Professor, 4) NAM E RANK YEARS TENURED Tom Assistant Prof 2 no M erlisa Associate Prof 7 no G eorge Professor 5 yes Joseph Assistant Prof 7 yes Tenured? Chapter 4 11
12 Prediction is similar to classification 1. Construct a model 2. Use model to predict unknown value Major method for prediction is regression Linear and multiple regression Non-linear regression Prediction is different from classification Classification refers to predict categorical class label Prediction models continuous-valued functions Chapter 4 12
13 1. Data Preparation 2. Evaluating Classification Methods Chapter 4 13
14 Data cleaning Preprocess data in order to reduce noise and handle missing values Relevance analysis (feature selection) Remove the irrelevant or redundant attributes Data transformation Generalize and/or normalize data Chapter 4 14
15 Predictive accuracy Speed and scalability time to construct the model time to use the model Robustness handling noise and missing values Scalability efficiency in disk-resident databases Interpretability: understanding and insight proved by the model Goodness of rules decision tree size compactness of classification rules Chapter 4 15
16 Supervised learning (classification) Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations New data is classified based on the training set Unsupervised learning (clustering) The class labels of training data is unknown Given a set of measurements, observations, etc. with the aim of establishing the existence of classes or clusters in the data Chapter 4 16
17 Chapter 4 17
18 Chapter 4 18
19 Decision tree A flow-chart-like tree structure Internal node denotes a test on an attribute Branch represents an outcome of the test Leaf nodes represent class labels or class distribution Use of decision tree: Classifying an unknown sample Test the attribute values of the sample against the decision tree Chapter 4 19
20 Decision tree generation consists of two phases 1. Tree construction At start, all the training examples are at the root Partition examples recursively based on selected attributes 2. Tree pruning Identify and remove branches that reflect noise or outliers Chapter 4 20
21 This follows an example from Quinlan s ID3 age income student credit_rating buys_computer <=30 high no fair no <=30 high no excellent no high no fair yes >40 medium no fair yes >40 low yes fair yes >40 low yes excellent no low yes excellent yes <=30 medium no fair no <=30 low yes fair yes >40 medium yes fair yes <=30 medium yes excellent yes medium no excellent yes high yes fair yes >40 medium no excellent no Chapter 4 21
22 age? <=30 overcast >40 student? yes credit rating? no yes fair excellent no yes no yes Chapter 4 22
23 Chapter 4 23
24 Association rule mining: Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories. Applications: Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc. Chapter 4 24
25 Chapter 4 25
26 Instance-based learning: Store training examples and delay the processing ( lazy evaluation )...until a new instance must be classified Typical approaches k-nearest neighbor approach Instances represented as points in a Euclidean space. Case-based reasoning Uses symbolic representations and knowledge-based inference Chapter 4 26
27 All instances correspond to points in the n-d space. The nearest neighbor are defined in terms of Euclidean distance. The target function could be discrete- or realvalued. For discrete-valued, the k-nn returns the most common value among the k training examples nearest to xq. Vonoroi diagram: the decision surface induced by _ 1-NN for a typical set of training examples. + _. + + _ xq _ Chapter 4 27
28 Also uses: lazy evaluation + analyze similar instances Difference: Instances... are not points in a Euclidean space Methodology Instances represented by rich symbolic descriptions (e.g., function graphs) Multiple retrieved cases may be combined Chapter 4 28
29 GA: based on an analogy to biological evolution Each rule is represented by a string of bits An initial population is created consisting of randomly generated rules e.g., IF A 1 and Not A 2 then C 2 can be encoded as 100 Based on the notion of survival of the fittest, a new population is formed to consists of the fittest rules and their offsprings The fitness of a rule is represented by its classification accuracy on a set of training examples Offsprings are generated by crossover and mutation Chapter 4 29
30 Chapter 4 30
31 Rough sets are used to approximately or roughly define equivalent classes Chapter 4 31
32 A rough set for a given class C is approximated by two sets: 1. a lower approximation (certain to be in C) and 2. an upper approximation (cannot be described as not belonging to C) Finding the minimal subsets of attributes (for feature reduction) is NPhard Chapter 4 32
33 Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (such as using fuzzy membership graph) Fuzzy membeship Low Medium High somewhat low baseline high Income Chapter 4 33
34 Attribute values are converted to fuzzy values e.g., income is mapped into the discrete categories {low, medium, high} with fuzzy values calculated For a given new sample, more than one fuzzy value may apply Each applicable rule contributes a vote for membership in the categories Typically, the truth values for each predicted category are summed Chapter 4 34
35 Data Mining: Concepts and Techniques (Chapter 7 for textbook), Jiawei Han and Micheline Kamber, Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Canada Chapter 4 35
The k-means Algorithm and Genetic Algorithm
The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective
More informationCOMP 465: Data Mining Classification Basics
Supervised vs. Unsupervised Learning COMP 465: Data Mining Classification Basics Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Supervised
More informationData Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3
Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 January 25, 2007 CSE-4412: Data Mining 1 Chapter 6 Classification and Prediction 1. What is classification? What is prediction?
More informationExtra readings beyond the lecture slides are important:
1 Notes To preview next lecture: Check the lecture notes, if slides are not available: http://web.cse.ohio-state.edu/~sun.397/courses/au2017/cse5243-new.html Check UIUC course on the same topic. All their
More informationWhat Is Data Mining? CMPT 354: Database I -- Data Mining 2
Data Mining What Is Data Mining? Mining data mining knowledge Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data CMPT
More informationData warehouse and Data Mining
Data warehouse and Data Mining Lecture No. 14 Data Mining and its techniques Naeem A. Mahoto Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationClassification with Decision Tree Induction
Classification with Decision Tree Induction This algorithm makes Classification Decision for a test sample with the help of tree like structure (Similar to Binary Tree OR k-ary tree) Nodes in the tree
More informationCMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)
CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification
More informationData Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier
Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio
More informationNesnelerin İnternetinde Veri Analizi
Nesnelerin İnternetinde Veri Analizi Bölüm 3. Classification in Data Streams w3.gazi.edu.tr/~suatozdemir Supervised vs. Unsupervised Learning (1) Supervised learning (classification) Supervision: The training
More informationCse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University
Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before
More informationClassification and Prediction
Objectives Introduction What is Classification? Classification vs Prediction Supervised and Unsupervised Learning D t P Data Preparation ti Classification Accuracy ID3 Algorithm Information Gain Bayesian
More informationChapter 4 Data Mining A Short Introduction. 2005/6, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1
Chapter 4 Data Mining A Short Introduction 2005/6, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining
More informationTopic 1 Classification Alternatives
Topic 1 Classification Alternatives [Jiawei Han, Micheline Kamber, Jian Pei. 2011. Data Mining Concepts and Techniques. 3 rd Ed. Morgan Kaufmann. ISBN: 9380931913.] 1 Contents 2. Classification Using Frequent
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationIntroduction to Data Mining. Yücel SAYGIN
Introduction to Data Mining Yücel SAYGIN ysaygin@sabanciuniv.edu http://people.sabanciuniv.edu/~ysaygin/ A Brief History Historically, we had operational databases, ex: for accounts, customers, personnel
More informationPrediction. What is Prediction. Simple methods for Prediction. Classification by decision tree induction. Classification and regression evaluation
Prediction Prediction What is Prediction Simple methods for Prediction Classification by decision tree induction Classification and regression evaluation 2 Prediction Goal: to predict the value of a given
More informationPreprocessing of Stream Data using Attribute Selection based on Survival of the Fittest
Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Model Evaluation
More informationChapter 28. Outline. Definitions of Data Mining. Data Mining Concepts
Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 3
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights
More informationA Program demonstrating Gini Index Classification
A Program demonstrating Gini Index Classification Abstract In this document, a small program demonstrating Gini Index Classification is introduced. Users can select specified training data set, build the
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationUnsupervised Learning
Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised
More informationCse352 Artifficial Intelligence Short Review for Midterm. Professor Anita Wasilewska Computer Science Department Stony Brook University
Cse352 Artifficial Intelligence Short Review for Midterm Professor Anita Wasilewska Computer Science Department Stony Brook University Midterm Midterm INCLUDES CLASSIFICATION CLASSIFOCATION by Decision
More information(Classification and Prediction)
Tamkang University Big Data Mining Tamkang University (Classification and Prediction) 1062DM04 MI4 (M2244) (2995) Wed, 9, 10 (16:10-18:00) (B206) Min-Yuh Day Assistant Professor Dept. of Information Management,
More informationJarek Szlichta
Jarek Szlichta http://data.science.uoit.ca/ Approximate terminology, though there is some overlap: Data(base) operations Executing specific operations or queries over data Data mining Looking for patterns
More informationData Mining. Lecture 03: Nearest Neighbor Learning
Data Mining Lecture 03: Nearest Neighbor Learning Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Prof. R. Mooney (UT Austin) Prof E. Keogh (UCR), Prof. F. Provost
More informationCS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008
CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008 Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute NAME: Prof. Ruiz Problem
More informationStudy on Classifiers using Genetic Algorithm and Class based Rules Generation
2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules
More informationClassification and Regression
Classification and Regression Announcements Study guide for exam is on the LMS Sample exam will be posted by Monday Reminder that phase 3 oral presentations are being held next week during workshops Plan
More informationData Mining and Soft Computing
Data Mining and Soft Computing Session 3. Introduction to Prediction, Classification, Clustering and Association Francisco Herrera Research Group on Soft Computing and Information Intelligent Systems (SCI
More informationDecision Trees Dr. G. Bharadwaja Kumar VIT Chennai
Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target
More informationUncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique
Research Paper Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique C. Sudarsana Reddy 1 S. Aquter Babu 2 Dr. V. Vasu 3 Department
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationTable Of Contents: xix Foreword to Second Edition
Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data
More informationContents. Foreword to Second Edition. Acknowledgments About the Authors
Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1
More informationA Genetic Algorithm-Based Approach for Building Accurate Decision Trees
A Genetic Algorithm-Based Approach for Building Accurate Decision Trees by Z. Fu, Fannie Mae Bruce Golden, University of Maryland S. Lele,, University of Maryland S. Raghavan,, University of Maryland Edward
More informationDecision Tree CE-717 : Machine Learning Sharif University of Technology
Decision Tree CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adapted from: Prof. Tom Mitchell Decision tree Approximating functions of usually discrete
More informationK- Nearest Neighbors(KNN) And Predictive Accuracy
Contact: mailto: Ammar@cu.edu.eg Drammarcu@gmail.com K- Nearest Neighbors(KNN) And Predictive Accuracy Dr. Ammar Mohammed Associate Professor of Computer Science ISSR, Cairo University PhD of CS ( Uni.
More informationData Mining Concepts
Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential
More informationA STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES
A STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES Narsaiah Putta Assistant professor Department of CSE, VASAVI College of Engineering, Hyderabad, Telangana, India Abstract Abstract An Classification
More informationInternational Journal of Software and Web Sciences (IJSWS)
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International
More informationStudy on the Application Analysis and Future Development of Data Mining Technology
Study on the Application Analysis and Future Development of Data Mining Technology Ge ZHU 1, Feng LIN 2,* 1 Department of Information Science and Technology, Heilongjiang University, Harbin 150080, China
More informationCSE4334/5334 DATA MINING
CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More informationLecture outline. Decision-tree classification
Lecture outline Decision-tree classification Decision Trees Decision tree A flow-chart-like tree structure Internal node denotes a test on an attribute Branch represents an outcome of the test Leaf nodes
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationINTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá
INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús
More informationMachine Learning: Algorithms and Applications Mockup Examination
Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationData Mining: An experimental approach with WEKA on UCI Dataset
Data Mining: An experimental approach with WEKA on UCI Dataset Ajay Kumar Dept. of computer science Shivaji College University of Delhi, India Indranath Chatterjee Dept. of computer science Faculty of
More informationA REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING
A REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING Abhinav Kathuria Email - abhinav.kathuria90@gmail.com Abstract: Data mining is the process of the extraction of the hidden pattern from the data
More informationKnowledge Discovery in Databases
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Lecture notes Knowledge Discovery in Databases Summer Semester 2012 Lecture 4: Classification
More informationData Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy
Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Lutfi Fanani 1 and Nurizal Dwi Priandani 2 1 Department of Computer Science, Brawijaya University, Malang, Indonesia. 2 Department
More informationFuzzy Partitioning with FID3.1
Fuzzy Partitioning with FID3.1 Cezary Z. Janikow Dept. of Mathematics and Computer Science University of Missouri St. Louis St. Louis, Missouri 63121 janikow@umsl.edu Maciej Fajfer Institute of Computing
More information2. Data Preprocessing
2. Data Preprocessing Contents of this Chapter 2.1 Introduction 2.2 Data cleaning 2.3 Data integration 2.4 Data transformation 2.5 Data reduction Reference: [Han and Kamber 2006, Chapter 2] SFU, CMPT 459
More informationData Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA
Obj ti Objectives Motivation: Why preprocess the Data? Data Preprocessing Techniques Data Cleaning Data Integration and Transformation Data Reduction Data Preprocessing Lecture 3/DMBI/IKI83403T/MTI/UI
More informationArtificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationChapter 2: Classification & Prediction
Chapter 2: Classification & Prediction 2.1 Basic Concepts of Classification and Prediction 2.2 Decision Tree Induction 2.3 Bayes Classification Methods 2.4 Rule Based Classification 2.4.1 The principle
More informationLecture 6 K- Nearest Neighbors(KNN) And Predictive Accuracy
Lecture 6 K- Nearest Neighbors(KNN) And Predictive Accuracy Machine Learning Dr.Ammar Mohammed Nearest Neighbors Set of Stored Cases Atr1... AtrN Class A Store the training samples Use training samples
More informationCISC 4631 Data Mining
CISC 4631 Data Mining Lecture 03: Nearest Neighbor Learning Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Prof. R. Mooney (UT Austin) Prof E. Keogh (UCR), Prof. F.
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationData Preprocessing. Supervised Learning
Supervised Learning Regression Given the value of an input X, the output Y belongs to the set of real values R. The goal is to predict output accurately for a new input. The predictions or outputs y are
More information1) Give decision trees to represent the following Boolean functions:
1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following
More informationData Mining and Data Warehousing Classification-Lazy Learners
Motivation Data Mining and Data Warehousing Classification-Lazy Learners Lazy Learners are the most intuitive type of learners and are used in many practical scenarios. The reason of their popularity is
More informationData Mining Classification - Part 1 -
Data Mining Classification - Part 1 - Universität Mannheim Bizer: Data Mining I FSS2019 (Version: 20.2.2018) Slide 1 Outline 1. What is Classification? 2. K-Nearest-Neighbors 3. Decision Trees 4. Model
More informationData mining. Classification k-nn Classifier. Piotr Paszek. (Piotr Paszek) Data mining k-nn 1 / 20
Data mining Piotr Paszek Classification k-nn Classifier (Piotr Paszek) Data mining k-nn 1 / 20 Plan of the lecture 1 Lazy Learner 2 k-nearest Neighbor Classifier 1 Distance (metric) 2 How to Determine
More informationCHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES
70 CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 3.1 INTRODUCTION In medical science, effective tools are essential to categorize and systematically
More informationGUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV
GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand
More informationPreprocessing DWML, /33
Preprocessing DWML, 2007 1/33 Preprocessing Before you can start on the actual data mining, the data may require some preprocessing: Attributes may be redundant. Values may be missing. The data contains
More informationPerformance Analysis of Classifying Unlabeled Data from Multiple Data Sources
Performance Analysis of Classifying Unlabeled Data from Multiple Data Sources M.JEEVAN BABU, K. SUVARNA VANI * Department of Computer Science and Engineering V. R. Siddhartha Engineering College, Vijayawada,
More informationData Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha
Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking
More informationMachine Learning Chapter 2. Input
Machine Learning Chapter 2. Input 2 Input: Concepts, instances, attributes Terminology What s a concept? Classification, association, clustering, numeric prediction What s in an example? Relations, flat
More informationk-nearest Neighbor (knn) Sept Youn-Hee Han
k-nearest Neighbor (knn) Sept. 2015 Youn-Hee Han http://link.koreatech.ac.kr ²Eager Learners Eager vs. Lazy Learning when given a set of training data, it will construct a generalization model before receiving
More informationVoronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013
Voronoi Region K-means method for Signal Compression: Vector Quantization Blocks of signals: A sequence of audio. A block of image pixels. Formally: vector example: (0.2, 0.3, 0.5, 0.1) A vector quantizer
More informationBasic Concepts Weka Workbench and its terminology
Changelog: 14 Oct, 30 Oct Basic Concepts Weka Workbench and its terminology Lecture Part Outline Concepts, instances, attributes How to prepare the input: ARFF, attributes, missing values, getting to know
More informationCHAPTER 4 K-MEANS AND UCAM CLUSTERING ALGORITHM
CHAPTER 4 K-MEANS AND UCAM CLUSTERING 4.1 Introduction ALGORITHM Clustering has been used in a number of applications such as engineering, biology, medicine and data mining. The most popular clustering
More informationData Mining Practical Machine Learning Tools and Techniques
Input: Concepts, instances, attributes Data ining Practical achine Learning Tools and Techniques Slides for Chapter 2 of Data ining by I. H. Witten and E. rank Terminology What s a concept z Classification,
More informationNearest Neighbor Methods
Nearest Neighbor Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Nearest Neighbor Methods Learning Store all training examples Classifying a
More informationMachine Learning - Clustering. CS102 Fall 2017
Machine Learning - Fall 2017 Big Data Tools and Techniques Basic Data Manipulation and Analysis Performing well-defined computations or asking well-defined questions ( queries ) Data Mining Looking for
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationCONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM
1 CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM John R. Koza Computer Science Department Stanford University Stanford, California 94305 USA E-MAIL: Koza@Sunburn.Stanford.Edu
More informationDATA MINING Introductory and Advanced Topics Part I
DATA MINING Introductory and Advanced Topics Part I Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Companion slides for the text by Dr. M.H.Dunham, Data
More informationData Mining and Machine Learning: Techniques and Algorithms
Instance based classification Data Mining and Machine Learning: Techniques and Algorithms Eneldo Loza Mencía eneldo@ke.tu-darmstadt.de Knowledge Engineering Group, TU Darmstadt International Week 2019,
More informationMining di Dati Web. Lezione 3 - Clustering and Classification
Mining di Dati Web Lezione 3 - Clustering and Classification Introduction Clustering and classification are both learning techniques They learn functions describing data Clustering is also known as Unsupervised
More informationWhat is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry
Data Mining Andrew Kusiak Intelligent Systems Laboratory 2139 Seamans Center The University of Iowa Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335 5934
More informationData Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A.
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Input: Concepts, instances, attributes Terminology What s a concept?
More informationChapter 3: Supervised Learning
Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example
More informationResearch on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a
International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,
More information劉介宇 國立台北護理健康大學 護理助產研究所 / 通識教育中心副教授 兼教師發展中心教師評鑑組長 Nov 19, 2012
劉介宇 國立台北護理健康大學 護理助產研究所 / 通識教育中心副教授 兼教師發展中心教師評鑑組長 Nov 19, 2012 Overview of Data Mining ( 資料採礦 ) What is Data Mining? Steps in Data Mining Overview of Data Mining techniques Points to Remember Data mining
More informationCLASSIFICATION OF C4.5 AND CART ALGORITHMS USING DECISION TREE METHOD
CLASSIFICATION OF C4.5 AND CART ALGORITHMS USING DECISION TREE METHOD Khin Lay Myint 1, Aye Aye Cho 2, Aye Mon Win 3 1 Lecturer, Faculty of Information Science, University of Computer Studies, Hinthada,
More informationIteration Reduction K Means Clustering Algorithm
Iteration Reduction K Means Clustering Algorithm Kedar Sawant 1 and Snehal Bhogan 2 1 Department of Computer Engineering, Agnel Institute of Technology and Design, Assagao, Goa 403507, India 2 Department
More informationLecture 5: Decision Trees (Part II)
Lecture 5: Decision Trees (Part II) Dealing with noise in the data Overfitting Pruning Dealing with missing attribute values Dealing with attributes with multiple values Integrating costs into node choice
More informationQuestion Bank. 4) It is the source of information later delivered to data marts.
Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
More information