CS513-Data Mining. Lecture 2: Understanding the Data. Waheed Noor
Computer Science and Information Technology, University of Balochistan, Quetta, Pakistan
Waheed Noor (CS&IT, UoB, Quetta) CS513-Data Mining March / 32
Outline
1 Patterns
   Class Activity
2 Types of Learning
3 Model
4 Data Mining Algorithms
5 Understanding your Data: Input
6 Issues with Real World Data
1 Patterns
Pattern Example
Example: Consider contact lens prescription data from an optician. The task is to prescribe soft, hard, or no contact lenses to a patient based on his/her information. We will analyze past data in order to find patterns, if any exist.
Contact Lens Data
Finding Patterns: Illustration
if tear production rate = reduced then
    recommendation = none
elseif age = young and astigmatic = no then
    recommendation = soft
else
    recommendation = hard
end if
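The rule above can be written as a small Python function. This is a minimal sketch: the function name `recommend` and the string spellings of the attribute values are our own illustrative choices, not taken from the slide's table.

```python
def recommend(tear_production_rate: str, age: str, astigmatic: str) -> str:
    """Apply the illustrative contact lens rule to one patient.

    The function name and value spellings are illustrative assumptions.
    """
    if tear_production_rate == "reduced":
        return "none"
    elif age == "young" and astigmatic == "no":
        return "soft"
    else:
        return "hard"


# Hypothetical patients, used only to exercise each branch of the rule.
print(recommend("reduced", "young", "no"))       # none
print(recommend("normal", "young", "no"))        # soft
print(recommend("normal", "presbyopic", "yes"))  # hard
```

The order of the branches matters: a patient with reduced tear production gets `none` regardless of age or astigmatism, because that condition is checked first.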
What we get
These patterns may not be enough to generalize as rules, since the example is a simple one and we do not have enough data (i.e., the data may be incomplete). We can say that the pattern merely summarizes the data.
How many possible combinations of input values are required for extracting useful patterns?
In fact, a data mining task also needs to generalize to new examples. Real-life data often contains examples in which the values of some features are noisy or missing, which can affect the performance of a data mining technique. Misclassification can even occur on the datasets that were used to train the method.
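To make the question about possible input values concrete, the number of distinct attribute-value combinations is the product of the number of values each attribute can take. The sketch below assumes the slide's table is the standard contact lens dataset (as used by Witten and Frank), where age takes 3 values and spectacle prescription, astigmatism, and tear production rate take 2 values each; the table itself is not reproduced here, so these domains are an assumption.

```python
from itertools import product
from math import prod

# Assumed attribute domains (standard contact lens data); the slide's own
# table is not reproduced, so treat these values as an assumption.
domains = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacle_prescription": ["myope", "hypermetrope"],
    "astigmatism": ["no", "yes"],
    "tear_production_rate": ["reduced", "normal"],
}

# Number of distinct combinations = product of the domain sizes.
n_combinations = prod(len(values) for values in domains.values())
print(n_combinations)  # 24

# Enumerating every combination explicitly gives the same count.
all_examples = list(product(*domains.values()))
print(len(all_examples))  # 24
```

A dataset covering every combination once would need 24 rows; with more attributes or values the count grows multiplicatively, which is why a learned rule must generalize to combinations it has never seen.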
Class Activity
Weather Problem Example
Some Complexity: Numeric Attributes
Classification
What we have seen so far are classification rules, i.e., rules that classify examples. We can also look for rules that associate the values of different attributes; these are called association rules.
Example
if temperature = cool then humidity = normal
if humidity = normal and windy = false then play = yes
if windy = false and play = no then outlook = sunny and humidity = high
Can you identify one?
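One way to check an association rule against data is to count its support (the number of examples that satisfy both sides of the rule) and its confidence (support divided by the number of examples that satisfy the left-hand side). The sketch below assumes the weather table is the standard 14-example nominal weather data from Witten and Frank; since the slide's table is not reproduced here, treat the rows as an assumption. The helpers `matches` and `rule_stats` are our own.

```python
# Assumed weather data (standard nominal version), one tuple per example:
# (outlook, temperature, humidity, windy, play).
ROWS = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]
COLUMNS = ("outlook", "temperature", "humidity", "windy", "play")
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def matches(example, conditions):
    """True if the example has every attribute value listed in conditions."""
    return all(example[attr] == value for attr, value in conditions.items())


def rule_stats(lhs, rhs, data=DATA):
    """Return (support, confidence) of the rule 'if lhs then rhs'."""
    covered = [ex for ex in data if matches(ex, lhs)]
    support = sum(1 for ex in covered if matches(ex, rhs))
    confidence = support / len(covered) if covered else 0.0
    return support, confidence


print(rule_stats({"temperature": "cool"}, {"humidity": "normal"}))  # (4, 1.0)
print(rule_stats({"humidity": "normal", "windy": False}, {"play": "yes"}))  # (4, 1.0)
print(rule_stats({"windy": False, "play": "no"},
                 {"outlook": "sunny", "humidity": "high"}))  # (2, 1.0)
```

All three rules from the slide hold with confidence 1.0 on this data. The third rule predicts two attributes at once, something a classification rule, which predicts only the class, cannot do.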
Rules
Definition (Rules) A set of conditions/decisions that are interpreted in some order, which may be stated explicitly or left implicit. Rules are useful tools for classifying examples and for associating attribute values, e.g., a decision list, which is interpreted in sequence, or a decision tree, which is interpreted hierarchically.
Sometimes we may get a rule set that gives a unique prescription for every conceivable example, as in the examples above. However, this is generally not possible: there may be situations where no rule applies, or where more than one rule applies (i.e., a conflict arises, and we then resort to probabilities or weights).
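The difference between an ordered decision list and an unordered rule set, and how conflicts and gaps arise in the latter, can be sketched in Python. The two rules and the examples below are hypothetical, chosen only so that both rules fire on one example and neither fires on the other.

```python
# Each rule is (conditions, predicted class); these rules are hypothetical.
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"windy": False}, "yes"),
]


def fires(conditions, example):
    """True if every condition of the rule holds for the example."""
    return all(example.get(attr) == value for attr, value in conditions.items())


def decision_list(example, rules=RULES, default=None):
    """Ordered interpretation: the first rule that fires decides."""
    for conditions, label in rules:
        if fires(conditions, example):
            return label
    return default  # no rule applies


def rule_set(example, rules=RULES):
    """Unordered interpretation: collect every prediction that fires."""
    return {label for conditions, label in rules if fires(conditions, example)}


both_fire = {"outlook": "sunny", "humidity": "high", "windy": False}
none_fire = {"outlook": "rainy", "humidity": "high", "windy": True}

print(decision_list(both_fire))     # no
print(sorted(rule_set(both_fire)))  # ['no', 'yes']  (a conflict)
print(rule_set(none_fire))          # set()  (no rule applies)
```

An unordered rule set therefore needs some conflict-resolution policy, for example preferring the rule with higher confidence or weighting the competing predictions.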
2 Types of Learning
Types of Learning in Data Mining
Classification Learning: Learning is achieved by presenting classified examples (historical/training data) in order to classify unseen examples (future/test data).
Association Learning: Associations among features are learned from historical data. Here learning is not limited to predicting one particular attribute or feature.
Clustering: Examples are grouped together based on some similarity or homogeneity.
Numeric Prediction: The outcome to be predicted is not a discrete class but a numeric quantity.
Definition (Concept) Anything that is being learned is called the concept, and the output of the learning method is known as the concept description.
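Of the four types, clustering is the one that uses no target attribute at all. As one illustration, the sketch below runs k-means, a common clustering method that the slide does not name, on hypothetical one-dimensional values with fixed starting centroids so that the result is reproducible.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group 1-D values around k centroids (basic k-means, fixed iterations)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical data with two obvious groups.
values = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 10.0])
print(centroids)  # [1.5, 8.5]
print(clusters)   # [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
```

No example carries a class label here; the groups emerge purely from the similarity (distance) between values.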
3 Model
Model Vs. Pattern
Definition (Model) A model describes a global summary of the dataset, i.e., it makes a statement about any point in the full measurement space, for example, predicting a value or assigning an example to a cluster, even for points in this space that are missing from the data.
Model Representation In its simplest form, a model can be represented by
Y = aX + c
where Y and X are variables (Y is the outcome), and a and c are model parameters. This is a linear model, since Y is a linear function of X. (Unlike in statistics, linearity here is in terms of the variables rather than the model parameters.)
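The parameters a and c of the linear model are not given in advance; they are estimated from data. A minimal sketch, assuming ordinary least squares as the estimation method (the slide does not specify one) and using hypothetical data:

```python
def fit_line(xs, ys):
    """Estimate a and c in Y = aX + c by ordinary least squares (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    c = mean_y - a * mean_x
    return a, c


# Hypothetical observations that lie exactly on Y = 2X + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, c = fit_line(xs, ys)
print(a, c)        # 2.0 1.0
print(a * 10 + c)  # 21.0, a prediction for an X value not in the data
```

The last line shows the global nature of a model: once fitted, it makes a statement about any X, including values that never appeared in the data.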
Pattern
Definition A pattern describes a structure relating to a small (local) part of the data or measurement space. For example, mail-order purchase data may reveal a pattern that customers who buy a particular product also buy another product.
Example (Fraud Detection) Bank transaction data can be mined for fraud detection once the usual behaviors have been described by patterns.
Once these structures are defined, their parameters can be estimated from the data. Models or patterns with parameter values are called fitted models or fitted patterns, respectively. Fitted models or patterns are then used on future data.
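As a rough illustration of the fraud-detection idea, the sketch below describes a customer's usual behavior by the mean and standard deviation of past transaction amounts (the fitted parameters) and flags new transactions that deviate strongly from it. The z-score rule, the threshold of 3, and all amounts are our own assumptions, not a method from the slide.

```python
from statistics import mean, stdev


def fit_usual_behaviour(amounts):
    """Fit the 'usual behaviour' pattern: mean and sample std of past amounts."""
    return mean(amounts), stdev(amounts)


def is_suspicious(amount, params, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean.

    The z-score rule and the default threshold are assumptions for illustration.
    """
    mu, sigma = params
    return abs(amount - mu) > threshold * sigma


# Hypothetical past transaction amounts for one customer.
history = [20.0, 25.0, 22.0, 18.0, 30.0, 24.0, 21.0]
params = fit_usual_behaviour(history)
print(is_suspicious(26.0, params))   # False
print(is_suspicious(500.0, params))  # True
```

The fitted pattern (mean and spread) is estimated once from historical data and then applied to future transactions, mirroring the fit-then-use sequence described above.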
4 Data Mining Algorithms
Why Algorithms
We have seen that data mining tasks arise in a variety of real-world applications, for example, exploratory data analysis, descriptive modeling, predictive modeling, pattern and rule discovery, content retrieval, and so on. To accomplish these tasks we need algorithms, termed data mining algorithms.
Readings You should read about real-world applications of data mining from different sources to build an understanding of the different types of problems and data mining tasks.
Data Mining Algorithms
Although the division is not strict, a data mining algorithm generally has four basic components.
Components
1 Model or Pattern Structure: describes the underlying structure or functional form that we seek in the data.
2 Score Function: also known as a cost function, objective function, or performance measure; it is used to evaluate the learning capability and quality of the fitted structure (pattern or model).
3 Optimization or Search: optimizing the score function and searching through the different possible model and pattern structures to find the best one.
4 Data Management Strategy: managing large data effectively during optimization and search.
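The four components can be seen side by side in a deliberately simple sketch. Here the structure is Y = aX (a line through the origin, an assumption made to keep the search one-dimensional), the score function is the sum of squared errors, the search is a brute-force scan over a grid of candidate values of a, and the data management strategy is to process the data in small chunks. All function names, the grid, and the chunk size are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[float, float]


def chunks(data: List[Pair], size: int) -> Iterator[List[Pair]]:
    """Data management (hypothetical): hand out the data a few rows at a time."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def predict(a: float, x: float) -> float:
    """Model structure: Y = aX."""
    return a * x


def sse(a: float, data: List[Pair], chunk_size: int = 2) -> float:
    """Score function: sum of squared errors, accumulated chunk by chunk."""
    total = 0.0
    for block in chunks(data, chunk_size):
        total += sum((y - predict(a, x)) ** 2 for x, y in block)
    return total


def search(data: List[Pair], candidates: Iterable[float]) -> float:
    """Search: keep the candidate parameter with the lowest score."""
    return min(candidates, key=lambda a: sse(a, data))


# Hypothetical data lying on Y = 3X.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0), (5.0, 15.0)]
grid = [i * 0.5 for i in range(11)]  # 0.0, 0.5, ..., 5.0
best_a = search(data, grid)
print(best_a)             # 3.0
print(sse(best_a, data))  # 0.0
```

In a real system the chunks would typically be read from disk or a database rather than sliced from an in-memory list, and the grid scan would usually be replaced by a smarter optimizer; the division into four parts stays the same.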
5 Understanding your Data: Input
Understanding Input
Definition (Example or Instance) A record or row in the data file is called an example, instance, or observation. Instances may be related to one another or be independent of each other.
Definition (Attribute) The fixed, predefined columns or fields of the data file are known as features or attributes. An instance is characterized by its values for this set of attributes. Attributes that are selected for use in a mining task are referred to as variables of the data mining algorithm.
Types of Data
Quantitative Data: numerical data, either continuous (e.g., amount of sales, temperature) or integer (e.g., number of students in a class).
Qualitative Data: data that approximates or characterizes but does not measure, e.g., present or absent, level of agreement.
Categorical Data: data that represents one of several (limited) categories, e.g., the color of an object or the gender of a customer. Such data is sometimes also called discrete, as it represents well-separated categories.
Measurement Levels
Nominal: A variable can be treated as nominal when its values represent categories with no intrinsic ranking (for example, the department of the company in which an employee works). Examples of nominal variables include region, zip code, and religious affiliation.
Ordinal: A variable can be treated as ordinal when its values represent categories with some intrinsic ranking (for example, levels of service satisfaction from highly dissatisfied to highly satisfied). Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence, and preference rating scores.
Scale: A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. Examples of scale variables include age in years and income in thousands of dollars.
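The measurement level determines which operations on a variable make sense. In the sketch below (hypothetical values, with encodings chosen only for illustration), a nominal variable is one-hot encoded so that no order is implied, an ordinal variable is mapped to ranks that preserve its order, and a scale variable is used as is, so that differences between values are meaningful.

```python
# Nominal (hypothetical departments): no ranking, so one 0/1 indicator per category.
departments = ["sales", "finance", "hr"]


def one_hot(value, categories=departments):
    """Encode a nominal value without implying any order between categories."""
    return [1 if value == c else 0 for c in categories]


# Ordinal (hypothetical scale): intrinsic ranking, so map levels to ordered ranks.
satisfaction_levels = ["highly dissatisfied", "dissatisfied", "neutral",
                       "satisfied", "highly satisfied"]
rank = {level: i for i, level in enumerate(satisfaction_levels)}

# Scale (hypothetical ages in years): distances are meaningful, so use values directly.
ages = [23, 35, 41]

print(one_hot("finance"))                   # [0, 1, 0]
print(rank["satisfied"] > rank["neutral"])  # True: the order is meaningful
print(ages[2] - ages[0])                    # 18: a meaningful difference in years
```

For the ordinal ranks only comparisons are meaningful: the fact that `rank["satisfied"] - rank["neutral"]` equals 1 does not mean the two levels are one unit of satisfaction apart.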
Class Activity 1
Identify the different types of data, and assign the appropriate measurement levels:
Class Activity 2
Identify the different types of data, and assign the appropriate measurement levels:
6 Issues with Real World Data
Issues with Input Data
For many reasons, real-world data is sometimes inaccurate, inexact, or incomplete, as opposed to what data mining algorithms assume.
Sparse Data
Most attributes of the data may contain zero values; e.g., if market basket data records the quantities purchased by customers, then the quantity will be zero for the many products that a customer has not purchased.
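A common way to handle sparse data is to store only the non-zero values. The sketch below keeps a customer's basket as a dictionary from product to quantity, so the many zero entries are implicit; the product names and quantities are hypothetical.

```python
# Hypothetical dense representation: one column per product, mostly zeros.
products = ["bread", "milk", "eggs", "butter", "jam", "tea"]
dense_basket = [2, 0, 0, 1, 0, 0]

# Sparse representation: keep only the non-zero quantities.
sparse_basket = {p: q for p, q in zip(products, dense_basket) if q != 0}
print(sparse_basket)  # {'bread': 2, 'butter': 1}


def quantity(basket, product):
    """Products absent from the sparse basket have an implicit quantity of 0."""
    return basket.get(product, 0)


print(quantity(sparse_basket, "milk"))  # 0

# Converting back recovers the original dense row.
print([quantity(sparse_basket, p) for p in products])  # [2, 0, 0, 1, 0, 0]
```

With thousands of products and only a handful purchased per customer, the sparse form stores a few entries per basket instead of one per product.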
Missing Values
A respondent in a survey may refuse to answer a few questions, a malfunctioning instrument may fail to record data for some attributes, or the values of some attributes may not be measurable in some circumstances. Such a dataset will then contain missing values for specific attributes.
Missing values may be represented in the dataset by an out-of-range value, by a negative value if the attribute cannot take negative values, by a dash, by a question mark, etc.
When collecting or recording data, one may not find an attribute useful for one's own operations, even though that attribute may be important for the mining task; we are then faced with missing attributes. For example, a university may not be interested in the parents' education or income, but these attributes may be significant when mining student data for possible financial aid offers.
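Because missing values can be encoded in several ways, a useful first step is to map every such marker to a single representation. The sketch below assumes a hypothetical set of markers ("?", "-", and the empty string) and an attribute (age) that cannot be negative, and converts all of them to Python's `None`.

```python
MISSING_MARKERS = {"?", "-", ""}  # assumed markers; real datasets vary


def clean_age(raw):
    """Return age as a float, or None if the raw value encodes 'missing'."""
    if raw is None or raw.strip() in MISSING_MARKERS:
        return None
    value = float(raw)
    # Age cannot be negative, so a negative number is treated as a missing-value code.
    return None if value < 0 else value


raw_ages = ["34", "?", "-1", "-", "27", ""]
ages = [clean_age(r) for r in raw_ages]
print(ages)                          # [34.0, None, None, None, 27.0, None]
print(sum(a is None for a in ages))  # 4
```

Once every marker maps to `None`, missing values can be counted, imputed, or skipped consistently instead of being mistaken for real measurements such as an age of -1.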
Example of Missing Values
Inaccurate Values
Since the data for a data mining task is not explicitly collected or recorded for that purpose, one should carefully analyze the data for rogue attributes or attribute values. Inaccuracy may occur due to:
   Typographical errors
   Measurement errors
   Merging data from different sources
   Deliberately entered wrong values
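Two simple checks can catch some of these problems: a range check for implausible measurements, and a count of distinct spellings that may denote the same category (a common symptom of typos or of merging sources). The records, field names, and valid range below are hypothetical.

```python
from collections import Counter

# Hypothetical records merged from two sources.
records = [
    {"city": "Quetta", "temperature_c": 21.5},
    {"city": "quetta", "temperature_c": 22.0},
    {"city": "Queta", "temperature_c": 20.8},
    {"city": "Quetta", "temperature_c": 215.0},  # likely a typing or unit error
]

VALID_RANGE = (-40.0, 60.0)  # assumed plausible air temperature range in Celsius


def out_of_range(records, field, low, high):
    """Return records whose numeric field falls outside [low, high]."""
    return [r for r in records if not low <= r[field] <= high]


# Range check for implausible measurements.
print(out_of_range(records, "temperature_c", *VALID_RANGE))
# [{'city': 'Quetta', 'temperature_c': 215.0}]

# Several spellings of what is probably one city hint at typos or merging issues.
print(Counter(r["city"] for r in records))
# Counter({'Quetta': 2, 'quetta': 1, 'Queta': 1})
```

Such checks only flag candidates; deciding whether a flagged value is an error, and how to correct it, still requires knowledge of the domain and of how the data was collected.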
IBM Software IBM SPSS Statistics 19 IBM SPSS Categories Predict outcomes and reveal relationships in categorical data Highlights With IBM SPSS Categories you can: Visualize and explore complex categorical
More informationIBM SPSS Categories 23
IBM SPSS Categories 23 Note Before using this information and the product it supports, read the information in Notices on page 55. Product Information This edition applies to version 23, release 0, modification
More informationData Mining with Weka
Data Mining with Weka Class 5 Lesson 1 The data mining process Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Lesson 5.1 The data mining process Class
More informationQuestion Bank. 4) It is the source of information later delivered to data marts.
Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
More informationUNIT 2 Data Preprocessing
UNIT 2 Data Preprocessing Lecture Topic ********************************************** Lecture 13 Why preprocess the data? Lecture 14 Lecture 15 Lecture 16 Lecture 17 Data cleaning Data integration and
More informationECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationLecture 6 K- Nearest Neighbors(KNN) And Predictive Accuracy
Lecture 6 K- Nearest Neighbors(KNN) And Predictive Accuracy Machine Learning Dr.Ammar Mohammed Nearest Neighbors Set of Stored Cases Atr1... AtrN Class A Store the training samples Use training samples
More informationData Mining Concepts
Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential
More informationR07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.
www..com www..com Set No.1 1. a) What is data mining? Briefly explain the Knowledge discovery process. b) Explain the three-tier data warehouse architecture. 2. a) With an example, describe any two schema
More informationA Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2
A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,
More informationBasic concepts and terms
CHAPTER ONE Basic concepts and terms I. Key concepts Test usefulness Reliability Construct validity Authenticity Interactiveness Impact Practicality Assessment Measurement Test Evaluation Grading/marking
More informationISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationRight-click on whatever it is you are trying to change Get help about the screen you are on Help Help Get help interpreting a table
Q Cheat Sheets What to do when you cannot figure out how to use Q What to do when the data looks wrong Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help
More information2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.
Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss
More informationDefining a Data Mining Task. CSE3212 Data Mining. What to be mined? Or the Approaches. Task-relevant Data. Estimation.
CSE3212 Data Mining Data Mining Approaches Defining a Data Mining Task To define a data mining task, one needs to answer the following questions: 1. What data set do I want to mine? 2. What kind of knowledge
More informationData Mining and Analytics. Introduction
Data Mining and Analytics Introduction Data Mining Data mining refers to extracting or mining knowledge from large amounts of data It is also termed as Knowledge Discovery from Data (KDD) Mostly, data
More informationData can be in the form of numbers, words, measurements, observations or even just descriptions of things.
+ What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and
More informationData Mining Course Overview
Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical
More informationOrganizing Data. Class limits (in miles) Tally Frequency Total 50
2 2 Organizing Data Objective 1. Organize data using frequency distributions. Suppose a researcher wished to do a study on the number of miles the employees of a large department store traveled to work
More informationOracle9i Data Mining. Data Sheet August 2002
Oracle9i Data Mining Data Sheet August 2002 Oracle9i Data Mining enables companies to build integrated business intelligence applications. Using data mining functionality embedded in the Oracle9i Database,
More informationKnowledge Engineering and Data Mining. Knowledge engineering has 6 basic phases:
Knowledge Engineering and Data Mining Knowledge Engineering The process of building intelligent knowledge based systems is called knowledge engineering Knowledge engineering has 6 basic phases: 1. Problem
More informationScaling Techniques in Political Science
Scaling Techniques in Political Science Eric Guntermann March 14th, 2014 Eric Guntermann Scaling Techniques in Political Science March 14th, 2014 1 / 19 What you need R RStudio R code file Datasets You
More informationCSIS. Pattern Recognition. Prof. Sung-Hyuk Cha Fall of School of Computer Science & Information Systems. Artificial Intelligence CSIS
Pattern Recognition Prof. Sung-Hyuk Cha Fall of 2002 School of Computer Science & Information Systems Artificial Intelligence 1 Perception Lena & Computer vision 2 Machine Vision Pattern Recognition Applications
More informationPREDICTING UPCOMING STUDENTS PERFORMANCE USING MINING TECHNIQUE
PREDICTING UPCOMING STUDENTS PERFORMANCE USING MINING TECHNIQUE Madhan kumar R 1 and Rajesh N 2 1,2 Department of information science, The National Institute of Engineering, Mysuru-570008 Abstract- to
More informationClustering of Data with Mixed Attributes based on Unified Similarity Metric
Clustering of Data with Mixed Attributes based on Unified Similarity Metric M.Soundaryadevi 1, Dr.L.S.Jayashree 2 Dept of CSE, RVS College of Engineering and Technology, Coimbatore, Tamilnadu, India 1
More information