K236: Basis of Data Science

Lecture 6: Data Preprocessing
Lecturers: Tu Bao Ho and Hieu Chi Dam
TAs: Moharasan Gandhimathi and Nuttapong Sanglerdsinlapachai

Schedule of K236
1. Introduction to data science, 6/9
2. Introduction to data science, 6/13
3. Data and databases, 6/16
4. Review of univariate statistics, 6/20
5. Review of linear algebra, 6/23
6. Data mining software, 6/27
7. Data preprocessing, 6/30
8. Classification and prediction (1), 7/4
9. Knowledge evaluation, 7/7
10. Classification and prediction (2), 7/
11. Classification and prediction (3), 7/
12. Mining association rules (1), 7/
13. Mining association rules (2), 7/
14. Cluster analysis, 7/
15. Review and examination, 7/27 (the date is not fixed)

The data analysis process
(Figure: the data analysis process, annotated with the lectures that cover each step.)
- Create or select the target database (data organized by function, data warehousing); select a sampling technique and sample the data.
- Supply missing values; eliminate noisy data.
- Normalize values, transform values, create derived attributes, and find important attributes and value ranges.
- Select the data mining task(s) and method(s), and extract knowledge (Lectures 7-9 and 10-14).
- Test and refine the knowledge (Lecture 8); transform it to a different representation (query and report generation, aggregation and sequences, advanced methods).

Outline (Lecture 6)
1. Why Preprocess the Data?
2. Data Cleaning
3. Data Integration
4. Data Reduction
5. Data Transformation

1. Why Preprocess the Data?

Large real-world databases are commonly:
- Incomplete: lacking attribute values or certain attributes of interest
- Noisy: containing errors or outliers
- Inconsistent: containing discrepancies in codes or names

This is a veracity problem: no quality data, no quality analysis results!

Major tasks in data preprocessing
- Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
- Data integration: integrate multiple databases, data cubes, or files
- Data transformation: normalization and aggregation
- Data reduction (of instances and dimensions): obtain a representation that is much smaller in volume but produces the same or similar analytical results
- Data discretization: part of data reduction, but of particular importance, especially for numerical data

2. Data Cleaning

Data cleaning tasks
- Fill in missing values
- Identify outliers and smooth out noisy data
- Correct inconsistent data

Missing data
Data is not always available: for example, many tuples have no recorded value for several attributes, such as customer income in sales data. Missing data may be due to:
- equipment malfunction
- values that were inconsistent with other recorded data and thus deleted
- data not entered due to misunderstanding
- certain data not being considered important at the time of entry
- the history of, or changes to, the data not being registered
Missing data may need to be inferred.

Missing values in databases
Missing values may hide the true answer underlying the data, and many data mining programs cannot be applied to data that include missing values. Methods:
1. Ignore the tuples.
2. Fill in the missing value manually (tedious, and often infeasible).
3. Use a global constant to fill in the missing value.
4. Use the attribute mean to fill in the missing values.
5. Use the attribute mean (or mode, for a categorical attribute) of all samples belonging to the same class as the given tuple.
6. Use the most probable value to fill in the missing value.

(Figure: an example table whose class attribute takes the values norm, lt-norm, and gt-norm; the other six attributes all have missing values, recorded as "unknown" or "dna".)
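Methods 3 to 5 above are mechanical enough to sketch. The following plain-Python illustration assumes that a missing value is stored as `None`; the function names and the small income/label lists are hypothetical, and a real tool (for example WEKA's filters) would handle this differently.

```python
from statistics import mean, mode

MISSING = None  # assumption: missing values are represented as None


def fill_constant(values, constant="unknown"):
    """Method 3: replace each missing value with a global constant."""
    return [constant if v is MISSING else v for v in values]


def fill_mean(values):
    """Method 4: replace each missing value with the mean of the observed values."""
    m = mean(v for v in values if v is not MISSING)
    return [m if v is MISSING else v for v in values]


def fill_class_mean(values, classes, categorical=False):
    """Method 5: use the mean (or the mode, for a categorical attribute) of the
    observed values in the same class as the tuple with the missing value."""
    by_class = {}
    for v, c in zip(values, classes):
        if v is not MISSING:
            by_class.setdefault(c, []).append(v)
    fill = {c: (mode(vs) if categorical else mean(vs)) for c, vs in by_class.items()}
    # a class with no observed values at all keeps its gaps
    return [fill.get(c, MISSING) if v is MISSING else v for v, c in zip(values, classes)]


income = [30, None, 50, 70, None]  # hypothetical incomes, in thousands
label = ["good", "good", "good", "bad", "bad"]
print(fill_mean(income))
print(fill_class_mean(income, label))
```

With this data, `fill_mean` fills both gaps with the overall mean 50, while `fill_class_mean` uses 40 for the "good" tuple and 70 for the "bad" one, which shows why the class-conditional variant can preserve more of the structure the classifier will later exploit.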

Noisy data
Noise is random error or variance in a measured variable. Incorrect attribute values may be due to:
- faulty data collection instruments
- data entry problems
- data transmission problems
- technology limitations
- inconsistency in naming conventions
Other data problems that require data cleaning:
- duplicate records
- incomplete data
- inconsistent data

How to handle noisy data?
- Binning: first sort the data and partition it into (equal-depth) bins; then smooth by bin means, bin medians, bin boundaries, etc.
- Clustering: detect and remove outliers
- Combined computer and human inspection: detect suspicious values and have a human check them
- Regression: smooth by fitting the data to regression functions

Binning smooths a sorted data value by consulting its neighborhood, that is, the values around it (local smoothing):
- Smoothing by bin means: each value in a bin is replaced by the mean value of the bin
- Smoothing by bin medians: each value in a bin is replaced by the bin median
- Smoothing by bin boundaries: the minimum and maximum values in a given bin are identified as the bin boundaries, and each bin value is replaced by the closest boundary value

Example. The original data: 9, 21, 24, 21, 4, 26, 28, 34, 29, 8, 15, 25
Sorted in increasing order and partitioned into three equal-depth bins of four values each:
Bin 1: 4, 8, 9, 15
Bin 2: 21, 21, 24, 25
Bin 3: 26, 28, 29, 34
Smoothing by bin means (rounded to the nearest integer; the mean of bin 2 is 22.75): 9, 9, 9, 9, 23, 23, 23, 23, 29, 29, 29, 29
Smoothing by bin boundaries (each value replaced by the closest boundary): 4, 4, 4, 15, 21, 21, 25, 25, 26, 26, 26, 34
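The binning example can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes the number of values is a multiple of the bin depth and breaks ties in boundary smoothing toward the lower boundary, a choice the slides do not specify.

```python
from statistics import median


def equal_depth_bins(values, depth):
    """Sort the values and split them into consecutive bins of `depth` values each."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [sum(b) / len(b) for b in bins for _ in b]


def smooth_by_medians(bins):
    """Replace every value in a bin by the bin median."""
    return [median(b) for b in bins for _ in b]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's min and max (ties go to the min)."""
    out = []
    for b in bins:
        lo, hi = min(b), max(b)
        out.extend(lo if v - lo <= hi - v else hi for v in b)
    return out


data = [9, 21, 24, 21, 4, 26, 28, 34, 29, 8, 15, 25]
bins = equal_depth_bins(data, 4)
print([round(m) for m in smooth_by_means(bins)])  # [9, 9, 9, 9, 23, 23, 23, 23, 29, 29, 29, 29]
print(smooth_by_boundaries(bins))  # [4, 4, 4, 15, 21, 21, 25, 25, 26, 26, 26, 34]
```

The unrounded mean of the second bin is 22.75, which rounds to 23.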

How to handle noisy data? (continued)
- Clustering: outliers may be detected by cluster analysis; values that fall outside of the set of clusters may be considered outliers.
- Combined computer and human inspection: output patterns with surprising content to a list, from which a human can identify the ones that are actually garbage.
- Regression: smooth the data by fitting them to a function.
  - Linear regression: the data are fit to a straight line, e.g., y = x + 1.
  - Multiple linear regression: more than two variables are involved, and the data are fit to a multidimensional surface.

3. Data Integration

Data integration combines data from multiple sources (multiple databases, data cubes, flat files) into a coherent data store.
- Schema integration (the entity identification problem): how can equivalent entities from multiple data sources be matched up?
- Redundancy: an attribute may be redundant if it can be derived from another table.
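Smoothing by linear regression can be sketched as an ordinary least-squares fit followed by replacing each observed y with the fitted value. The function names and the noisy readings below are hypothetical, chosen only to scatter around the line y = x + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for two numerical variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def smooth_by_regression(xs, ys):
    """Replace each observed y by the value the fitted line predicts for its x."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]


xs = [0, 1, 2, 3]
ys = [1.1, 1.9, 3.2, 3.8]  # hypothetical noisy readings around y = x + 1
print(fit_line(xs, ys))  # roughly (0.94, 1.09)
print(smooth_by_regression(xs, ys))
```

Multiple linear regression generalizes this to several predictor variables; that case is usually solved with a linear-algebra library rather than by hand.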

Redundancy can be detected by correlation analysis, which measures how strongly one attribute implies another. For attributes A and B over n tuples, the correlation coefficient is

$$r_{A,B} = \frac{\sum (A - \bar{A})(B - \bar{B})}{(n - 1)\,\sigma_A \sigma_B}$$

where \bar{A} and \bar{B} are the mean values of A and B, and σ_A and σ_B are their standard deviations. A value of |r_{A,B}| close to 1 indicates a strong linear relationship, so one of the two attributes may be removed as redundant.
- Detection and resolution of data value conflicts

4. Data Reduction

Strategies for data reduction
- Data cube aggregation
- Dimension reduction
- Data compression
- Numerosity reduction
- Discretization and concept hierarchy generation

Data cube aggregation
Aggregation operations are applied to the data in the construction of a data cube.
(Figure: on the left, the sales are shown per quarter; on the right, the data are aggregated to provide the annual sales.)
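The correlation coefficient r_{A,B} can be computed directly from its definition. This sketch uses sample standard deviations (denominator n - 1) to match the formula; the function name and the "monthly" and "annual" income attributes are hypothetical, picked because one is derivable from the other.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation r_{A,B} using sample standard deviations (n - 1)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sd_a = sqrt(sum((x - mean_a) ** 2 for x in a) / (n - 1))
    sd_b = sqrt(sum((y - mean_b) ** 2 for y in b) / (n - 1))
    cov_sum = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov_sum / ((n - 1) * sd_a * sd_b)


monthly_income = [1.0, 2.5, 3.0, 4.2]  # hypothetical attribute A
annual_income = [12 * m for m in monthly_income]  # attribute B, derivable from A
print(correlation(monthly_income, annual_income))  # 1.0 up to rounding: B is redundant
```

A value near 1 (or near -1) flags a candidate redundant attribute. Correlation only detects linear dependence, so a value near 0 does not prove that neither attribute is derivable from the other.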

(Figure: a data cube for multidimensional analysis of sales data with respect to annual sales per item type for each branch of the company.)

Dimension reduction: attribute selection
Attribute subset selection (also called feature selection):
- Stepwise forward selection
- Stepwise backward elimination
- Combination of forward selection and backward elimination
- Many other methods

Data compression: wavelet transforms
The discrete wavelet transform (DWT) is a linear signal processing technique that, when applied to a data vector D, transforms it into a numerically different vector D′ of wavelet coefficients. Only a small fraction of the strongest wavelet coefficients is stored.
(Figure: real data, its wavelet transform (WT) at levels J = -1 and J = -2, and the RWT.)

Data compression: PCA
Principal components analysis (PCA) transforms data points from k dimensions into c dimensions (c ≤ k) with minimum loss of information. PCA searches for c orthogonal k-dimensional vectors that can best be used to represent the data; the original data are then projected onto the much smaller space of c dimensions (the c principal components). PCA is used only for numerical data.
(Figure: five objects O1 to O5 plotted in the X-Y plane with two candidate axes Z1 and Z2. Question: for reduction to one dimension, which is better, Z1 or Z2?)
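Both compression techniques can be sketched in plain Python. The wavelet part uses the unnormalized Haar wavelet (pairwise averages and half-differences), which is only one of many DWT families; the PCA part computes only the first principal component (c = 1) by power iteration rather than a full eigendecomposition. All function names are hypothetical, and the sample vector and points are made-up inputs.

```python
# Part 1: unnormalized Haar wavelet transform (pairwise averages and half-differences).
def haar_dwt(data):
    """Full Haar decomposition; len(data) must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in data]
    length = n
    while length > 1:
        half = length // 2
        pairs = [(coeffs[2 * i], coeffs[2 * i + 1]) for i in range(half)]
        averages = [(a + b) / 2 for a, b in pairs]
        details = [(a - b) / 2 for a, b in pairs]
        coeffs[:length] = averages + details  # averages are transformed again next round
        length = half
    return coeffs


def haar_idwt(coeffs):
    """Invert haar_dwt: each (average, detail) pair becomes (avg + d, avg - d)."""
    data = list(coeffs)
    length = 1
    while length < len(data):
        merged = []
        for avg, d in zip(data[:length], data[length:2 * length]):
            merged.extend([avg + d, avg - d])
        data[:2 * length] = merged
        length *= 2
    return data


def keep_strongest(coeffs, k):
    """Zero all but the k coefficients with the largest absolute value."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(ranked[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


# Part 2: first principal component by power iteration on the covariance matrix.
def first_principal_component(points, iterations=100):
    """Return (unit direction, projected scores) for a list of k-dimensional points."""
    n, k = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(k)]
    centered = [[p[j] - means[j] for j in range(k)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
           for i in range(k)]
    v = [1.0] * k  # assumption: this start vector is not orthogonal to the top eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(r, v)) for r in centered]  # 1-D representation
    return v, scores


if __name__ == "__main__":
    coeffs = haar_dwt([2, 2, 0, 2, 3, 5, 4, 4])
    print(coeffs)  # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
    print(haar_idwt(keep_strongest(coeffs, 4)))  # approximate reconstruction
    direction, scores = first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)])
    print(direction)  # approximately (1, 2) / sqrt(5)
```

The direction returned by the PCA part is the axis that preserves the most variance, which is the criterion for answering the Z1-versus-Z2 question in the figure.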

Numerosity reduction
Can we reduce the data volume by choosing alternative, smaller forms of data representation?
- Parametric methods: a model is used to estimate the data, so that typically only the model parameters need to be stored instead of the actual data. Examples are regression and log-linear models, e.g., y = αx + β.
- Non-parametric methods store reduced representations of the data; they include histograms, clustering, and sampling.

Numerosity reduction: histograms
(Figure: a histogram with singleton buckets, where each bucket represents one price-value/frequency pair, and an equal-width histogram, where values are aggregated so that each bucket has a uniform width of $….)

Numerosity reduction: clustering
(Figure: a 2-D plot of customer data with respect to customer locations in a city, showing three data clusters; each cluster center is marked with a ….)

Numerosity reduction: sampling
- Simple random sample without replacement of size n (SRSWOR)
- Simple random sample with replacement of size n (SRSWR)
- Cluster sample
- Stratified sample, e.g., drawing an equal proportion (such as ½) from each stratum
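Two of the non-parametric methods, equal-width histograms and sampling, can be sketched with the standard library alone. The function names, the bucket convention [start, start + width), the price list, and the seeds are all assumptions made for illustration; the stratified sampler draws the same fraction (at least one record) from every stratum.

```python
import random
from collections import Counter, defaultdict


def equal_width_histogram(values, width):
    """Count values per bucket [start, start + width); returns {start: count}."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


def srswor(data, n, rng):
    """Simple random sample without replacement: n distinct positions."""
    return rng.sample(data, n)


def srswr(data, n, rng):
    """Simple random sample with replacement: a tuple may be drawn repeatedly."""
    return [rng.choice(data) for _ in range(n)]


def stratified_sample(records, stratum_of, fraction, rng):
    """Draw the same fraction (at least one record) from every stratum."""
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable
prices = [1, 5, 12, 15, 18, 25]  # hypothetical prices
print(equal_width_histogram(prices, 10))  # {0: 2, 10: 3, 20: 1}
print(srswor(prices, 3, rng))
print(stratified_sample(list(range(8)), lambda r: r % 2, 0.5, rng))
```

Storing only the bucket counts, or only the sample, is what makes these representations smaller than the original data.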

5. Data Transformation

- Smoothing: remove noise from the data
- Aggregation: summary or aggregation operations are applied to the data
- Generalization: low-level or primitive data are replaced by higher-level concepts through the use of concept hierarchies
- Normalization: attribute data are scaled so as to fall within a small specified range, say, 0.0 to 1.0
- Attribute construction: new attributes are constructed from the given set of attributes and added to help the mining process, e.g., from continuous to discrete (discretization) and from discrete to continuous (word embedding)

Min-max and z-score normalization
Min-max normalization: suppose that min_A and max_A are the minimum and maximum values of an attribute A. A value v of A is mapped to v′ in the range [new_min_A, new_max_A] by

$$v' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathrm{new\_max}_A - \mathrm{new\_min}_A) + \mathrm{new\_min}_A$$

Example: suppose min_A and max_A for income are $12,000 and $98,000, and we want to map income to the range [0.0, 1.0]. Then $73,600 is transformed to

$$\frac{73{,}600 - 12{,}000}{98{,}000 - 12{,}000}\,(1.0 - 0) + 0 = 0.716$$

Z-score normalization: the values of an attribute A are normalized based on the mean \bar{A} and standard deviation σ_A of A:

$$v' = \frac{v - \bar{A}}{\sigma_A}$$

Example: if the mean and standard deviation of income are $54,000 and $16,000, then $73,600 is transformed to

$$\frac{73{,}600 - 54{,}000}{16{,}000} = 1.225$$

Discretization
Three types of attributes:
- Nominal (categorical): red, yellow, blue, green
- Ordinal: small, middle, large, extremely large
- Continuous: real numbers
Discretization divides the range of a continuous attribute into intervals:
- Some classification algorithms accept only categorical attributes.
- Discretization reduces data size.
- It prepares the data for further analysis.
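Both normalization formulas are one-liners. This sketch (hypothetical function names) reproduces the two income examples; `min_max_column` is an added convenience that takes min_A and max_A from the data itself, which is an assumption, since in practice they may instead come from a training set or a known domain range.

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Map v linearly from [min_a, max_a] onto [new_min, new_max]."""
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min


def z_score(v, mean_a, std_a):
    """Express v as a number of standard deviations from the attribute mean."""
    return (v - mean_a) / std_a


def min_max_column(values, new_min=0.0, new_max=1.0):
    """Min-max normalize a whole column using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    return [min_max(v, lo, hi, new_min, new_max) for v in values]


print(round(min_max(73_600, 12_000, 98_000), 3))  # 0.716
print(z_score(73_600, 54_000, 16_000))  # 1.225
```

Min-max normalization maps future values outside [min_A, max_A] outside the target range, whereas the z-score needs no bounds, only the mean and standard deviation.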

Discretization methods
- Binning
- Histogram analysis
- Cluster analysis
- Entropy-based discretization
- Segmentation by natural partitioning

Entropy-based discretization
Given a set of samples S, if S is partitioned into two intervals S1 and S2 using boundary T, the entropy after partitioning is

$$E(S, T) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2)$$

where Ent(S_i) = -Σ_j p_j log2 p_j is the class entropy of S_i, and p_j is the proportion of samples in S_i that belong to class j.
- The boundary that minimizes the entropy function over all possible boundaries is selected as a binary discretization.
- The process is applied recursively to the partitions obtained until some stopping criterion is met; e.g., a partition is split further only while the information gain satisfies Ent(S) - E(T, S) > δ.
- Experiments show that it may reduce data size and improve classification accuracy.

What is word embedding?
Word embedding maps a word (or phrase) from its original high-dimensional input space to a lower-dimensional numerical vector space.
Word2vec is a group of related models that are used to produce word embeddings.
- These models are shallow, two-layer neural networks that are trained to reconstruct the linguistic contexts of words.
- Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus assigned a corresponding vector in the space.
Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located close to one another.

Some more complex data transformations
(Figure: a mapping φ: X → F from the input space X to a feature space F where the problem can be solved, illustrated by latent semantic indexing, which factors a words × documents matrix C into U, D, and V with a small number of dimensions, and by topic models, which factor the normalized co-occurrence matrix into words × topics and topics × documents matrices.)
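Entropy-based discretization can be sketched as a recursive search for the boundary T that minimizes E(S, T). In this illustration (hypothetical names), candidate boundaries are midpoints between adjacent distinct sorted values, and recursion stops when the information gain Ent(S) - E(T, S) is at most δ; the default δ = 0.1 and the toy data are assumptions, not values from the slides.

```python
from collections import Counter
from math import log2


def ent(labels):
    """Class entropy: -sum of p_j * log2(p_j) over the classes present."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (T, E(S, T)) for the boundary minimizing E(S, T), or None."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a boundary must separate distinct values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        e = len(left) / n * ent(left) + len(right) / n * ent(right)
        if best is None or e < best[1]:
            best = (t, e)
    return best


def entropy_discretize(values, labels, delta=0.1):
    """Recursively collect boundaries while the information gain exceeds delta."""
    split = best_split(values, labels)
    if split is None:
        return []
    t, e = split
    if ent(labels) - e <= delta:
        return []  # stopping criterion met
    left = [(v, l) for v, l in zip(values, labels) if v <= t]
    right = [(v, l) for v, l in zip(values, labels) if v > t]
    return (entropy_discretize(*zip(*left), delta) + [t]
            + entropy_discretize(*zip(*right), delta))


print(entropy_discretize([1, 2, 3, 10, 11, 12], list("aaabbb")))  # [6.5]
```

Each returned boundary becomes an interval edge, so in this toy case the attribute is discretized into the intervals (-∞, 6.5] and (6.5, ∞).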

Summary
- Data preprocessing is an important issue, as real-world data tend to be incomplete, noisy, and inconsistent.
- Data cleaning routines can be used to fill in missing values, smooth noisy data, identify outliers, and correct data inconsistencies.
- Data integration combines data from multiple sources to form a coherent data store.
- Data transformation routines convert the data into forms appropriate for analysis.
- Data reduction techniques can be used to obtain a reduced representation of the data while minimizing the loss of information content.
- Automatic generation of concept hierarchies can involve different techniques for numerical data, and may be based on the number of distinct values of attributes for categorical data.
- Data preprocessing remains an active area of research.

Homework
The labor.arff dataset provided with WEKA has 57 instances, 16 descriptive attributes, and a class attribute with two values, bad and good. The attributes of labor.arff have many missing values. Do the following:
(1) Use the methods in Lecture 6 to treat the missing values of all attributes in labor.arff.
(2) Explain why the method you used for each attribute is appropriate.
Submit the written report (PDF) by July 7.

Hints:
1. You can use the ARFF-Viewer in the Tools menu of WEKA to visualize labor.arff.
2. There are at least two ways to work on the labor data (labor.arff):
- Use the tool arff2csv.zip on our website to convert the data into Excel format, and do your preprocessing on the data in Excel; or
- Take the labor data from UCI and store it in Excel format (or any format you like) to process.
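As a possible starting point for the homework, the sketch below reads ARFF text and counts the missing values per attribute; in the ARFF format a missing value is written as `?`. It is a deliberately minimal reader under stated assumptions (dense rows only, attribute names without spaces, no quoted values containing commas), the function names are hypothetical, and it is not WEKA's own loader.

```python
def read_arff(text):
    """Minimal reader for dense ARFF text; '?' (ARFF's missing-value marker) becomes None."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if lower.startswith("@attribute"):
            # assumption: attribute names contain no spaces (surrounding quotes are stripped)
            names.append(line.split()[1].strip("'\""))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            # assumption: no quoted values containing commas, and no sparse rows
            fields = [f.strip().strip("'\"") for f in line.split(",")]
            rows.append([None if f == "?" else f for f in fields])
    return names, rows


def missing_counts(names, rows):
    """Number of missing values per attribute."""
    return {name: sum(row[i] is None for row in rows) for i, name in enumerate(names)}


# Usage, assuming labor.arff sits in the working directory:
# with open("labor.arff") as f:
#     names, rows = read_arff(f.read())
# print(missing_counts(names, rows))
```

The per-attribute counts help decide, attribute by attribute, whether to ignore tuples, use a global constant, or use (class-conditional) means or modes.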


Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

Data Preprocessing. Data Preprocessing

Data Preprocessing. Data Preprocessing Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should

More information

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach, Kumar Data Preprocessing Aggregation Sampling Dimensionality Reduction Feature subset selection Feature creation

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights

More information

University of Florida CISE department Gator Engineering. Data Preprocessing. Dr. Sanjay Ranka

University of Florida CISE department Gator Engineering. Data Preprocessing. Dr. Sanjay Ranka Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should

More information

Using the DATAMINE Program

Using the DATAMINE Program 6 Using the DATAMINE Program 304 Using the DATAMINE Program This chapter serves as a user s manual for the DATAMINE program, which demonstrates the algorithms presented in this book. Each menu selection

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Department of Mathematics and Computer Science Li Xiong Data Exploration and Data Preprocessing Data and attributes Data exploration Data pre-processing 2 10 What is Data?

More information

Basic Concepts Weka Workbench and its terminology

Basic Concepts Weka Workbench and its terminology Changelog: 14 Oct, 30 Oct Basic Concepts Weka Workbench and its terminology Lecture Part Outline Concepts, instances, attributes How to prepare the input: ARFF, attributes, missing values, getting to know

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Features and Patterns The Curse of Size and

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year / Semester: IV/VII CS1011-DATA

More information

Preprocessing and Visualization. Jonathan Diehl

Preprocessing and Visualization. Jonathan Diehl RWTH Aachen University Chair of Computer Science VI Prof. Dr.-Ing. Hermann Ney Seminar Data Mining WS 2003/2004 Preprocessing and Visualization Jonathan Diehl January 19, 2004 onathan Diehl Preprocessing

More information

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Data Exploration Chapter Introduction to Data Mining by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 What is data exploration?

More information

Dta Mining and Data Warehousing

Dta Mining and Data Warehousing CSCI645 Fall 23 Dta Mining and Data Warehousing Instructor: Qigang Gao, Office: CS219, Tel:494-3356, Email: qggao@cs.dal.ca Teaching Assistant: Christopher Jordan, Email: cjordan@cs.dal.ca Office Hours:

More information

Data Mining Course Overview

Data Mining Course Overview Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

Question Bank. 4) It is the source of information later delivered to data marts.

Question Bank. 4) It is the source of information later delivered to data marts. Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile

More information

Data Mining. Jeff M. Phillips. January 7, 2019 CS 5140 / CS 6140

Data Mining. Jeff M. Phillips. January 7, 2019 CS 5140 / CS 6140 Data Mining CS 5140 / CS 6140 Jeff M. Phillips January 7, 2019 What is Data Mining? What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Mining: Exploring Data. Lecture Notes for Chapter 3 Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include

More information

Web Information Retrieval

Web Information Retrieval Lucian Blaga University of Sibiu Hermann Oberth Engineering Faculty Computer Science Department Web Information Retrieval First Technical Report PhD title: Data Mining for unstructured data Author: Daniel

More information

DATA WAREHOUING UNIT I

DATA WAREHOUING UNIT I BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

Data Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140

Data Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140 Data Mining CS 5140 / CS 6140 Jeff M. Phillips January 12, 2015 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

Course on Data Mining ( )

Course on Data Mining ( ) Course on Data Mining (581550-4) Intro/Ass. Rules 24./26.10. Episodes 30.10. 7.11. Home Exam Clustering 14.11. KDD Process 21.11. Text Mining 28.11. Appl./Summary 21.11.2001 Data mining: KDD Process 1

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Data representation 5 Data reduction, notion of similarity

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining

More information

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online):

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online): IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online): 2321-0613 A Study on Handling Missing Values and Noisy Data using WEKA Tool R. Vinodhini 1 A. Rajalakshmi

More information

Data mining. Classification k-nn Classifier. Piotr Paszek. (Piotr Paszek) Data mining k-nn 1 / 20

Data mining. Classification k-nn Classifier. Piotr Paszek. (Piotr Paszek) Data mining k-nn 1 / 20 Data mining Piotr Paszek Classification k-nn Classifier (Piotr Paszek) Data mining k-nn 1 / 20 Plan of the lecture 1 Lazy Learner 2 k-nearest Neighbor Classifier 1 Distance (metric) 2 How to Determine

More information

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis. www..com www..com Set No.1 1. a) What is data mining? Briefly explain the Knowledge discovery process. b) Explain the three-tier data warehouse architecture. 2. a) With an example, describe any two schema

More information

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target

More information

CT75 (ALCCS) DATA WAREHOUSING AND DATA MINING JUN

CT75 (ALCCS) DATA WAREHOUSING AND DATA MINING JUN Q.1 a. Define a Data warehouse. Compare OLTP and OLAP systems. Data Warehouse: A data warehouse is a subject-oriented, integrated, time-variant, and 2 Non volatile collection of data in support of management

More information

Enterprise Miner Software: Changes and Enhancements, Release 4.1

Enterprise Miner Software: Changes and Enhancements, Release 4.1 Enterprise Miner Software: Changes and Enhancements, Release 4.1 The correct bibliographic citation for this manual is as follows: SAS Institute Inc., Enterprise Miner TM Software: Changes and Enhancements,

More information

CSE4334/5334 Data Mining 4 Data and Data Preprocessing. Chengkai Li University of Texas at Arlington Fall 2017

CSE4334/5334 Data Mining 4 Data and Data Preprocessing. Chengkai Li University of Texas at Arlington Fall 2017 CSE4334/5334 Data Mining 4 Data and Data Preprocessing Chengkai Li University of Texas at Arlington Fall 2017 10 What is Data? Collection of data objects and their attributes Attributes An attribute is

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

SCHEME OF COURSE WORK. Data Warehousing and Data mining

SCHEME OF COURSE WORK. Data Warehousing and Data mining SCHEME OF COURSE WORK Course Details: Course Title Course Code Program: Specialization: Semester Prerequisites Department of Information Technology Data Warehousing and Data mining : 15CT1132 : B.TECH

More information

Data Mining. Jeff M. Phillips. January 9, 2013

Data Mining. Jeff M. Phillips. January 9, 2013 Data Mining Jeff M. Phillips January 9, 2013 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational statistics? Data

More information

SAS Visual Analytics 8.2: Getting Started with Reports

SAS Visual Analytics 8.2: Getting Started with Reports SAS Visual Analytics 8.2: Getting Started with Reports Introduction Reporting The SAS Visual Analytics tools give you everything you need to produce and distribute clear and compelling reports. SAS Visual

More information

Problem Analysis and Preprocessing

Problem Analysis and Preprocessing Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Problem Analysis and Preprocessing Paul Prasse, Niels Landwehr, Tobias Scheffer Overview Analysis of learning problems Understanding

More information

ENTERPRISE MINER: 1 DATA EXPLORATION AND VISUALISATION

ENTERPRISE MINER: 1 DATA EXPLORATION AND VISUALISATION ENTERPRISE MINER: 1 DATA EXPLORATION AND VISUALISATION JOZEF MOFFAT, ANALYTICS & INNOVATION PRACTICE, SAS UK 10, MAY 2016 DATA EXPLORATION AND VISUALISATION AGENDA SAS Webinar 10th May 2016 at 10:00 AM

More information

Week 2 Engineering Data

Week 2 Engineering Data Week 2 Engineering Data Seokho Chi Associate Professor Ph.D. SNU Construction Innovation Lab Source: Tan, Kumar, Steinback (2006) 10 What is Data? Collection of data objects and their attributes An attribute

More information

Data Mining MTAT

Data Mining MTAT Data Mining MTAT.03.183 (4AP = 6EAP) Descriptive analysis and preprocessing Jaak Vilo 2009 Fall Reminder shopping basket Database consists of sets of items bought together Describe the data Characterise

More information

Normalization and denormalization Missing values Outliers detection and removing Noisy Data Variants of Attributes Meta Data Data Transformation

Normalization and denormalization Missing values Outliers detection and removing Noisy Data Variants of Attributes Meta Data Data Transformation Preprocessing Data Normalization and denormalization Missing values Outliers detection and removing Noisy Data Variants of Attributes Meta Data Data Transformation Reading material: Chapters 2 and 3 of

More information

Data Foundations. Topic Objectives. and list subcategories of each. its properties. before producing a visualization. subsetting

Data Foundations. Topic Objectives. and list subcategories of each. its properties. before producing a visualization. subsetting CS 725/825 Information Visualization Fall 2013 Data Foundations Dr. Michele C. Weigle http://www.cs.odu.edu/~mweigle/cs725-f13/ Topic Objectives! Distinguish between ordinal and nominal values and list

More information