Preprocessing and Visualization. Jonathan Diehl
|
|
- Esmond James
- 6 years ago
- Views:
Transcription
1 RWTH Aachen University Chair of Computer Science VI Prof. Dr.-Ing. Hermann Ney Seminar Data Mining WS 2003/2004 Preprocessing and Visualization Jonathan Diehl January 19, 2004 onathan Diehl Preprocessing and Visualization January 19,
2 Structure Preprocessing and Visualization Introduction Theoretical Fundamentals Visualization Preprocessing onathan Diehl Preprocessing and Visualization January 19,
3 Introduction Literature D. Hand, H. Manila, P. Smyth: Principles of Data Mining. MIT Press, Cambridge, MA, 2001, Chapters 2 and 3 J. Han, M. Kamber: Data Mining: Concepts and Techniques. Academic Press, San Diego, CA, 2001, Chapter 3 S. Roweis, L.Saul: Locally Linear Embedding Homepage. roweis/lle/ onathan Diehl Preprocessing and Visualization January 19,
4 Introduction Overview ata Preprocessing: data manipulation prior to mining improves quality or speed of actual mining ata Exploration: combined human and computer analysis utilizes human s natural abilities ata Visualization: methods to display data to the human data is plotted graphically onathan Diehl Preprocessing and Visualization January 19,
5 Structure Preprocessing and Visualization Introduction and Motivation Theoretical Fundamentals Visualization Preprocessing onathan Diehl Preprocessing and Visualization January 19,
6 Theoretical Fundamentals Data Representation data consists of N samples (measures, etc.) and M attributes (variables, dimensions, etc.) for each sample n and each attribute m there exists one data value x n,m onathan Diehl Preprocessing and Visualization January 19,
7 Theoretical Fundamentals Data Representation ata Set: X := ( X(1),...,X(N)) T with X(n) =(x n,1,...,x n,m ) ata Matrix: x 1,1... x 1,M X :=.. x N,1... x N,M ith x n,m value of n-th sample and m-th attribute onathan Diehl Preprocessing and Visualization January 19,
8 Theoretical Fundamentals Measures of Location ample Mean: ˆµ = 1 N edian: N n=1 X(n) with X set of N data values 50% of values below median ode: data point which occurs most often onathan Diehl Preprocessing and Visualization January 19,
9 Theoretical Fundamentals Measures of Dispersion ariance: ˆσ 2 = 1 N N ( ) 2 X(n) µ with X set of N data values and mean µ n=1 uartiles: 25%/75% of values below quartile ange: difference between lowest and highest data value onathan Diehl Preprocessing and Visualization January 19,
10 Theoretical Fundamentals Measures of Correlation ovariance (of attributes A and B): Cov(A, B) = 1 N N n=1 ( A(n) µa ) ( B(n) µb ) ovariance Matrix: V = 1 N XT X with X zero-centered data matrix orrelation Coefficient: Cov(A, B) ρ A,B = σ A σ B onathan Diehl Preprocessing and Visualization January 19,
11 Theoretical Fundamentals Modelling Data inear Regression (for two-dimensional data set): Y = a + b X with a, b coefficients and X, Y attributes ultiple Linear Regression: Y = a + N n=1 b n X n onathan Diehl Preprocessing and Visualization January 19,
12 Structure Preprocessing and Visualization Introduction and Motivation Theoretical Fundamentals Visualization Preprocessing onathan Diehl Preprocessing and Visualization January 19,
13 Visualization Histogram onathan Diehl Preprocessing and Visualization January 19,
14 Visualization Kernel Method stimated density of kernel function K with bandwidth h: f(x) = 1 N ( ) x X(n) K for given data set X of N values N h n=1 here K(t)dt =1 onathan Diehl Preprocessing and Visualization January 19,
15 Visualization Kernel Method onathan Diehl Preprocessing and Visualization January 19,
16 Visualization Boxplot onathan Diehl Preprocessing and Visualization January 19,
17 Visualization Scatterplot onathan Diehl Preprocessing and Visualization January 19,
18 Visualization Contour Plot onathan Diehl Preprocessing and Visualization January 19,
19 Visualization Trellis Plot onathan Diehl Preprocessing and Visualization January 19,
20 Structure Preprocessing and Visualization Introduction and Motivation Theoretical Fundamentals Visualization Preprocessing onathan Diehl Preprocessing and Visualization January 19,
21 Data Cleaning roblems: incomplete data (missing values) erroneous (noisy) data inconsistent data ( data integration) onathan Diehl Preprocessing and Visualization January 19,
22 Missing Values olutions: ignore samples loose important information determine most likely value: 1. make use of statistical measures (mean/median, class mean/median) 2. construct model of attribute relations (regression) and calculate value 3. construct decision tree and derive value... onathan Diehl Preprocessing and Visualization January 19,
23 Noisy Data: Regression. construct model of entire data set: linear Regression multiple Regression (regression over weighted linear combination of attributes). calculate data points from regression equation global smoothing, strong data reduction onathan Diehl Preprocessing and Visualization January 19,
24 Noisy Data: Regression 40 original data regression onathan Diehl Preprocessing and Visualization January 19,
25 Data Integration roblems: entity identification problem ( metadata) data redundancy ( correlation analysis) value conflicts ( data transformation) onathan Diehl Preprocessing and Visualization January 19,
26 Data Transformation roblem: data has to meet certain criteria before further processing.g. attribute values must be weighted equally for many analysis methods onathan Diehl Preprocessing and Visualization January 19,
27 Normalization it attribute A of data X into predefined range (e.g. [0.0, 1.0]) min-max normalization: f(i) = A(i) min A max A min A (newmax A newmin A )+newmin A z-score normalization: g(i) = A(i) µ A σ A onathan Diehl Preprocessing and Visualization January 19,
28 Attribute Construction onstruct new (redundant) attributes summarizing stored information sums, products of samples/data classes Fast online access to summarized data onathan Diehl Preprocessing and Visualization January 19,
29 Data Reduction roblem: huge databases with many attributes very slow mining process onathan Diehl Preprocessing and Visualization January 19,
30 Data Cube Aggregation multidimensional arrangement of aggregated information dimensions (axes) represent attributes multiple levels of abstraction possible (concept hierarchy) Efficient organization of summarized data onathan Diehl Preprocessing and Visualization January 19,
31 Data Cube Aggregation onathan Diehl Preprocessing and Visualization January 19,
32 Attribute Subset Selection etermine significance of attributes: correlation coefficient information gain ake selection: stepwise selection/elimination decision tree induction onathan Diehl Preprocessing and Visualization January 19,
33 Attribute Subset Selection A? Y N B? C? Y N Y N Class 1 Class 2 Class 2 Class 1 onathan Diehl Preprocessing and Visualization January 19,
34 Principal Component Analysis im: Maximize variability project data onto principal components (linear combination of attributes) normalize principal components zero-centered data set (subtract mean from all values) onathan Diehl Preprocessing and Visualization January 19,
35 Principal Component Analysis ariability of projection vector a (S := X T X scatter matrix): (Xa) T ( ) s 2 a = Xa = a T X T Xa = a T Sa pose normalization constraint and maximize: u = a T Sa λ(a T a 1) δu = 2Sa 2λa =0 δa }{{} eigenvalue form max eigenvector a with largest eigenvalue is first principal component onathan Diehl Preprocessing and Visualization January 19,
36 Principal Component Analysis data is projected onto the first K eigenvectors small values of K sufficient because late principal components are insignificant (eigenvalue approaches zero) for K =2principal component analysis can be used to project data into a plane for visualization onathan Diehl Preprocessing and Visualization January 19,
37 Multidimensional Scaling im: Preserve Distances fit multidimensional data into plane minimize squared distance error zero-centered data set (subtract mean from all values) onathan Diehl Preprocessing and Visualization January 19,
38 Multidimensional Scaling iven B = XX T solve (euclidian distance) d 2 ij = x i x j 2 = x T i X i 2x T i x j + x T j x j = b ii + b jj 2b ij hen minimize N N (δ ij d ij ) 2 i=1 j=1 ith δ ij multidimensional distance between data point i and j onathan Diehl Preprocessing and Visualization January 19,
39 Locally Linear Embedding im: Preserve Neighborhoods reduce data to lower dimensional embedding find locally linear patches of the data reconstruct data points from its neighbors invariant to rotations, rescalings and translations onathan Diehl Preprocessing and Visualization January 19,
40 Locally Linear Embedding onathan Diehl Preprocessing and Visualization January 19,
41 Locally Linear Embedding. Find neighbors. Reconstruct data point as linear combination of neighbors Weight matrix. Calculate low-dimensional representation of weight matrix using eigenvector analysis onathan Diehl Preprocessing and Visualization January 19,
42 onathan Diehl Preprocessing and Visualization January 19,
43 and Visualization Summary and Discussion onathan Diehl Preprocessing and Visualization January 19,
Preprocessing and Visualization. Jonathan Diehl
RWTH Aachen University Chair of Computer Science VI Prof.Dr.-Ing.HermannNey Seminar Data Mining in WS 2003 / 2004 Preprocessing and Visualization Jonathan Diehl Matrikelnummer 235087 January 19, 2004 Tutor:
More informationData preprocessing Functional Programming and Intelligent Algorithms
Data preprocessing Functional Programming and Intelligent Algorithms Que Tran Høgskolen i Ålesund 20th March 2017 1 Why data preprocessing? Real-world data tend to be dirty incomplete: lacking attribute
More information2. Data Preprocessing
2. Data Preprocessing Contents of this Chapter 2.1 Introduction 2.2 Data cleaning 2.3 Data integration 2.4 Data transformation 2.5 Data reduction Reference: [Han and Kamber 2006, Chapter 2] SFU, CMPT 459
More informationData Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha
Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking
More informationData Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures
More informationSummary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4
Principles of Knowledge Discovery in Data Fall 2004 Chapter 3: Data Preprocessing Dr. Osmar R. Zaïane University of Alberta Summary of Last Chapter What is a data warehouse and what is it for? What is
More information3. Data Preprocessing. 3.1 Introduction
3. Data Preprocessing Contents of this Chapter 3.1 Introduction 3.2 Data cleaning 3.3 Data integration 3.4 Data transformation 3.5 Data reduction SFU, CMPT 740, 03-3, Martin Ester 84 3.1 Introduction Motivation
More informationData Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality
Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing
More informationData Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA
Obj ti Objectives Motivation: Why preprocess the Data? Data Preprocessing Techniques Data Cleaning Data Integration and Transformation Data Reduction Data Preprocessing Lecture 3/DMBI/IKI83403T/MTI/UI
More informationUNIT 2 Data Preprocessing
UNIT 2 Data Preprocessing Lecture Topic ********************************************** Lecture 13 Why preprocess the data? Lecture 14 Lecture 15 Lecture 16 Lecture 17 Data cleaning Data integration and
More informationData Preprocessing. Data Mining 1
Data Preprocessing Today s real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size and their likely origin from multiple, heterogenous sources.
More informationData Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 15 Table of contents 1 Introduction 2 Data preprocessing
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 1 Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks in Data Preprocessing Data Cleaning Data Integration Data
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES 2: Data Pre-Processing Instructor: Yizhou Sun yzsun@ccs.neu.edu September 10, 2013 2: Data Pre-Processing Getting to know your data Basic Statistical Descriptions of Data
More informationData Mining: Concepts and Techniques
Data Mining: Concepts and Techniques Chapter 2 Original Slides: Jiawei Han and Micheline Kamber Modification: Li Xiong Data Mining: Concepts and Techniques 1 Chapter 2: Data Preprocessing Why preprocess
More informationData Mining and Analytics. Introduction
Data Mining and Analytics Introduction Data Mining Data mining refers to extracting or mining knowledge from large amounts of data It is also termed as Knowledge Discovery from Data (KDD) Mostly, data
More informationECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationRoad Map. Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary
2. Data preprocessing Road Map Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary 2 Data types Categorical vs. Numerical Scale types
More informationData Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 15 Table of contents 1 Introduction 2 Data preprocessing
More informationK236: Basis of Data Science
Schedule of K236 K236: Basis of Data Science Lecture 6: Data Preprocessing Lecturer: Tu Bao Ho and Hieu Chi Dam TA: Moharasan Gandhimathi and Nuttapong Sanglerdsinlapachai 1. Introduction to data science
More informationData Mining. CS57300 Purdue University. Bruno Ribeiro. February 1st, 2018
Data Mining CS57300 Purdue University Bruno Ribeiro February 1st, 2018 1 Exploratory Data Analysis & Feature Construction How to explore a dataset Understanding the variables (values, ranges, and empirical
More informationUNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES
UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES Data Pre-processing-Data Cleaning, Integration, Transformation, Reduction, Discretization Concept Hierarchies-Concept Description: Data Generalization And
More informationRecognition: Face Recognition. Linda Shapiro EE/CSE 576
Recognition: Face Recognition Linda Shapiro EE/CSE 576 1 Face recognition: once you ve detected and cropped a face, try to recognize it Detection Recognition Sally 2 Face recognition: overview Typical
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationPreprocessing Short Lecture Notes cse352. Professor Anita Wasilewska
Preprocessing Short Lecture Notes cse352 Professor Anita Wasilewska Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 3
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights
More informationData Preprocessing. Komate AMPHAWAN
Data Preprocessing Komate AMPHAWAN 1 Data cleaning (data cleansing) Attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. 2 Missing value
More informationBy Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad
By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad Data Analytics life cycle Discovery Data preparation Preprocessing requirements data cleaning, data integration, data reduction, data
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationTraining-Free, Generic Object Detection Using Locally Adaptive Regression Kernels
Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIENCE, VOL.32, NO.9, SEPTEMBER 2010 Hae Jong Seo, Student Member,
More informationData Mining: Concepts and Techniques. Chapter 2
Data Mining: Concepts and Techniques Chapter 2 Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign www.cs.uiuc.edu/~hanj 2006 Jiawei Han and Micheline Kamber, All rights
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 03 : 13/10/2015 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationcse634 Data Mining Preprocessing Lecture Notes Chapter 2 Professor Anita Wasilewska
cse634 Data Mining Preprocessing Lecture Notes Chapter 2 Professor Anita Wasilewska Chapter 2: Data Preprocessing (book slide) Why preprocess the data? Descriptive data summarization Data cleaning Data
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Fall 2013 Reading: Chapter 3 Han, Chapter 2 Tan Anca Doloc-Mihu, Ph.D. Some slides courtesy of Li Xiong, Ph.D. and 2011 Han, Kamber & Pei. Data Mining. Morgan Kaufmann.
More informationCIE L*a*b* color model
CIE L*a*b* color model To further strengthen the correlation between the color model and human perception, we apply the following non-linear transformation: with where (X n,y n,z n ) are the tristimulus
More information3. Multidimensional Information Visualization II Concepts for visualizing univariate to hypervariate data
3. Multidimensional Information Visualization II Concepts for visualizing univariate to hypervariate data Vorlesung Informationsvisualisierung Prof. Dr. Andreas Butz, WS 2009/10 Konzept und Basis für n:
More informationVisualizing and Exploring Data
Visualizing and Exploring Data Sargur University at Buffalo The State University of New York Visual Methods for finding structures in data Power of human eye/brain to detect structures Product of eons
More informationAcquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.
Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting
More informationChapter 2 Data Preprocessing
Chapter 2 Data Preprocessing CISC4631 1 Outline General data characteristics Data cleaning Data integration and transformation Data reduction Summary CISC4631 2 1 Types of Data Sets Record Relational records
More informationECT7110. Data Preprocessing. Prof. Wai Lam. ECT7110 Data Preprocessing 1
ECT7110 Data Preprocessing Prof. Wai Lam ECT7110 Data Preprocessing 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest,
More information2014 Stat-Ease, Inc. All Rights Reserved.
What s New in Design-Expert version 9 Factorial split plots (Two-Level, Multilevel, Optimal) Definitive Screening and Single Factor designs Journal Feature Design layout Graph Columns Design Evaluation
More informationData Preprocessing. Javier Béjar AMLT /2017 CS - MAI. (CS - MAI) Data Preprocessing AMLT / / 71 BY: $\
Data Preprocessing S - MAI AMLT - 2016/2017 (S - MAI) Data Preprocessing AMLT - 2016/2017 1 / 71 Outline 1 Introduction Data Representation 2 Data Preprocessing Outliers Missing Values Normalization Discretization
More informationDigital Geometry Processing
Digital Geometry Processing Spring 2011 physical model acquired point cloud reconstructed model 2 Digital Michelangelo Project Range Scanning Systems Passive: Stereo Matching Find and match features in
More informationData Mining. 2.4 Data Integration. Fall Instructor: Dr. Masoud Yaghini. Data Integration
Data Mining 2.4 Fall 2008 Instructor: Dr. Masoud Yaghini Data integration: Combines data from multiple databases into a coherent store Denormalization tables (often done to improve performance by avoiding
More informationExploratory Data Analysis EDA
Exploratory Data Analysis EDA Luc Anselin http://spatial.uchicago.edu 1 from EDA to ESDA dynamic graphics primer on multivariate EDA interpretation and limitations 2 From EDA to ESDA 3 Exploratory Data
More information3D Models and Matching
3D Models and Matching representations for 3D object models particular matching techniques alignment-based systems appearance-based systems GC model of a screwdriver 1 3D Models Many different representations
More informationCSC 411: Lecture 14: Principal Components Analysis & Autoencoders
CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Raquel Urtasun & Rich Zemel University of Toronto Nov 4, 2015 Urtasun & Zemel (UofT) CSC 411: 14-PCA & Autoencoders Nov 4, 2015 1 / 18
More informationCSC 411: Lecture 14: Principal Components Analysis & Autoencoders
CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto Zemel, Urtasun, Fidler (UofT) CSC 411: 14-PCA & Autoencoders 1 / 18
More informationCS 521 Data Mining Techniques Instructor: Abdullah Mueen
CS 521 Data Mining Techniques Instructor: Abdullah Mueen LECTURE 2: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 3
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights
More informationImage Processing. Image Features
Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching
More informationIAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram
IAT 355 Visual Analytics Data and Statistical Models Lyn Bartram Exploring data Example: US Census People # of people in group Year # 1850 2000 (every decade) Age # 0 90+ Sex (Gender) # Male, female Marital
More informationData Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47
Data Mining - Data Dr. Jean-Michel RICHER 2018 jean-michel.richer@univ-angers.fr Dr. Jean-Michel RICHER Data Mining - Data 1 / 47 Outline 1. Introduction 2. Data preprocessing 3. CPA with R 4. Exercise
More informationThe Novel Approach for 3D Face Recognition Using Simple Preprocessing Method
The Novel Approach for 3D Face Recognition Using Simple Preprocessing Method Parvin Aminnejad 1, Ahmad Ayatollahi 2, Siamak Aminnejad 3, Reihaneh Asghari Abstract In this work, we presented a novel approach
More informationData Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\
Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured
More informationD-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview
Chapter 888 Introduction This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative factors. The factors can have a mixed number of levels. For example,
More informationCOMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS
COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS Toomas Kirt Supervisor: Leo Võhandu Tallinn Technical University Toomas.Kirt@mail.ee Abstract: Key words: For the visualisation
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationData Preprocessing. Erwin M. Bakker & Stefan Manegold. https://homepages.cwi.nl/~manegold/dbdm/
Data Preprocessing Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ s.manegold@liacs.leidenuniv.nl e.m.bakker@liacs.leidenuniv.nl 9/26/17
More informationCSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction
CSE 258 Lecture 5 Web Mining and Recommender Systems Dimensionality Reduction This week How can we build low dimensional representations of high dimensional data? e.g. how might we (compactly!) represent
More informationOn a fast discrete straight line segment detection
On a fast discrete straight line segment detection Ali Abdallah, Roberto Cardarelli, Giulio Aielli University of Rome Tor Vergata Abstract Detecting lines is one of the fundamental problems in image processing.
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationLast time... Coryn Bailer-Jones. check and if appropriate remove outliers, errors etc. linear regression
Machine learning, pattern recognition and statistical data modelling Lecture 3. Linear Methods (part 1) Coryn Bailer-Jones Last time... curse of dimensionality local methods quickly become nonlocal as
More informationData Exploration and Preparation Data Mining and Text Mining (UIC Politecnico di Milano)
Data Exploration and Preparation Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining, : Concepts and Techniques", The Morgan Kaufmann
More informationPrincipal Component Analysis (PCA) is a most practicable. statistical technique. Its application plays a major role in many
CHAPTER 3 PRINCIPAL COMPONENT ANALYSIS ON EIGENFACES 2D AND 3D MODEL 3.1 INTRODUCTION Principal Component Analysis (PCA) is a most practicable statistical technique. Its application plays a major role
More informationNonlinear Dimensionality Reduction Applied to the Classification of Images
onlinear Dimensionality Reduction Applied to the Classification of Images Student: Chae A. Clark (cclark8 [at] math.umd.edu) Advisor: Dr. Kasso A. Okoudjou (kasso [at] math.umd.edu) orbert Wiener Center
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationPart I. Graphical exploratory data analysis. Graphical summaries of data. Graphical summaries of data
Week 3 Based in part on slides from textbook, slides of Susan Holmes Part I Graphical exploratory data analysis October 10, 2012 1 / 1 2 / 1 Graphical summaries of data Graphical summaries of data Exploratory
More informationLecture Topic Projects
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationDATA PREPROCESSING. Tzompanaki Katerina
DATA PREPROCESSING Tzompanaki Katerina Background: Data storage formats Data in DBMS ODBC, JDBC protocols Data in flat files Fixed-width format (each column has a specific number of characters, filled
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler, Sanjay Ranka
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler, Sanjay Ranka Topics What is data? Definitions, terminology Types of data and datasets Data preprocessing Data Cleaning Data integration
More informationDatabase and Knowledge-Base Systems: Data Mining. Martin Ester
Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro
More informationCPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017
CPSC 340: Machine Learning and Data Mining Kernel Trick Fall 2017 Admin Assignment 3: Due Friday. Midterm: Can view your exam during instructor office hours or after class this week. Digression: the other
More informationTable Of Contents: xix Foreword to Second Edition
Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data
More informationComputer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction
Computer vision: models, learning and inference Chapter 13 Image preprocessing and feature extraction Preprocessing The goal of pre-processing is to try to reduce unwanted variation in image due to lighting,
More information2 CONTENTS. 3.8 Bibliographic Notes... 45
Contents 3 Data Preprocessing 3 3.1 Data Preprocessing: An Overview................. 4 3.1.1 Data Quality: Why Preprocess the Data?......... 4 3.1.2 Major Tasks in Data Preprocessing............. 5 3.2
More informationCSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction
CSE 255 Lecture 5 Data Mining and Predictive Analytics Dimensionality Reduction Course outline Week 4: I ll cover homework 1, and get started on Recommender Systems Week 5: I ll cover homework 2 (at the
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3
Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationWeek 7 Picturing Network. Vahe and Bethany
Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups
More informationFREE-FORM SURFACE DESIGN: SUBDIVISION + SMOOTHING (4) 20 STEPS OF NEW ALGORITHM WITH = 0:33 = 0:34 FREE-FORM SURFACE DESIGN: SUBDIVISION + SMOOTHING (
SMOOTH INTERPOLATION (2) ONE CONSTRAINT (x N C ) 1 = x 1 WRITE DESIRED CONSTRAINED SMOOTH SIGNAL x N AS SUM OF C UNCONSTRAINED SMOOTH SIGNAL x N = Fx (F = f(k) N ) PLUS SMOOTH DEFORMATION d 1 x N C = xn
More informationDimensionality Reduction, including by Feature Selection.
Dimensionality Reduction, including by Feature Selection www.cs.wisc.edu/~dpage/cs760 Goals for the lecture you should understand the following concepts filtering-based feature selection information gain
More informationChapter 3: Intensity Transformations and Spatial Filtering
Chapter 3: Intensity Transformations and Spatial Filtering 3.1 Background 3.2 Some basic intensity transformation functions 3.3 Histogram processing 3.4 Fundamentals of spatial filtering 3.5 Smoothing
More informationData Preprocessing. Chapter Why Preprocess the Data?
Contents 2 Data Preprocessing 3 2.1 Why Preprocess the Data?........................................ 3 2.2 Descriptive Data Summarization..................................... 6 2.2.1 Measuring the Central
More informationStatistical image models
Chapter 4 Statistical image models 4. Introduction 4.. Visual worlds Figure 4. shows images that belong to different visual worlds. The first world (fig. 4..a) is the world of white noise. It is the world
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.
More informationData Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Data Exploration Chapter Introduction to Data Mining by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 What is data exploration?
More informationAnalyzing Outlier Detection Techniques with Hybrid Method
Analyzing Outlier Detection Techniques with Hybrid Method Shruti Aggarwal Assistant Professor Department of Computer Science and Engineering Sri Guru Granth Sahib World University. (SGGSWU) Fatehgarh Sahib,
More information3D from Photographs: Automatic Matching of Images. Dr Francesco Banterle
3D from Photographs: Automatic Matching of Images Dr Francesco Banterle francesco.banterle@isti.cnr.it 3D from Photographs Automatic Matching of Images Camera Calibration Photographs Surface Reconstruction
More informationDimension Reduction of Image Manifolds
Dimension Reduction of Image Manifolds Arian Maleki Department of Electrical Engineering Stanford University Stanford, CA, 9435, USA E-mail: arianm@stanford.edu I. INTRODUCTION Dimension reduction of datasets
More informationSpecific Object Recognition: Matching in 2D
Specific Object Recognition: Matching in 2D engine model Is there an engine in the image? If so, where is it located? image containing an instance of the model Alignment Use a geometric feature-based model
More informationRegression III: Advanced Methods
Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP
More information/4 Directions: Graph the functions, then answer the following question.
1.) Graph y = x. Label the graph. Standard: F-BF.3 Identify the effect on the graph of replacing f(x) by f(x) +k, k f(x), f(kx), and f(x+k), for specific values of k; find the value of k given the graphs.
More informationAugmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit
Augmented Reality VU Computer Vision 3D Registration (2) Prof. Vincent Lepetit Feature Point-Based 3D Tracking Feature Points for 3D Tracking Much less ambiguous than edges; Point-to-point reprojection
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3
Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Look for accompanying R code on the course web site. Topics Exploratory Data Analysis
More informationNon-linear dimension reduction
Sta306b May 23, 2011 Dimension Reduction: 1 Non-linear dimension reduction ISOMAP: Tenenbaum, de Silva & Langford (2000) Local linear embedding: Roweis & Saul (2000) Local MDS: Chen (2006) all three methods
More information( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components
Review Lecture 14 ! PRINCIPAL COMPONENT ANALYSIS Eigenvectors of the covariance matrix are the principal components 1. =cov X Top K principal components are the eigenvectors with K largest eigenvalues
More informationWeb Information Retrieval
Lucian Blaga University of Sibiu Hermann Oberth Engineering Faculty Computer Science Department Web Information Retrieval First Technical Report PhD title: Data Mining for unstructured data Author: Daniel
More information