Pattern Recognition: Theory and Applications


Pattern Recognition: Theory and Applications Saroj K. Meher Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore Centre.

2/134 Contents: Pattern recognition: introduction; definitions and framework; advantages and application areas; discrimination functions; learning methodologies; some examples of classification and clustering methods; semi-supervised and co-training algorithms; multiple classification systems (contd.)

3/134 Pattern Recognition (PR): Motivation Pattern recognition stems from the need for automated machine recognition of objects, signals or images, or the need for automated decision-making based on a given set of parameters.

4/134 PR Defined PR is a domain of machine learning: the act of taking in raw data and taking an action based on it. "The assignment of a physical object or event to one of several prespecified categories" (Duda and Hart). "A problem of estimating density functions in a high-dimensional space and dividing the space into the regions of categories or classes" (Fukunaga). "The science that concerns the description or classification (recognition) of measurements" (Schalkoff). "The process of giving names ω to observations x" (Schürmann). "PR is concerned with answering the question 'What is this?'" (Morse). PR plays its role whenever a decision is about to be taken.

5/134 Machine Intelligence: a core concept for grouping various advanced technologies with pattern recognition and learning. Knowledge-based systems: probabilistic reasoning, approximate reasoning, case-based reasoning, fuzzy logic, rough sets. Hybrid systems: neuro-fuzzy, genetic-neural, fuzzy-genetic, fuzzy-rough, neuro-rough. Data-driven systems: neural network systems, evolutionary computing, pattern recognition and learning. Non-linear dynamics: chaos theory, rescaled range analysis (wavelet), fractal analysis.

6/134 Pattern Recognition System (PRS): Measurement Space → Feature Space → Decision Space

7/134 What is a pattern? A pattern is the opposite of chaos; it is an entity, vaguely defined, that could be given a name. What about texture?

8/134 What is a pattern? A pattern is an abstract object, or a set of measurements describing a physical object.

9/134 Examples of Patterns Crystal Patterns: atomic or molecular. Their structures are represented by 3D graphs and can be described by deterministic grammars or formal languages.

10/134 Examples of Patterns Patterns of Constellations Patterns of constellations are represented by 2D planar graphs. Human perception has a strong tendency to find patterns in anything; we see patterns even in random noise. We are more likely to believe in a hidden pattern than to deny it, because the risk of missing a pattern (or the reward for discovering one) is often high.

11/134 Examples of Patterns Biological Patterns: landmarks are identified on biological forms, and these patterns are then represented by a list of points. But for other forms, like the roots of plants, points cannot be registered across instances. Applications: biometrics, computational anatomy, brain mapping.

12/134 Examples of Patterns Biological Patterns: landmarks are identified on biological forms, and these patterns are then represented by a list of points.

13/134 Examples of Patterns Music Patterns Ravel Symphony?

14/134 Examples of Patterns People Recognition Funny, Funny Patterns Behavior?

15/134 Examples of Patterns Discovery and Association of Patterns Statistics show connections between the shape of one's face (in adults) and his/her character. There is also evidence that the outline of a child's face is related to alcohol abuse during pregnancy.

16/134 Examples of Patterns Discovery and Association of Patterns What are the features? Statistics show connections between the shape of one's face (in adults) and his/her character.

17/134 Examples of Patterns Patterns of Brain Activity We may understand patterns of brain activity and find relationships between brain activities, cognition, and behaviors

18/134 Examples of Patterns Variation Patterns: 1. Expression: geometric deformation. 2. Illumination: photometric deformation. 3. Transformation: 2D pose / 3D. 4. Noise and occlusion.

19/134 Examples of Patterns A broad range of texture patterns are generated by stochastic processes.

20/134 Examples of Patterns How are these patterns represented in the human mind?

21/134 Examples of Patterns Speech signals and Hidden Markov models

22/134 Examples of Patterns Natural Language and stochastic grammar.

23/134 Examples of Patterns Object Recognition Patterns everywhere?

24/134 Examples of Patterns Maps Recognition Patterns of Global Warming?

25/134 Examples of Patterns Financial Series Pattern Recognition

26/134 Examples of Patterns How to Trade Chart Patterns?

27/134 Examples of Patterns Pattern Recognition in Medical Diagnosis

28/134 Examples of Patterns Optical Character Recognition

29/134 Examples of Patterns Graphic Arts Escher, who else?

30/134 Examples of Patterns Human Genome Beautiful Patterns!

31/134 Approaches Statistical PR: based on underlying statistical models of patterns and pattern classes. Neural networks: the classifier is represented as a network of cells modeling neurons of the human brain (connectionist approach). Structural (or syntactic) PR: pattern classes are represented by means of formal structures such as grammars, automata, strings, etc.

32/134 What is a pattern class? A pattern class (or category) is a set of patterns sharing common attributes: a collection of similar (not necessarily identical) objects. During recognition, given objects are assigned to prescribed classes.

33/134 What is pattern recognition? Theory, algorithms, and systems to put patterns into categories; relate a perceived pattern to previously perceived patterns; learn to distinguish patterns of interest from their background.

34/134 Human Perception Humans have developed highly sophisticated skills for sensing their environment and taking actions according to what they observe, e.g., recognizing a face, understanding spoken words, reading handwriting, distinguishing fresh food by its smell. We would like to give similar capabilities to machines.

35/134 Examples of applications Optical Character Recognition (OCR) Handwritten: sorting letters by postal code. Printed texts: reading machines for blind people, digitization of text documents. Biometrics: face recognition, verification, retrieval; fingerprint recognition. Speech recognition. Diagnostic systems: medical diagnosis, X-Ray, EKG (electrocardiograph) analysis. Military applications: Automated Target Recognition (ATR); image segmentation and analysis (recognition from aerial or satellite photographs).

36/134 THE STATISTICAL APPROACH

37/134-40/134 Examples of applications: Grid-by-Grid Comparison. Each character is digitized as a binary grid, and a test pattern is compared cell by cell against each stored template. Comparing the test 'A' with the stored 'A' template gives 3 mismatches; comparing it with the 'B' template gives 9 mismatches, so the test pattern is assigned to class 'A'.
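
A minimal Python sketch of this grid-by-grid scheme: count cell-wise mismatches between the test bitmap and each stored template, and pick the template with the fewest mismatches. The grids below are hypothetical stand-ins, not the slide's exact patterns.

# Grid-by-grid comparison sketch: classify a binary bitmap by
# counting cell-wise mismatches against stored templates.
def mismatches(grid_a, grid_b):
    # Number of cells where the two binary grids disagree.
    return sum(a != b for row_a, row_b in zip(grid_a, grid_b)
                      for a, b in zip(row_a, row_b))

def classify(test, templates):
    # Label of the template with the fewest mismatches.
    return min(templates, key=lambda label: mismatches(test, templates[label]))

templates = {
    "A": [[0, 1, 0],
          [1, 0, 1],
          [1, 1, 1],
          [1, 0, 1]],
    "B": [[1, 1, 0],
          [1, 1, 0],
          [1, 0, 1],
          [1, 1, 0]],
}
test = [[0, 1, 0],
        [1, 0, 1],
        [1, 1, 1],
        [1, 1, 1]]   # a noisy "A": 1 mismatch vs. "A", 5 vs. "B"
print(classify(test, templates))   # -> "A"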

41/134 Problem with Grid-by-Grid Comparison: the time to recognize a pattern is proportional to the number of stored patterns (A-Z, a-z, 0-9, symbols such as */-+1@#), so it becomes too costly as the number of stored patterns grows. Solution: artificial intelligence (learning-based) approaches.

42/134 Human and Machine Perception We are often influenced by the knowledge of how patterns are modeled and recognized in nature when we develop pattern recognition algorithms. Research on machine perception also helps us gain deeper understanding and appreciation for pattern recognition systems in nature. Yet, we also apply many techniques that are purely numerical and do not have any correspondence in natural systems.

43/134 Pattern Recognition Two phases: learning and detection. The time to learn is higher. Analogy: driving a car is difficult to learn, but once learnt it becomes natural. Can use AI learning methodologies such as neural networks and machine learning.

44/134 Learning How can a machine learn the rule from data? Supervised learning: a teacher provides a category label or cost for each pattern in the training set. Samples from the information classes (training data) are used for learning and then for classifying unknown data points/patterns. Unsupervised learning: the system forms clusters or natural groupings of the input patterns. Semi-supervised learning: a small set of labeled training samples from the information classes is used for initial learning; the model is then refined using unlabeled samples before classifying unknown patterns.
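
As a concrete illustration of the supervised case, a nearest-class-mean rule can be sketched as follows; the training data and labels are hypothetical:

import numpy as np

# Supervised learning sketch: learn one mean prototype per labeled
# class, then assign an unknown pattern to the nearest prototype.
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [5.0, 4.8], [5.2, 5.1]])
y_train = np.array([0, 0, 1, 1])          # labels from the "teacher"

prototypes = {c: X_train[y_train == c].mean(axis=0)
              for c in np.unique(y_train)}

def predict(x):
    # Nearest-class-mean decision rule.
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

print(predict(np.array([0.9, 1.1])))   # -> 0
print(predict(np.array([4.9, 5.0])))   # -> 1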

45/134 Types of Prediction Problems (1/2) Classification: the PR problem of assigning an object to a class; the output of the PR system is an integer label, e.g. classifying a product as good or bad in a quality control test. Regression: a generalization of the classification task; the output of the PR system is a real-valued number, e.g. predicting the share value of a firm based on past performance and stock market indicators.

46/134 Types of Prediction Problems (2/2) Clustering: the problem of organizing objects into meaningful groups; the system returns a (sometimes hierarchical) grouping of objects, e.g. organizing life forms into a taxonomy of species. Description: the problem of representing an object in terms of a series of primitives; the PR system produces a structural or linguistic description, e.g. labeling an ECG signal in terms of P, QRS and T complexes.
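
A minimal sketch of clustering, assuming a bare-bones k-means on hypothetical 2D points; a real system would use better initialization and a convergence test:

import numpy as np

# Clustering sketch: alternate assigning points to the nearest
# centroid and moving each centroid to the mean of its points.
def kmeans(X, k, iters=20):
    centroids = X[:k].copy()                 # naive initialization
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)        # nearest-centroid assignment
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
print(kmeans(X, 2)[0])   # -> two natural groups, e.g. [0 0 1 1]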

47/134 Classification vs. Clustering Classification (supervised classification): known categories, e.g. Category A vs. Category B. Clustering (unsupervised classification): creation of new categories.

49/134 Basic concepts (Classification) Pattern x ∈ X: a vector of observations (measurements), x = (x_1, x_2, ..., x_n)^T, a point in the feature space X. Hidden state y ∈ Y: cannot be directly measured; patterns with equal hidden state belong to the same class. Task: to design a classifier (decision rule) d : X → Y which decides about the hidden state based on an observation.

50/134 Dimension The dimension of a space or object is informally defined as the minimum number of coordinates needed to specify any point within it, e.g., A line has a dimension of one because only one coordinate is needed to specify a point on it (for example, the point at 5 on a number line). A surface such as a plane or the surface of a cylinder or sphere has a dimension of two because two coordinates are needed to specify a point on it (for example, to locate a point on the surface of a sphere you need both its latitude and its longitude). The inside of a cube, a cylinder or a sphere is three-dimensional because three coordinates are needed to locate a point within these spaces. From left to right, the square, the cube, and the tesseract. The square is bounded by 1-dimensional lines, the cube by 2-dimensional areas, and the tesseract by 3-dimensional volumes.

51/134 Feature extraction Task: to extract features which are good for classification. Good features: objects from the same class have similar feature values, and objects from different classes have different values. Figure: good features vs. bad features.

52/134 Dimensionality Reduction

53/134 Dimensionality Reduction A limited yet salient feature set simplifies both pattern representation and classifier design. Pattern representation is easy for 2D and 3D features. How can patterns with high-dimensional features be made viewable?

54/134 Dimensionality Reduction How? Feature extraction: create new features based on the original feature set; transforms are usually involved. Feature selection: select the best subset from a given feature set.

55/134 Main Issues in Dimensionality Reduction The choice of a criterion function; a commonly used criterion is the classification error. The determination of the appropriate dimensionality, correlated with the intrinsic dimensionality of the data.

56/134 Dimensionality Reduction Feature Extraction

57/134 Feature Extractor x_i = (x_{i1}, x_{i2}, ..., x_{id})^T → Feature Extractor → y_i = (y_{i1}, y_{i2}, ..., y_{im})^T, with m ≤ d (usually m << d).

58/134 Some Important Methods Principal Component Analysis (PCA), or Karhunen-Loève expansion; projection pursuit; Independent Component Analysis (ICA); factor analysis; discriminant analysis; kernel PCA; Multidimensional Scaling (MDS); feed-forward neural networks; self-organizing maps. These span linear approaches, nonlinear approaches, and neural networks.

59/134 Dimensionality Reduction Feature Selection

60/134 Feature Selector x = (x_1, x_2, ..., x_d)^T → Feature Selector → (x_1', x_2', ..., x_m')^T, with m ≤ d (usually m << d). Choosing m of d features gives C(d, m) = d!/(m!(d-m)!) possible selections; e.g. m = 5 of d = 20 features already yields 15504 candidate subsets.

61/134 The problem Given a set of d features, select a subset of size m that leads to the smallest classification error. No nonexhaustive sequential feature selection procedure can be guaranteed to produce the optimal subset.

62/134 Optimal Methods Exhaustive search: evaluate all possible subsets. Branch-and-bound: the monotonicity property of the criterion function has to hold.

63/134 Suboptimal Methods Best individual features. Sequential Forward Selection (SFS). Sequential Backward Selection (SBS). Plus-l take-away-r selection. Sequential Forward Floating Search and Sequential Backward Floating Search.
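
A sketch of SFS, assuming a caller-supplied black-box criterion function score(subset), e.g. cross-validated classification accuracy; the toy criterion below is made up for illustration:

# Sequential Forward Selection (SFS) sketch: greedily grow the
# feature subset, at each step adding the single feature that
# maximizes the criterion function.
def sfs(n_features, m, score):
    selected = []
    remaining = set(range(n_features))
    while len(selected) < m:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy criterion (hypothetical): features 2 and 5 are the useful
# ones; a small penalty per feature breaks ties toward small sets.
toy_score = lambda subset: len({2, 5} & set(subset)) - 0.01 * len(subset)
print(sfs(8, 3, toy_score))   # picks 2 and 5 first, e.g. [2, 5, 0]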

64/134 Figure: feature extraction vs. feature selection. Feature extraction computes k new features φ_1, ..., φ_k as functions of the original features x_1, ..., x_n; feature selection retains a subset of the original features x_1, ..., x_n.

65/134 Example: jockey vs. hoopster recognition. Feature vector x = (x_1, x_2)^T = (height, weight)^T, a point in the feature space X = R^2. The set of hidden states is Y = {H, J}. Training examples: {(x_1, y_1), ..., (x_l, y_l)}. Linear classifier: d(x) = H if (w·x) + b ≥ 0, and d(x) = J if (w·x) + b < 0; the decision boundary is the line (w·x) + b = 0.
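
The decision rule above, sketched in Python; the weights w and bias b are hypothetical, not trained values:

# Linear classifier sketch for the jockey (J) vs. hoopster (H)
# example: d(x) = H if (w.x) + b >= 0 else J.
w = (0.5, 0.3)          # hypothetical weights for (height_cm, weight_kg)
b = -120.0              # hypothetical bias placing the boundary between classes

def d(x):
    return "H" if w[0] * x[0] + w[1] * x[1] + b >= 0 else "J"

print(d((200.0, 95.0)))   # tall, heavy  -> "H" (hoopster)
print(d((160.0, 50.0)))   # short, light -> "J" (jockey)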

66/134 Classifier A classifier partitions the feature space X into class-labeled regions such that X = X_1 ∪ X_2 ∪ ... ∪ X_|Y| and X_i ∩ X_j = ∅ for i ≠ j. Classification consists of determining to which region a feature vector x belongs. Borders between decision regions are called decision boundaries.

67/134 Representation of a classifier A classifier is typically represented as a set of discriminant functions f_i(x) : X → R, i = 1, ..., |Y|. The classifier assigns a feature vector x to the i-th class if f_i(x) > f_j(x) for all j ≠ i. Figure: the feature vector x feeds the discriminant functions f_1(x), f_2(x), ..., f_|Y|(x); a max selector outputs the class identifier y.
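
The max-of-discriminants rule, sketched with arbitrary linear discriminant functions as stand-ins:

import numpy as np

# Discriminant-function classifier sketch: one f_i per class,
# assign x to the class whose discriminant value is largest.
W = np.array([[ 1.0,  0.0],      # f_1(x) = w_1 . x + b_1
              [ 0.0,  1.0],      # f_2(x)
              [-1.0, -1.0]])     # f_3(x)
bias = np.array([0.0, -0.5, 1.0])

def classify(x):
    scores = W @ x + bias               # f_i(x) for i = 1..|Y|
    return int(np.argmax(scores)) + 1   # 1-based class index

print(classify(np.array([2.0, 0.0])))   # f = [2.0, -0.5, -1.0] -> class 1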

68/134 Pattern Discrimination Revisited Decision Regions and Functions The feature space is x = [x_1, x_2]^T ∈ R^2. Linear decision function: d(x) = w_1 x_1 + w_2 x_2 + w_0 = 0. Linear classifier: assign class ω_1 if (w·x) + b ≥ 0, and ω_2 if (w·x) + b < 0.

69/134 Case Study Fish classification: sea bass / salmon. Problem: sorting incoming fish on a conveyor belt according to species.

70/134 Case Study (Cont.) What can cause problems during sensing? Lighting conditions, position of fish on the conveyor belt, camera noise, etc. What are the steps in the process? 1. Capture image. 2. Isolate fish. 3. Take measurements. 4. Make decision.

71/134 Case Study (Cont.) Pipeline: Pre-processing → Feature Extraction → Classification → (Sea Bass / Salmon).

72/134 Case Study (Cont.) Pre-processing: image enhancement; separating touching or occluding fish; finding the boundary of the fish.

73/134 Case Study (Cont.) How to separate sea bass from salmon? Possible features to be used: length, lightness, width, number and shape of fins, position of the mouth, etc. Assume a fisherman told us that a sea bass is generally longer than a salmon. Even though sea bass are longer than salmon on average, there are many examples of fish for which this observation does not hold.

74/134 Feature Selection: Good/Bad Figure: good features vs. bad features.

75/134 Case Study (Cont.) How to separate sea bass from salmon? To improve recognition, we might have to use more than one feature at a time. Single features might not yield the best performance; combinations of features might yield better performance, e.g. x = (x_1, x_2)^T with x_1: lightness and x_2: width.

76/134 Features and Distributions The length/width is a poor feature alone! Select the lightness as a possible feature.

77/134 Decision/classification Boundaries

78/134 Decision/classification Boundaries

79/134 Decision/classification Boundaries We might add other features that are not correlated with the ones we already have. A precaution should be taken not to reduce the performance by adding such noisy features. Ideally, the best decision boundary should be the one which provides optimal performance, such as in the figure on the next slide:

80/134 Decision/classification Boundaries

81/134 Decision/classification Boundaries However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input: the issue of generalization!

82/134 Decision/classification Boundaries

83/134 Occam's Razor "Entities are not to be multiplied without necessity" William of Occam (1284-1347)

84/134 Mapping Data to a New Space Fourier transform; wavelet transform. Figure: two sine waves; two sine waves + noise; frequency spectrum.

85/134 What Is Wavelet Transform? Decomposes a signal into different frequency subbands. Applicable to n-dimensional signals. Data are transformed to preserve relative distances between objects at different levels of resolution. Allows natural clusters to become more distinguishable. Used for image compression.

86/134 What Is Wavelet Transform? Discrete wavelet transform (DWT): used for linear signal processing and multiresolution analysis. Compressed approximation: store only a small fraction of the strongest wavelet coefficients. Similar to the discrete Fourier transform (DFT), but gives better lossy compression and is localized in space. Method: the length L must be an integer power of 2 (pad with 0's when necessary). Each transform step has two functions: smoothing and difference. They apply to pairs of data, producing two sets of data of length L/2, and are applied recursively until the desired length is reached.

87/134 Wavelet Decomposition Wavelets: a math tool for the space-efficient hierarchical decomposition of functions. S = [2, 2, 0, 2, 3, 5, 4, 4] can be transformed to S^ = [2.75, -1.25, 0.5, 0, 0, -1, -1, 0]. Compression: many small detail coefficients can be replaced by 0's, and only the significant coefficients are retained.
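
This transform can be reproduced with the smoothing/difference recursion from the previous slide; a sketch of the unnormalized Haar decomposition:

# Unnormalized Haar wavelet decomposition sketch: recursively
# replace pairs by their average (smoothing) and half-difference
# (detail) until one overall average remains.
def haar(signal):
    coeffs = []
    s = list(signal)
    while len(s) > 1:
        avg  = [(a + b) / 2 for a, b in zip(s[0::2], s[1::2])]
        diff = [(a - b) / 2 for a, b in zip(s[0::2], s[1::2])]
        coeffs = diff + coeffs   # coarser details precede finer ones
        s = avg
    return s + coeffs            # [overall average, details...]

print(haar([2, 2, 0, 2, 3, 5, 4, 4]))
# -> [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0], i.e. the S^ above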

88/134 Why Wavelet Transform? Uses hat-shaped filters: emphasizes regions where points cluster and suppresses weaker information at their boundaries. Effective removal of outliers: insensitive to noise and to input order. Multi-resolution: detects arbitrarily shaped clusters at different scales. Efficient: complexity O(N).

89/134 Principal Component Analysis (PCA) Find a projection that captures the largest amount of variation in the data. The original data are projected onto a much smaller space, resulting in dimensionality reduction. We find the eigenvectors of the covariance matrix, and these eigenvectors define the new space. Figure: data in the (x_1, x_2) plane with principal eigenvector e.

90/134 Principal Component Analysis (Steps) Given N data vectors in n dimensions, find k ≤ n orthogonal vectors (principal components) that can best be used to represent the data. Normalize the input data, so each attribute falls within the same range. Compute k orthonormal (unit) vectors, i.e. the principal components; each input data vector is a linear combination of the k principal component vectors. The principal components are sorted in order of decreasing significance or strength. Since the components are sorted, the size of the data can be reduced by eliminating the weak components, i.e. those with low variance (using the strongest principal components, it is possible to reconstruct a good approximation of the original data). Works for numeric data only.
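
A sketch of these steps via the eigendecomposition of the covariance matrix, on hypothetical data:

import numpy as np

# PCA sketch following the steps above: center the data, compute
# the covariance matrix, take its eigenvectors, and keep the k
# strongest (largest-variance) components.
def pca(X, k):
    Xc = X - X.mean(axis=0)                 # normalize: zero-mean attributes
    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # sort by decreasing significance
    components = eigvecs[:, order]          # k principal components
    return Xc @ components                  # project onto the new space

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
print(pca(X, 1))   # each row reduced from 2 attributes to 1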

91/134 Feature Subset Selection: revisited Another way to reduce the dimensionality of data. Redundant attributes: duplicate much or all of the information contained in one or more other attributes, e.g. the purchase price of a product and the amount of sales tax paid. Irrelevant attributes: contain no information that is useful for the data mining task at hand, e.g. students' ID is often irrelevant to the task of predicting students' GPA.

92/134 Feature Subset Selection Techniques: Brute-force approach: try all possible feature subsets as input to the data mining algorithm. Embedded approaches: feature selection occurs naturally as part of the data mining algorithm. Filter approaches: features are selected before the data mining algorithm is run. Wrapper approaches: use the data mining algorithm as a black box to find the best subset of attributes.
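
To contrast with the wrapper-style SFS sketched earlier, here is a minimal filter approach, assuming variance as the relevance score and hypothetical data:

import numpy as np

# Filter-approach sketch: score each attribute independently of
# the mining algorithm (here by variance) and keep the top-scoring
# ones before any model is trained.
X = np.array([[1.0, 5.0, 0.1],
              [2.0, 5.0, 0.1],
              [3.0, 5.1, 0.1],
              [4.0, 4.9, 0.1]])

scores = X.var(axis=0)                   # near-constant columns score ~0
keep = np.argsort(scores)[::-1][:1]      # keep the single best attribute
print(keep, X[:, keep])                  # -> column 0 (the informative one)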

93/134 Example: Classification and Clustering

94/134 Validation of models

95/134 Semisupervised and Co-Training

96/134 Multiple Classification Systems

97/134 Granular Computing

98/134 GrC practical

99/134 Pattern Recognition System (PRS): Measurement Space → Feature Space → Decision Space. Uncertainties arise from deficiencies of the information available from a situation. Deficiencies may result from incomplete, imprecise, ill-defined, not fully reliable, vague, or contradictory information at various stages of a PRS.