Fundamentals of Media Processing. Shin'ichi Satoh, Kazuya Kodama, Hiroshi Mo, Duy-Dinh Le

Today's topics: Nonparametric Methods (Parzen Windows, k-Nearest Neighbor Estimation); Clustering Techniques (k-means, Agglomerative Hierarchical Clustering)

Bayesian decision theory. A posteriori probability (posterior): the probability of the state of nature given that the feature value has been observed, e.g., P(ω|x). Likelihood: the likelihood of the state of nature with respect to the feature value, e.g., p(x|ω). Bayes formula: P(ω|x) = p(x|ω)P(ω)/p(x)
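
A quick numeric illustration of the formula, as a Python sketch (all numbers here are invented for the example):

# Bayes' formula with two states of nature w1, w2: P(w|x) = p(x|w)P(w)/p(x)
priors = [0.7, 0.3]          # P(w1), P(w2): assumed prior probabilities
likelihoods = [0.2, 0.6]     # p(x|w1), p(x|w2) evaluated at the observed x
evidence = sum(P * l for P, l in zip(priors, likelihoods))            # p(x)
posteriors = [P * l / evidence for P, l in zip(priors, likelihoods)]
print(posteriors)            # [0.4375, 0.5625] -> decide state w2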

Bayesian decision theory

Bayesian decision theory

Normal distribution

Covariance matrix and its algebraic/geometric interpretation. What is the quadratic form? (Figure: coordinates x_1, x_2, x_3 and rotated axes y_1, y_2, y_3 about the mean μ with rotation angle φ.)

Classification Using PCA. Σ = E{XX^T}; eigenvectors Σu_i = λ_i u_i; U = [u_1, u_2, ..., u_m]; projection Y = U^T X; residual e = X - UY. Detection of faces based on distance from face space; recognition of faces based on distance within face space. (J. M. Rehg, 2002)
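
A minimal numpy sketch of this eigen-decomposition and projection on toy data (not the lecture's code; the data and the subspace dimension m = 5 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))            # 100 samples, 20 dimensions (toy data)
X = X - X.mean(axis=0)                    # center the data

Sigma = X.T @ X / len(X)                  # sample covariance, Sigma ~ E{x x^T}
eigvals, eigvecs = np.linalg.eigh(Sigma)  # Sigma u_i = lambda_i u_i
U = eigvecs[:, ::-1][:, :5]               # U = [u_1 ... u_m], top m eigenvectors

Y = X @ U                                 # projections y = U^T x
E = X - Y @ U.T                           # residuals e = x - U y
dist_from_space = np.linalg.norm(E, axis=1)   # "distance from face space"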

Nonparametric Methods. So far we studied "parametric" methods: probability distribution functions (or equivalently decision boundaries) can be represented by parametric forms. Normal density case: mean and variance (or covariance matrix). PCA case: low-dimensional subspace and its span. These methods assume that the underlying probability distribution of the actual observations is known and takes a parametric form. However, in many cases this assumption is suspect.

Nonparametric Methods. A simple approach is to compose a histogram: knowing the sample data, we can compose a histogram with a certain bin size (division of each axis) and treat the histogram as a probability distribution function.

Nonparametric Methods. The optimal number of bins M (or bin size) is the issue. If the bin width is small (i.e., large M), the estimated density is very spiky (i.e., noisy). If the bin width is large (i.e., small M), the true structure of the density is smoothed out. In practice, we need to find a value of M that compromises between these two issues. Also, how do we extend this to the multidimensional case?
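
A minimal numpy sketch of the histogram estimate and the bin-count trade-off (toy 1-D data; the bin counts M are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500)   # samples from the "unknown" density

def histogram_density(samples, M, lo=-4.0, hi=4.0):
    """Estimate p(x) on [lo, hi] with M equal-width bins: count / (n * width)."""
    counts, edges = np.histogram(samples, bins=M, range=(lo, hi))
    width = edges[1] - edges[0]
    return counts / (len(samples) * width), edges

p_spiky, _ = histogram_density(x, M=200)   # too many bins: noisy estimate
p_smooth, _ = histogram_density(x, M=5)    # too few bins: structure smoothed out
p_ok, _ = histogram_density(x, M=25)       # a compromise in between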

Nonparametric Density Estimation. The probability that a given vector x, drawn from the unknown density p(x), will fall inside some region R in the input space is given by: P = ∫_R p(x') dx'. If we have n data points {x_1, x_2, ..., x_n} drawn independently from p(x), the probability that k of them will fall in R is given by the binomial law: P(k) = (n choose k) P^k (1 - P)^(n-k)

Nonparametric Density Estimation. The expected value of k is: E[k] = nP. The expected fraction of points falling in R is: E[k/n] = P. The variance is given by: Var[k/n] = E[(k/n - P)^2] = P(1 - P)/n

Nonparametric Density Estimation. The distribution is sharply peaked as n → ∞, thus: P ≈ k/n (Approximation 1)

Nonparametric Density Estimation. If we assume that p(x) is continuous and does not vary significantly over the region R, we can approximate P by: P = ∫_R p(x') dx' ≈ p(x)·V (Approximation 2), where V is the volume enclosed by R.

Nonparametric Density Estimation. Combining these two approximations we have: p(x) ≈ (k/n)/V. The above approximation is based on contradictory assumptions: R is relatively large (i.e., it contains many samples so that P(k) is sharply peaked) for Approximation 1, yet R is relatively small so that p(x) is approximately constant inside the integration region for Approximation 2. We need to choose an optimum R in practice...

Nonparametric Density Estimation. Suppose we form regions R_1, R_2, ... containing x: R_1 contains k_1 sample, R_2 contains k_2 samples, etc.; R_n has volume V_n and contains k_n samples. The n-th estimate p_n(x) of p(x) is given by: p_n(x) = (k_n/n)/V_n

Nonparametric Density Estimation. The following conditions must be satisfied in order for p_n(x) to converge to p(x): lim_{n→∞} V_n = 0 (Approximation 2); lim_{n→∞} k_n = ∞ (Approximation 1); lim_{n→∞} k_n/n = 0 (to allow p_n(x) to converge)

Nonparametric Density Estimation. How to choose the optimum values for V_n and k_n in p_n(x) = (k_n/n)/V_n? Two leading approaches: (1) fix the volume V_n and determine k_n from the data (kernel-based density estimation methods), e.g., V_n = 1/√n; (2) fix the value of k_n and determine the corresponding volume V_n from the data (the k-nearest neighbor method), e.g., k_n = √n

Nonparametric Density Estimation

Parzen Windows. Problem: given a vector x, estimate p(x). Assume R_n to be a hypercube with sides of length h_n, centered on the point x, so that V_n = h_n^d in p_n(x) = (k_n/n)/V_n. To find an expression for k_n (i.e., the number of points in the hypercube), let us define a kernel function: φ(u) = 1 if |u_j| ≤ 1/2 for j = 1, ..., d, and 0 otherwise

Parzen Windows. The total number of points x_i falling inside the hypercube centered at x is: k_n = Σ_{i=1}^{n} φ((x - x_i)/h_n). Then the estimate becomes: p_n(x) = (k_n/n)/V_n = (1/n) Σ_{i=1}^{n} (1/V_n) φ((x - x_i)/h_n), where φ((x - x_i)/h_n) equals 1 if x_i falls within the hypercube (the Parzen windows estimate).

Parzen Windows. The density estimate is a superposition of kernel functions centered at the samples x_i: p_n(x) = (1/n) Σ_{i=1}^{n} (1/V_n) φ((x - x_i)/h_n). φ(u) interpolates the density between samples, and each sample x_i contributes to the estimate based on its distance from x.
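
A minimal numpy sketch of this estimate with the hypercube kernel (toy 2-D data; the bandwidth h = 0.5 is an arbitrary choice):

import numpy as np

def phi(u):
    """Hypercube kernel: 1 if |u_j| <= 1/2 for every coordinate j, else 0."""
    return np.all(np.abs(u) <= 0.5, axis=-1).astype(float)

def parzen_estimate(x, samples, h):
    """p_n(x) = (1/n) * sum_i (1/h^d) * phi((x - x_i)/h)."""
    n, d = samples.shape
    V = h ** d
    return phi((x - samples) / h).sum() / (n * V)

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))               # toy samples, standard 2-D normal
print(parzen_estimate(np.zeros(2), data, h=0.5))  # roughly 1/(2*pi) ~ 0.159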

Parzen Windows. The kernel function φ(u) can have a more general form (i.e., not just the hypercube). In order for p_n(x) to be a legitimate estimate, φ must be a valid density itself: φ(u) ≥ 0 and ∫ φ(u) du = 1

Parzen Windows. The parameter h acts as a smoothing parameter that needs to be optimized. When h is too large, the estimated density is over-smoothed (i.e., a superposition of broad kernel functions). When h is too small, the estimate represents the peculiarities of the data rather than the true density (i.e., a superposition of narrow kernel functions).

Parzen Windows. (Figure: the kernel φ(u) for different h values.)

Parzen Windows. Example: p_n(x) estimates assuming 5 samples. (Figure.)

Parzen Windows. Example: both p(x) and φ(u) are Gaussian, with h_n = h_1/√n. (Figure: p_n(x) estimates.)

Parzen Windows. Example: p(x) consists of a uniform and a triangular density and φ(u) is Gaussian, with h_n = h_1/√n. (Figure: p_n(x) estimates.)

k-nearest Neighbor Estimate Fix k ad allow V to vary: Cosider a hypersphere aroud x. Allow the radius of the hypersphere to grow util it cotais k data poits. V is determied by the volume of the hypersphere. p k / ( x) V size depeds o desity

k-nearest Neighbor Estimate The parameter k acts as a smoothig parameter ad eeds to be optimized.

k-nearest Neighbor Estimate Parze widows k -earest-eighbor k k 1

k-nearest Neighbor Estimate Parze widows k -earest-eighbor k k 1

k-nearest Neighbor Classifier Suppose that we have c classes ad that class ω i cotais i poits with 1 + 2 +...+ c = P( / x) i Give a poit x, we fid the k earest eighbors Suppose that k i poits from k belog to class ω i, the: p p ( x / ) P( ) i i p ( x) ki ( x / i) V i

k-nearest Neighbor Classifier

k-nearest Neighbor Classifier The prior probabilities ca be computed as: i P( i ) Usig the Bayes rule, the posterior probabilities ca be computed as follows: where p ( x / ) P( ) k P( i / x) p ( x) k p k ( x) V i i i

k-nearest Neighbor Classifier k-earest-eighbor classificatio rule: Give a data poit x, fid a hypersphere aroud it that cotais k poits ad assig x to the class havig the largest umber of represetatives iside the hypersphere. p( x / i) P( i) ki P( i / x) p( x) k Whe k=1, we get the earest-eighbor rule.

k-nearest Neighbor Classifier

k-nearest Neighbor Classifier The decisio boudary is piece-wise liear. Each lie segmet correspods to the perpedicular bisector of two poits belogig to differet classes.

k-nearest Neighbor Classifier Let P* be the miimum possible error, which is give by the miimum error rate classifier. Let P be the error give by the earest eighbor rule. Give ulimited umber of traiig data, it ca be show that: c P P P (2 P ) 2P c 1 * * * *

k-nearest Neighbor Classifier

k-nearest Neighbor Classifier

Clustering. So far we assumed that the class labels are given for training samples. Sometimes it is very costly to provide class labels. What can we do if we don't know the class labels? Unsupervised methods, or smart preprocessing methods. Clustering discovers distinct subclasses observed in the data distribution.

Clustering

Algorithm: k-means
1. Determine the number of clusters k.
2. (Randomly) guess k cluster center locations.
3. Each data point finds out which center it is closest to.
4. Each center finds the centroid of the points it owns.
5. Terminate if the assignment of the N data points does not change.
6. Otherwise, repeat from step 3 (see the sketch below).
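
A minimal numpy sketch of the loop above (Euclidean distance, random initialization; it assumes no cluster ever goes empty, which is fine for the toy data):

import numpy as np

def kmeans(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]    # step 2
    assign = None
    while True:
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)                     # step 3
        if assign is not None and np.array_equal(new_assign, assign):
            return centers, assign                            # step 5: converged
        assign = new_assign
        centers = np.array([X[assign == j].mean(axis=0)       # step 4
                            for j in range(k)])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(30, 2)) for m in (0.0, 2.0, 4.0)])
centers, assign = kmeans(X, k=3)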

K-means Clustering: Step 1. Algorithm: k-means, distance metric: Euclidean distance. (Figure: data points and centers k_1, k_2, k_3.)

K-means Clustering: Step 2. Algorithm: k-means, distance metric: Euclidean distance. (Figure: data points and centers k_1, k_2, k_3.)

K-means Clustering: Step 3. Algorithm: k-means, distance metric: Euclidean distance. (Figure: data points and centers k_1, k_2, k_3.)

K-means Clustering: Step 4. Algorithm: k-means, distance metric: Euclidean distance. (Figure: data points and centers k_1, k_2, k_3.)

K-means Clustering: Step 5. Algorithm: k-means, distance metric: Euclidean distance. (Figure: data points and centers k_1, k_2, k_3; axes are expression in condition 1 vs. expression in condition 2.)

Hierarchical Clustering Algorithm (Agglomerative Hierarchical Clustering)
1. Initialize c (the desired number of clusters), ĉ = n, and D_i = {x_i} for i = 1, ..., n.
2. ĉ = ĉ - 1.
3. Find the nearest clusters, say, D_i and D_j.
4. Merge D_i and D_j.
5. Repeat from step 2 until ĉ = c.
6. Return the c clusters (see the sketch below).
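
A minimal sketch of the loop above; as the "nearest clusters" criterion for step 3 it uses the minimum inter-cluster distance defined on the following slides, and the toy data are arbitrary:

import numpy as np

def agglomerative(X, c):
    clusters = [[i] for i in range(len(X))]          # step 1: D_i = {x_i}
    while len(clusters) > c:                         # steps 2-5
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])  # step 3: nearest clusters
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)               # step 4: merge D_i and D_j
    return clusters                                  # step 6: return c clusters

X = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0]])
print(agglomerative(X, c=3))                         # -> [[0, 1], [2, 3], [4]]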

Hierarchical Clustering Dendrogram

The Nearest-Neighbor Algorithm. If the minimum distance between elements of two clusters is used, the method is called the nearest-neighbor cluster algorithm. If it is terminated when the distance between the nearest clusters exceeds an arbitrary threshold, it is called the single-linkage algorithm.

The Nearest-Neighbor Algorithm

The Farthest-Neighbor Algorithm. If the maximum distance between elements of two clusters is used, the method is called the farthest-neighbor cluster algorithm. If it is terminated when the distance between the nearest clusters exceeds an arbitrary threshold, it is called the complete-linkage algorithm.
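
In code, the only difference between the two variants is the inter-cluster distance plugged into step 3 of the agglomerative loop above; a sketch (A and B are lists of sample indices into X):

import numpy as np

def d_min(X, A, B):
    """Nearest-neighbor / single-linkage inter-cluster distance."""
    return min(np.linalg.norm(X[i] - X[j]) for i in A for j in B)

def d_max(X, A, B):
    """Farthest-neighbor / complete-linkage inter-cluster distance."""
    return max(np.linalg.norm(X[i] - X[j]) for i in A for j in B)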

The Nearest-Neighbor Algorithm