Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means

Neil Alldrin
Department of Computer Science
University of California, San Diego
La Jolla, CA 92037
nalldrin@cs.ucsd.edu

Andrew Smith
Department of Computer Science
University of California, San Diego
La Jolla, CA 92037
atsmith@cs.ucsd.edu

Doug Turnbull
Department of Computer Science
University of California, San Diego
La Jolla, CA 92037
turnbul@cs.ucsd.edu

Abstract

This paper compares methods of training radial basis function (RBF) networks. We found that RBF networks initialized by supervised clustering perform better than networks initialized by unsupervised clustering and improved with gradient descent. Gradient descent did not significantly improve the networks initialized by supervised clustering.

1 Introduction

Radial basis function (RBF) networks are commonly used for pattern classification. In this paper, we explore three techniques for training RBF networks. First, we present the structure of standard RBF networks and describe three techniques for learning their parameters. We then evaluate these training techniques on a set of human facial images.

2 RBF Networks

Radial basis function networks typically have two distinct layers, as shown in figure 1. The bottom layer consists of a set of basis functions, each of which is a function of the distance between an input pattern and a prototype pattern. A standard choice of basis function is the Gaussian:

\phi_j(x) = \exp\left( -\frac{\|x - \mu_j\|^2}{2\sigma_j^2} \right)   (1)
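As a concrete illustration of such a network, here is a minimal Python/NumPy sketch of the Gaussian basis layer together with the least-squares fit for the top-layer weights described later in this section. This is our own sketch, not the authors' MATLAB code; the function names, array shapes, and bias handling are assumptions:

```python
import numpy as np

def rbf_forward(X, centers, sigmas):
    """Evaluate spherical Gaussian basis functions for each input.

    X: (n_samples, n_dims); centers: (n_basis, n_dims); sigmas: (n_basis,).
    Returns the design matrix Phi of shape (n_samples, n_basis + 1),
    with a constant column appended for the bias term.
    """
    # Squared distance between every input and every prototype.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))
    # Bias column of ones.
    return np.hstack([phi, np.ones((X.shape[0], 1))])

def solve_output_weights(Phi, T):
    """Least-squares weights W minimizing ||Phi W - T||^2, via the pseudo-inverse."""
    return np.linalg.pinv(Phi) @ T
```

Class scores for new inputs are then `rbf_forward(X, centers, sigmas) @ W`, with the predicted class taken as the argmax over the outputs.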

[Figure 1 appears here: inputs x_1, ..., x_d feed basis functions phi_1, ..., phi_m and a bias unit, which in turn feed outputs y_1, ..., y_c.]

Figure 1: A generic RBF network. (Adapted from [2].)

The top layer is a simple linear discriminant that outputs a weighted sum of the basis functions. The equation for a single output is:

y_k(x) = \sum_{j=1}^{m} w_{kj} \phi_j(x) + w_{k0}   (2)

For our networks, we set the weights of the top layer by minimizing the least-squares error. Rewriting the formula for the outputs in matrix notation, equation 2 becomes Y = \Phi W. The minimum of the sum-of-squares error E = \|\Phi W - T\|^2 is W = \Phi^\dagger T, where \Phi^\dagger = (\Phi^T \Phi)^{-1} \Phi^T is the pseudo-inverse of \Phi.

2.1 Initializing Parameters for Radial Basis Functions

Spherical Gaussian basis functions, as in equation 1, each have two parameters, \mu_j and \sigma_j, for the j-th basis function. The three techniques we explore for finding these parameters are maximum likelihood (ML), K-means, and gradient descent.

Maximum Likelihood: Let C_j = \{x_1, ..., x_{N_j}\} be the set of input patterns in class j, and let the number of basis functions equal the number of classes, assuming one basis function per class. Using maximum likelihood sets the parameters of basis function j to:

\mu_j = \frac{1}{N_j} \sum_{x \in C_j} x   (3)

\sigma_j = \sqrt{ \frac{1}{N_j} \sum_{x \in C_j} \|x - \mu_j\|^2 }   (4)

K-means: K-means is an unsupervised clustering algorithm that finds a set of k centers that iteratively
minimizes the mean squared distance from each data point to its closest center [4]. Each center represents a cluster, or subset of nearby data points. The average value of all the points in a cluster is used to find a mean \mu_j for the basis function corresponding to that cluster; \sigma_j is set to the standard deviation of the points in the cluster, as in equation 4. Using K-means to find basis functions is less restrictive than using ML because the number of basis functions is not dependent on the number of classes.

2.2 Improving parameters using Gradient Descent

A common technique for reducing the error of machine learning models is gradient descent. The basic idea of gradient descent is to reduce the error by iteratively adjusting the parameters of the learning model. The gradient of the error function points in the direction of maximum increase, so the error of the learning model can be reduced by moving the parameters in the opposite direction of the gradient. Typically this is accomplished through the iterative process of repeatedly calculating the gradient for a particular input, or set of inputs, and adjusting the parameters by a small amount.

The parameters of our model are the weights of the linear discriminant, w, and the means and standard deviations of the radial basis functions, \mu and \sigma. Since the optimal set of weights for a given set of radial basis functions can be calculated in one step, there is no need to use an iterative process to update the weights. However, no such closed form exists for the optimal set of Gaussians, so we must resort to iterative processes. Our training procedure uses stochastic gradient descent, which trains the RBF network for a number of epochs. Each epoch trains the RBF network on every data point in our training set, one at a time, in random order.

Our error function is the common sum-of-squares error, E = \frac{1}{2} \sum_k (y_k - t_k)^2. The derivative of this error with respect to the standard deviation of basis function j, \sigma_j, is
\frac{\partial E}{\partial \sigma_j} = \delta_j \, \frac{\|x - \mu_j\|^2}{\sigma_j^3}   (5)

The derivative of the error with respect to the i-th component of the j-th mean, \mu_{ji}, is

\frac{\partial E}{\partial \mu_{ji}} = \delta_j \, \frac{x_i - \mu_{ji}}{\sigma_j^2}   (6)

where, for each basis function, the term \delta_j = \sum_k (y_k - t_k) \, w_{kj} \, \phi_j(x) can be computed with the backpropagation algorithm.

Once we have calculated the gradient of the error, we update the means and standard deviations by moving them a small amount against the gradient: \mu \leftarrow \mu - \eta \, \partial E / \partial \mu and \sigma \leftarrow \sigma - \eta \, \partial E / \partial \sigma, where \eta is a small, decreasing value called the learning rate. After the basis functions are updated, a new set of optimal weights is calculated.

3 Human Expression Classification

3.1 Experimental Methodology

The goal of our experiments is to train RBF networks that can classify images of human faces as one of the following six expressions: happy, sad, surprised, afraid, disgusted, angry. We use three methods for learning the parameters of the network. The first method uses supervised learning (ML) of the Gaussian parameters, \mu and \sigma, combined with the least-squares solution for the top-layer weights. The second method uses unsupervised
clustering (K-means) to learn \mu and \sigma, again using least squares to learn the weights. The third method uses gradient descent on \mu and \sigma, updating the weights with least squares after every training example. For the techniques that use K-means, the number of Gaussians is varied from 1 to 53. For the supervised learning techniques, six Gaussians are used, one for each expression.

For all experiments, we use difference images instead of the original images of facial expressions. For each image of a face, we define the corresponding difference image as the difference between that image and the average of all images of that individual. Also, we use principal component analysis to reduce the dimensionality of the data to 20. The original images are downsampled to fit our computational resources. To evaluate the success of our networks, we use cross-validation, which reserves a small amount of the training set to test the network's ability to generalize to novel data.

3.2 Results

Supervised and Unsupervised Clustering:

Figure 2: Performance of supervised (ML) and unsupervised (K-means) clustering on the training data (top) and novel data (bottom). The circle represents the average supervised result and the squares represent the average unsupervised results with varying numbers of clusters.

Figure 2 shows that supervised clustering generalizes better than unsupervised clustering with any number of clusters. Unsupervised clustering did best in the range of 20 to 30 clusters, achieving roughly 50% correct classification, but with a high standard deviation. The
sharp falloff in generalization after 45 clusters, combined with the near-perfect classification of the training data, indicates memorization.

Gradient Descent:

Figure 3: Gradient descent initialized with K-means. In order from top to bottom are the training, hold-out, and test sets.

Figure 3 shows that a network with RBFs initialized with K-means and trained with gradient descent generalizes best with around six clusters, as opposed to 25 clusters without gradient descent. Using six radial basis functions instead of twenty-five is intuitively better because a learning model with fewer parameters implies generalization rather than memorization. The fact that we are classifying six facial expressions also seems to indicate good generalization. Using gradient descent to improve a network initialized with supervised clustering had little effect.

Figure 4a illustrates the use of a hold-out set to reduce memorization. The network finds an optimal set of weights for the hold-out set quickly, but continues to memorize the training set. Figure 4b shows that gradient descent significantly moves the centers of RBFs initialized with K-means.

Comparison: Figure 5 shows the success rates of K-means-initialized RBF networks and of K-means-initialized, gradient-descent-improved RBF networks, printed side by side for comparison. Clearly, gradient descent improves the RBFs placed with K-means.
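The supervised (ML) and unsupervised (K-means) initializations compared in these experiments can be made concrete with a short sketch. This is our own Python/NumPy illustration, not the authors' MATLAB code; the function names, the plain Lloyd's-style K-means loop, and its defaults are assumptions:

```python
import numpy as np

def ml_init(X, labels):
    """Supervised (ML) initialization: one Gaussian per class, with the class
    mean as the center and a spherical standard deviation (eqs. 3-4)."""
    classes = np.unique(labels)
    centers = np.stack([X[labels == c].mean(axis=0) for c in classes])
    sigmas = np.array([
        np.sqrt(((X[labels == c] - centers[i]) ** 2).sum(axis=1).mean())
        for i, c in enumerate(classes)
    ])
    return centers, sigmas

def kmeans_init(X, k, iters=50, seed=0):
    """Unsupervised initialization: Lloyd's K-means, then one Gaussian per
    cluster with sigma set to the cluster's standard deviation (eq. 4)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest center, then move each center
        # to the mean of its assigned points.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    sigmas = np.array([
        np.sqrt(((X[assign == j] - centers[j]) ** 2).sum(axis=1).mean())
        if np.any(assign == j) else 1.0
        for j in range(k)
    ])
    return centers, sigmas
```

Unlike `ml_init`, `kmeans_init` never looks at the labels, which is what allows the number of basis functions to differ from the number of classes.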

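The gradient-descent refinement whose effect figure 5 shows reduces to a single stochastic update per training example, following equations 5 and 6. Again, this is our own Python/NumPy sketch; the names, shapes, and fixed learning rate `eta` are assumptions (the paper uses a decreasing rate):

```python
import numpy as np

def gd_step(x, t, centers, sigmas, W, eta=0.01):
    """One stochastic gradient-descent step on the Gaussian means and widths.

    x: (n_dims,) input; t: (n_out,) target; W: (n_basis + 1, n_out) top-layer
    weights (last row is the bias). The weights are held fixed here; the paper
    re-solves them by least squares after the basis functions move.
    """
    diff = x[None, :] - centers                      # x - mu_j, shape (n_basis, n_dims)
    d2 = (diff ** 2).sum(axis=1)                     # ||x - mu_j||^2
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))          # basis activations
    y = np.append(phi, 1.0) @ W                      # network outputs
    err = y - t                                      # y_k - t_k
    # Backpropagated error per basis function: sum_k (y_k - t_k) w_kj phi_j
    delta = (W[:-1] @ err) * phi
    grad_sigma = delta * d2 / sigmas ** 3            # eq. 5
    grad_mu = (delta / sigmas ** 2)[:, None] * diff  # eq. 6
    # Move a small amount against the gradient.
    return centers - eta * grad_mu, sigmas - eta * grad_sigma
```

An epoch would call `gd_step` once per training point in random order, then recompute the optimal top-layer weights by least squares.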
Figure 4: (a) Mean squared error of the training (solid) and hold-out (dotted) sets during gradient descent training. (b) The centers of six RBFs being moved by gradient descent, projected onto three random dimensions.

Figure 5: K-means-initialized RBFs versus K-means-initialized and gradient-descent-improved RBFs.

4 Conclusion

The network that generalized best was found using supervised clustering with all principal components, as summarized in table 1. For all of our tests in the results section we used the top 20 principal components, representing 80 percent of the total variance in the data set. Using all components increased the success of RBF networks using supervised clustering,

but did not greatly improve the success of RBF networks initialized with K-means.

Table 1: Mean classification accuracy and standard deviation.

                 20 Principal Components     All (52) Principal Components
                 mean      std               mean      std
training data    .926      .012              1.00      .000
novel data       .659      .067              .742      .042

Gradient descent greatly improves the generalization of RBF networks initialized with unsupervised clustering. However, as with all iterative error-reduction techniques, it is more computationally intensive. Using gradient descent to improve the RBFs found with supervised clustering did not significantly reduce error. However, it is not always possible to use supervised clustering, because it might be the case that only a small portion of the entire data set has known classes. In this case, one would use unsupervised clustering on the entire data set to set the parameters of the radial basis functions, and use least squares to find the weights that minimize the error on the classified data.

Individual Contributions

Neil Alldrin: I helped write the ML classifier, helped a little with the back-prop code, helped with testing, helped write the writeup, and stuff. I had fun and stuff.

Andrew Smith: I wrote the gradient descent procedure. I wrote one version of a finite-difference error gradient approximator, used for code-checking purposes. I ran the tests to generate the figures in this paper. I wrote k-means. I wrote the least-squares solver (it was easy in MATLAB). This project was a bit too rushed for me to have more than a marginal amount of fun.

Doug Turnbull: I wrote much of the file loading, preprocessing, and testing code in addition to implementing the maximum likelihood module. I was also involved in running tests and writing this report. I transcended fun.

References

[1] Alldrin, N., Smith, A., Turnbull, D. (2003) "A Three-Unit Network is All You Need to Discover Females," CSE 253, UCSD.
[2] Bishop, C.M. (1995) Neural Networks for Pattern Recognition, Oxford University Press, New York.
[3] Hastie, T., Tibshirani, R., Friedman, J. (2001) The Elements of Statistical Learning, Springer-Verlag, New York.
[4] Kanungo, T., et al. (2000) "The Analysis of a Simple k-means Clustering Algorithm," Symposium on Computational Geometry 2000, Hong Kong.