Parallel Architecture & Programing Models for Face Recognition

Similar documents
Parallel Architecture for Face Recognition using MPI

Dimension Reduction CS534

Linear Discriminant Analysis for 3D Face Recognition System

Image-Based Face Recognition using Global Features

Facial Expression Detection Using Implemented (PCA) Algorithm

Multidirectional 2DPCA Based Face Recognition System

An Integrated Face Recognition Algorithm Based on Wavelet Subspace

Recognition, SVD, and PCA

( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components

GPU Based Face Recognition System for Authentication

Face Recognition for Mobile Devices

Face Recognition for Different Facial Expressions Using Principal Component analysis

Recognition: Face Recognition. Linda Shapiro EE/CSE 576

CIE L*a*b* color model

FACE RECOGNITION USING SUPPORT VECTOR MACHINES

Applications Video Surveillance (On-line or off-line)

PCA and KPCA algorithms for Face Recognition A Survey

Facial Expression Recognition using Principal Component Analysis with Singular Value Decomposition

International Journal of Digital Application & Contemporary research Website: (Volume 1, Issue 8, March 2013)

Color Space Projection, Feature Fusion and Concurrent Neural Modules for Biometric Image Recognition

CHAPTER 3 PRINCIPAL COMPONENT ANALYSIS AND FISHER LINEAR DISCRIMINANT ANALYSIS

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP( 1

Face Recognition using Eigenfaces SMAI Course Project

MATRIX BASED SEQUENTIAL INDEXING TECHNIQUE FOR VIDEO DATA MINING

Network Traffic Measurements and Analysis

Facial Expression Recognition Using Non-negative Matrix Factorization

Large-Scale Face Manifold Learning

A Multi-Tiered Optimization Framework for Heterogeneous Computing

Benchmark runs of pcmalib on Nehalem and Shanghai nodes

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition

Unsupervised learning in Vision

Recognition of Non-symmetric Faces Using Principal Component Analysis

Discriminate Analysis

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters

Spectral Classification

Principal Component Analysis (PCA) is a most practicable. statistical technique. Its application plays a major role in many

Haresh D. Chande #, Zankhana H. Shah *

Machine Learning 13. week

Image Processing. Image Features

Multi-Pose Face Recognition And Tracking System

NOWADAYS, there are many human jobs that can. Face Recognition Performance in Facing Pose Variation

Face Detection and Recognition in an Image Sequence using Eigenedginess

FACE RECOGNITION BASED ON GENDER USING A MODIFIED METHOD OF 2D-LINEAR DISCRIMINANT ANALYSIS

Performance Evaluation of BLAS on a Cluster of Multi-Core Intel Processors

Final Project Face Detection and Recognition

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

De-identifying Facial Images using k-anonymity

CS4670: Computer Vision

Singular Value Decomposition, and Application to Recommender Systems

Eigenfaces versus Eigeneyes: First Steps Toward Performance Assessment of Representations for Face Recognition

The Novel Approach for 3D Face Recognition Using Simple Preprocessing Method

Face recognition using Singular Value Decomposition and Hidden Markov Models

ECG782: Multidimensional Digital Signal Processing

Automatic Attendance System Based On Face Recognition

Fast PCA Computation in a DBMS with Aggregate UDFs and LAPACK

Invisible Watermarking Using Eludician Distance and DWT Technique

Performance analysis of robust road sign identification

Applied Neuroscience. Columbia Science Honors Program Fall Machine Learning and Neural Networks

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe

Clustering and Visualisation of Data

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong)

LECTURE ATTENDANCE SYSTEM WITH FACE RECOGNITION AND IMAGE PROCESSING

Classification of Hyperspectral Breast Images for Cancer Detection. Sander Parawira December 4, 2009

Facial Expression Classification with Random Filters Feature Extraction

[2008] IEEE. Reprinted, with permission, from [Yan Chen, Qiang Wu, Xiangjian He, Wenjing Jia,Tom Hintz, A Modified Mahalanobis Distance for Human

Facial expression recognition using shape and texture information

FACE RECOGNITION USING INDEPENDENT COMPONENT

A New Multi Fractal Dimension Method for Face Recognition with Fewer Features under Expression Variations

Adaptive Video Compression using PCA Method

Mobile Face Recognization

SINGLE-SAMPLE-PER-PERSON-BASED FACE RECOGNITION USING FAST DISCRIMINATIVE MULTI-MANIFOLD ANALYSIS

Video Image Based Multimodal Face Recognition System

Human Face Recognition Using Image Processing PCA and Neural Network

Face detection and recognition. Detection Recognition Sally

Stereo and Epipolar geometry

Using Intel Streaming SIMD Extensions for 3D Geometry Processing

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Accelerating Implicit LS-DYNA with GPU

Using Parallel Computing Toolbox to accelerate the Video and Image Processing Speed. Develop parallel code interactively

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction

AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA K-MEANS CLUSTERING ON HADOOP SYSTEM. Mengzhao Yang, Haibin Mei and Dongmei Huang

Clustering and Dimensionality Reduction

Learning to Recognize Faces in Realistic Conditions

ECE519 Advanced Operating Systems

Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD

Image Variant Face Recognition System

Face Recognition using Tensor Analysis. Prahlad R. Enuganti

Face Recognition with Local Binary Patterns

Content-based Image and Video Retrieval. Image Segmentation

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

Developing a Data Driven System for Computational Neuroscience

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma

International Journal of Advancements in Research & Technology, Volume 2, Issue 8, August ISSN

Lecture 4 Face Detection and Classification. Lin ZHANG, PhD School of Software Engineering Tongji University Spring 2018

Age Invariant Face Recognition Aman Jain & Nikhil Rasiwasia Under the Guidance of Prof R M K Sinha EE372 - Computer Vision and Document Processing

Linear Algebra libraries in Debian. DebConf 10 New York 05/08/2010 Sylvestre

Semi-Supervised PCA-based Face Recognition Using Self-Training

Face Recognition Based on LDA and Improved Pairwise-Constrained Multiple Metric Learning Method

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

Generic Face Alignment Using an Improved Active Shape Model

Transcription:

Parallel Architecture & Programing Models for Face Recognition Submitted by Sagar Kukreja Computer Engineering Department Rochester Institute of Technology

Agenda Introduction to face recognition Feature Extraction Principle Component Analysis Distributed Parallel Programing Model for face recognition using MPI Improving recognition on huge databases MMX based Recognition Algorithm Distributed Parallel System Architecture for face recognition Performance Results

Introduction A face recognition system is a computer application capable of identifying or verifying a person from a digital image or a video frame from a video source. One of the ways to do this is by comparing selected facial features from the image and a face database.

Face Detection The face recognition applications are widely used in different fields like security and computer vision. The recognition process should be done in real time to take fast decisions. Basic idea: slide a window across image and evaluate a face model at every location

Related Work in Feature Extraction Appearance based methods in which features are mainly from pixel intensity values Linear Analysis Methods like Principle Component Analysis, Independent Component Analysis. Non-linear Analysis Methods Geometrical Feature Matching - The geometrical feature approach construct feature vectors based on the geometrical representation of the face. The feature vector contains the face outline, eyes, mouth and nose positions.

Principle Component Analysis PCA is considered as a feature extraction technique and is widely used in facial recognition applications by projecting images in new face space. PCA can reduce the dimensionality of the image. However, PCA consumes a lot of processing time due to its high intensive computation nature.

PCA Recognition System

Problem With PCA The main problem of PCA is consuming huge amount of time because of its computationally intensive nature. A major step of the algorithm is calculating the covariance matrix of the dataset to be able to get the eigenvectors and eigenvalues. This step intensively increases the execution time of the algorithm when the training data-base contains a lot of pictures

Various approaches to solve the problem One approach is calculating the eigenvectors and eigenvalues without calculating the covariance matrix. Methods like the expectation maximization algorithm (EM) to reduce the determinant matrix manipulation. Another approach is use of distributed computing to improve recognition on huge databases, specifically, TH-FACE. The proposed algorithm uses special distributed parallel architecture with mmx technology to speed-up the matching process in PCA algorithm.

Various approaches to solve the problem Compute PCA in one pass on a large data set based on summarization matrices. Furthermore, that algorithm is applied on database management systems(dbms). They use parallel data set summarization via user-defined aggregations and solve Singular Value Decomposition (SVD) by using Math Kernel Library (MKL) parallel variant of the Linear Algebra PACKage (LAPACK) library. In another approach, a distributed parallel system consisting of one host, four slaves and some clients is used. The Parallel Virtual Machine (PVM) is established by the host. In addition, the communication is done over TCP socket and the whole system is communicated and linked over 100M network switch, and achieves an acceleration ratio of 4.133 if the whole system works together.

Steps to compute PCA For the purpose of computing the mean, convert the image to be a column in a 2D matrix called A. This matrix represents all images. Calculating the average of all pixels in all images (each pixel with its corresponding pixels) and subtracting the average from X to remove any lighting. consequently, all pixels are returned back to the origin of the system.

Steps to compute PCA Get the eigenvalues and eigenvectors by calculating the covariance matrix. The number of eigenvalues and eigenvector is equal to P-1. Sort the eignvectors based on their eigenvalues. The longest vectors which have the largest eigenvalues can split the points in the proper way. Project the training images to this new space by applying dot product between A and V matrices, Store the resulting face space.

Steps to Compute PCA(In recognition Phase) Convert image into a ((MN)1) vector. Subtract the mean calculated in the training phase from this image vector. Project the testing image in the face space. Calculate the Euclidean distance between the projected test image and all training images. Images having the distance less than or equal to a predetermined threshold is matched to, the testing image belong to that image.

Proposed Approach We exploit distributed parallel programming models to improve the execution of PCA for face recognition. This enables the distribution of either data or tasks over a network of connected computers. The Message Passing Interface (MPI) enables us to run different MPI tasks concurrently on different machines. In addition, MPI handles the communication by sending message between nodes. MPICH2 implementation is a high-performance and portable implementation of the MPI standard.

Proposed Approach We use MPICH2 with one master and four slaves nodes to implement a distributed database environment to handle two scenarios. Having one test image and a very large database, searching for a matching face could take too long if done on a single processing node. When the training database is somehow fixed, or rarely updated and the training steps are done once or infrequently Every processing node has a complete copy of the database. This way, each node performs the training phase once and stores the results in its local memory. When the master receives a stream of testing images, they are distributed to the slaves.

Experimental Results Two proposed approaches are evaluated by applying them to different database sizes and measuring the speed-up. Main metric for evaluation is the execution time The proposed architectures on a cluster hosted on the Faculty of Computer and Information Sciences, Ain Shams University, Egypt. Cluster had 2 blade chassis; each having six blade server. Each blade server has two Quad-Core IntelRXeonRCPU E5520 @ 2.27GHz with 24GB RAM. All twelve servers are connected together on an infinite band network with VMware ESXi installed directly on each server. Vsphere client is used to post jobs on these servers. Five servers are used in the presented experiments and the Facial Recognition Technology (FERET) database is used in training and testing

Experimental Results Distribute Training Database and Duplicate Test Image The experiments are carried out using one master node and four slaves with five different database sizes. In the training phase, the master divides the training database equally over the slaves and itself. The maximum speed-up scored is 25X when the training database is distributed on five servers, achieving superlinear speed-up. In the recognition phase, the master node sends copies of the testing image to the four slaves. Each slave tries to recognize the testing image against its local training set and sends the result back to the master. After distributing the databases on different numbers of servers, the recognition time decrease and the speed-up increases linearly to 5X when using five servers.

Experimental Results Sequence diagram to recognize 1 test image vs 1000 images in training DB

Experimental Results Duplicate Training Database and Distributed test Image. The training database is fixed and the training phase is done once at the master node. In the recognition phase, a number of test images are captured from video camera attached to the master node which distributes them to the slaves in a manner similar to the one used in the previous. In case of 500 test images when searching in training database have 500 images, the sequentially required 757.172 seconds and recognition time dropped to 151.313 seconds on five machines.

Experimental Results Sequence diagram to recognize stream of testing images in training DB

Conclusion Distributing the training set and duplicating the test image is most likely the best solution when the stored training images are updated regularly and there is only input test at a time. Having a large number of test faces or processing a video stream for recognition, is best handled by centralizing the training set and distributing the test images The proposed systems improve execution time up to 25X in training and 5X in recognition phase, reaching super-linear and linear speedup

Alternative Approach One approach suggest a use of distributed computing to improve recognition on huge databases, specifically, TH-FACE. Their proposed algorithm uses special distributed parallel architecture with mmx technology to speed-up the matching process in PCA algorithm. They use multimodal face recognition method (MMP-PCA). A multimodal part face recognition method based on principal component analysis (MMP-PCA) is adopted to perform the recognition task, and the MMX technology is introduced to accelerate the matching procedure.

MMX Technology As an extension to the basic Intel Architecture (IA), Intel s MMX technology is designed to improve the performance of multimedia and communication algorithms. The technology includes new instructions and data types, which achieve new levels of performance for these algorithms on host processors. The usage of MMX technology speeds up the matching by 14 times.

System Framework

Client Structure

Distributed parallel system architecture Parallel Virtual Machine (PVM) The PC cluster used in this face recognition system forms a parallel virtual machine, which consists of a single host and five slaves. The PVM transparently handles all message routing, data conversion, task scheduling across the network. The PVM environment is set up by the host, and then a PVM table is created and maintained by host. The PVM table is a linked table. The host node holds the face feature database and document information database, while the slave nodes hold their IP address and the name of sub-database [2]. The linked table structure permits to add slaves infinitely and also delete a slave very conveniently.

Distributed parallel system architecture Communication Process

Distributed parallel system architecture Distributed database The PVM-based distributed system has to transfer messages between each node of the system. To decrease the spending of communication, the main face database (MFDB) is split into five sub-databases (SFDB). So only the face feature data to be queried should be transferred between the host and slaves, also between the clients and the host. Multithreading in flow control In order to respond to query requests from multi- clients, multithreading technique is needed in flow control. Multithreading means that the concurrent operation of more than one path can be executed within one course. All the threads shared the resource and state of the course

The MMX-based recognition algorithm MMP-PCA face recognition method The first step is to segment the face parts. According to the structure of human faces, we divide a human face into five parts: bare face, eyebrow, eye, nose and mouth. Then principal component analysis (PCA) is performed on the facial parts. We calculate the eigenvalues of each facial part, choose d largest eigenvalues (we use d=60) and preserve corresponding eigenvectors. In this way, the eigenface, eigeneyebrow, eigeneye, eigennose and eigenmouth can be obtained, respectively. The known human faces eigenvalues are stored in the database.

The MMX-based recognition algorithm MMP-PCA face recognition method In the face recognition procedure, we first calculate the projection eigenvalue of the human face, and then calculate its similarity score with the projection eigenvalues stored in the database. After that, we sort the faces in the database according to the similarity scores from large to small. Let X represent the projection eigenvalue of the human face to be recognized,represents the projection eigenvalue of a known human face in database, then the similarity score is computed as follow:

MMX-based matching procedure The MMX technology exploits the parallelism inherent in many multimedia and communication algorithms. Many of these algorithms exhibit the property of fixed computation on a large data set, Small, native data types (for example, 8-bit pixels,16-bit audio samples) Regular and recurring memory access patterns; Localized, recurring operations performed on the data; Intensive computation. The matching procedure is carried out by comparing the projection eigenvalue according to (2).The value is in the range of 0 to 1: the larger the value is, the more similar the two eigenvalues are.

Data Structure of Eigen Value and MMX Implementation each eigenvalue data is presented by 16-bit, and the number of data in eigenface, eigeneyebrow, eigeneye, eigennose and eigenmouth is 100, 60, 80, 50 and 40, respectively. Because the MMX Registers are all 64-bit, each read-in operation can get 64-bit /16-bit = 4 data. In order to achieve the principle of data matching, the length of each eigenvalue should be divided exactly by 4. Only the length of eigennose can not satisfy this requirement, so we add zero in the eigennose, which make the length of eigennose 52.

System Performance MMX acceleration test In this, we use three different size databases. On each database, 10 different face images are recognized, and the matching time is recorded, then we assume the mean value of the 10 recorded times as the matching time. As shown in table, with the usage of MMX technology, the matching speed is increased by about 14 times.

System Performance Furthermore, as the database size increases, the actual improvement of MMX acceleration is even larger.

System Performance Distributed parallel architecture acceleration test In the test for the distributed parallel architecture,we record the mean value of 10 matching time. One case is searching on one PC and other cases are searching on the distributed parallel architecture with k PCs linked by 1000M network switch. In the first case, the PC in experiment holds a database of 400,000, and in the other cases, each PC holds a database of 400,000/ k. Both cases use the MMP-PCA algorithm without MMX acceleration. As shown, with the adoption of the distributed parallel architecture, the acceleration ratio can approach the number of PCs.

Conclusion The system presented above displays excellent performance on large, or even huge databases: querying one face image in 2,560,000 faces costs only 1.094s and the identification rate is above 85% in most cases. Such great performance is achieved by the particular architecture and ingenious algorithms of this system. Hence, from the two approaches presented here, we can conclude that parallel implementation and parallel architecture are useful in reducing computation time significantly in face recognition.

References K. Meng, G.-D. Su, C.-C. Li, B. Fu, and J. Zhou, A high performance face recognition system based on a huge face database, International Conference on Machine Learning and Cybernetics, vol. 8, pp. 5159 5164, 2005. Parallel Architecture for Face Recognition using MPI, Dalia Shouman Ibrahim, Salma Hamdy,Computer Science Department, Computer and Information Sciences, Ain shams University, Egypt

Questions??