Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification

Gazi N. Ali, Pei-Ju Chiang, Aravind K. Mikkilineni, George T. Chiu, Edward J. Delp, and Jan P. Allebach
School of Electrical and Computer Engineering
School of Mechanical Engineering
Purdue University, West Lafayette, Indiana

Abstract

Printer identification based on a printed document has many desirable forensic applications. In the electrophotographic (EP) printing process, quasiperiodic banding artifacts can be used as an effective intrinsic signature. However, in the analysis of text-only documents, the absence of large midtone areas makes it difficult to capture suitable signals for banding detection. Frequency domain analysis based on the projection signals of individual characters does not provide enough resolution for proper printer identification. Advanced pattern recognition techniques and knowledge about the print mechanism can help us to devise an appropriate method to detect these signatures. We can obtain reliable intrinsic signatures from multiple projections to build a classifier that identifies the printer. Projections from individual characters can be viewed as a high dimensional data set. In order to create a highly effective pattern recognition tool, this high dimensional projection data has to be represented in a lower dimensional space. The dimension reduction can be performed by well known pattern recognition techniques, and a classifier can then be built on the reduced dimension data set. A popular choice is the Gaussian mixture model, in which each printer is represented by a Gaussian distribution. The distributions of all the printers help us to determine the mixing coefficients for the projections from an unknown printer. Finally, the decision making algorithm votes for the correct printer. In this paper we describe different classification algorithms to identify an unknown printer. We present experiments based on several different EP printers in our printer bank, and we compare the classification results of the different classifiers.

This research is supported by a grant from the National Science Foundation, under award number 019893.

Introduction

In our previous work, we described the intrinsic and extrinsic features that can be used for printer identification [1]. Our intrinsic feature extraction method is based on frequency domain analysis of the one dimensional projected signal. If there are a sufficient number of samples in the projected signal, the Fourier transform gives us the correct banding frequency. When we work with a text-only document, our objective is to obtain the banding frequency from the projected signals of individual letters. In this situation, there are not enough samples per projection to give high frequency domain resolution, and significant overlap between spectra from different printers makes frequency analysis ineffective as a classification method.

The printer identification or classification task is closely related to various pattern identification and pattern recognition techniques. The intrinsic features are the patterns that are used to recognize an unknown printer. The basic idea is to create a classifier that can utilize the intrinsic signatures from a document to make a proper identification. A Gaussian mixture model (GMM) or a tree-based classifier is suitable for the classification step, while the initial dimension reduction is performed by principal component analysis.
Principal component analysis (PCA) is often used as a dimension-reducing technique within some other type of analysis. Classical PCA is a linear transform that maps the data into a lower dimensional space while preserving as much of the data variance as possible. In the case of intrinsic feature extraction, PCA can be used to reduce the dimension of the projected signal. A suitable number of components can be chosen to discriminate between the different printers, and these components are the features used by the classifier (GMM or tree classifier).
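To make the dimension reduction concrete, here is a minimal NumPy sketch of classical PCA applied to a matrix of character projections (one projection per row). It illustrates the transform described above but is not the authors' implementation; the function name, the random example data, and the choice of two components are assumptions made for the example.

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project each row of X (one character projection per row) onto the
    top principal components of the sample covariance matrix."""
    X_centered = X - X.mean(axis=0)            # remove the mean projection
    cov = np.cov(X_centered, rowvar=False)     # covariance of the projection samples
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # sort eigenvectors by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components             # scores in the reduced feature space

# Hypothetical example: 200 character projections, 168 samples each
projections = np.random.randn(200, 168)
features = pca_reduce(projections, n_components=2)
print(features.shape)                          # (200, 2)
```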

The Gaussian mixture model defines the overall data set as a combination of several different Gaussian distributions. The parameters of the model are determined by the training data. Once the parameters are selected, the model is used to predict the printer, based on projections from the unknown printer. Binary tree structured classifiers are formed by repeated splits of the original data set. Tree classifiers are also suitable for the printer identification problem; in a properly grown and pruned tree, the leaves should represent the different printers.

In Fig. 1, the printer identification process is described briefly with the help of a flowchart. The printed document is scanned, and the scanned image is used to obtain the one dimensional projected signal from individual characters [1]. PCA provides the lower dimensional feature space, and the GMM or tree classifier operates on this feature space to identify the unknown printer. In the following sections of the paper, dimension reduction by PCA, classification by GMM, and the tree growing and pruning algorithm of binary tree classifiers are explained. Experimental results related to these algorithms are also provided.

Figure 1: Steps followed in printer characterization. (The printed document from printer X is scanned; the digitized document is deskewed, segmented, and recognized to give segmented characters; 1-D projections are extracted; PCA yields the reduced dimension feature space; a GMM or a tree classifier, with tree growing and pruning, models the features; a majority vote identifies printer X.)

Principal Component Analysis (PCA)

The theory behind principal component analysis is described in detail by Jolliffe [2], Fukunaga [3], and Webb [4]. In this section the fundamentals of PCA are described. An n-dimensional vector X can be represented by the summation of n linearly independent vectors,

X = \sum_{i=1}^{n} y_i \phi_i = \Phi Y,   (1)

where y_i is the i-th principal component, the \phi_i are the basis vectors obtained from the eigenvectors of the covariance matrix of X, \Phi = [\phi_1, ..., \phi_n], and Y = [y_1, ..., y_n]^T. Using only m < n of the \phi_i, the vector X can be approximated as

\hat{X}(m) = \sum_{i=1}^{m} y_i \phi_i + \sum_{i=m+1}^{n} b_i \phi_i,   (2)

where the b_i are constants. The coefficients b_i and the vectors \phi_i are to be determined so that X is best approximated. If only the first m of the y_i are calculated, the resulting error is

\Delta X(m) = X - \hat{X}(m) = \sum_{i=m+1}^{n} (y_i - b_i)\, \phi_i.   (3)

The set of m eigenvectors of the covariance matrix of X that corresponds to the m largest eigenvalues minimizes the error over all choices of m orthonormal basis vectors. The expansion of a random vector in the eigenvectors of its covariance matrix is also called the discrete version of the Karhunen-Loève expansion [3].

Method of Canonical Variates

In the printer identification problem, we have additional information about the class labels from the training data. We can get training samples from different known printers, and the class labels represent the different printer models. This additional class label information provides the optimal linear discrimination between classes with a linear projection of the data; this is known as the method of canonical variates [5]. Here the basis vectors are obtained from the between-class and within-class covariance matrices. Due to singularity in the covariance matrix, the method has to be implemented with the help of simultaneous diagonalization [6]. This version of PCA is described in detail by Webb [4]. We have to diagonalize two symmetric matrices, S_W and S_B, simultaneously:

1. The within-class covariance matrix S_W is whitened. The same transformation is applied to the between-class covariance matrix S_B; let S_B' denote the transformed S_B.
2. An orthonormal transformation is applied to diagonalize S_B'.

The complete mathematical formulation can be found in Webb [4] and Fukunaga [6].
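A minimal sketch of this canonical-variates procedure, assuming a data matrix X with one projection per row and an integer class label per row, is given below. It follows the two steps above (whiten S_W, then diagonalize the transformed S_B) but is not the code used in the paper; the small ridge added to S_W is an extra assumption to cope with the singularity mentioned above.

```python
import numpy as np

def canonical_variates(X, labels, n_components=2):
    """Illustrative canonical-variates projection: whiten the within-class
    covariance S_W, then diagonalize the transformed between-class covariance."""
    classes = np.unique(labels)
    n, d = X.shape
    mean_all = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        S_W += (Xc - mean_c).T @ (Xc - mean_c)          # within-class scatter
        diff = (mean_c - mean_all)[:, None]
        S_B += len(Xc) * (diff @ diff.T)                # between-class scatter
    # Step 1: whiten S_W (a small ridge guards against singularity)
    w_vals, w_vecs = np.linalg.eigh(S_W + 1e-8 * np.eye(d))
    W = w_vecs / np.sqrt(w_vals)                        # whitening transform
    S_B_t = W.T @ S_B @ W                               # transformed S_B
    # Step 2: orthonormal transform that diagonalizes the transformed S_B
    b_vals, b_vecs = np.linalg.eigh(S_B_t)
    order = np.argsort(b_vals)[::-1]
    A = W @ b_vecs[:, order[:n_components]]             # combined projection
    return (X - mean_all) @ A
```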

Figure 2: Principal component analysis using the 1-D projected signal. (Test pages from the printers are reduced to 1-D projections and passed through PCA; the plotted classes are the LJ4050, Samsung 1450, Okipage, and a test LJ4050.)

To perform the experiments, we used the printers in our printer bank [1]. The experimental procedure is depicted in Fig. 2. The test page has the letter "I" in 10 pt., 12 pt., and 14 pt. sizes in Arial font. Each test page has 40-100 letters. From each letter, a one dimensional projected signal is extracted. The projected signals are mean subtracted and normalized; this step removes variability due to long term trends, such as cartridge depletion and printer wear, and other factors which are not stable intrinsic features. The projected signals from the different printers are concatenated into a large data matrix, and the canonical variates method is applied to this matrix to obtain the principal components.

The PCA of five different printer models is shown in Fig. 3. Each projection has 168 samples. The high dimensional data is represented by only the first two principal components, and the classes (different printers) are well separated. A sixth printer is added as a test printer. The sixth printer is an HP Laserjet 4050, and its projections overlap with those of the other Laserjet 4050. The projections from the Laserjet 1000 and Laserjet 1200 overlap because of the similarities in their banding characteristics [1]. It should be noted that the Samsung ML-1450 and the Okipage 14e form well separated classes.

Figure 3: Representation of the projected signal by the first two principal components.

Gaussian Mixture Model for Classification

The dimension of the projected signal is reduced by PCA. The next step is to classify the printers using these features. The Gaussian mixture model (GMM) is a generative model, and the posterior probability of a data point can be determined using Bayes' theorem. A model with m components is given by

p(x) = \sum_{j=1}^{m} P(j)\, p(x \mid j).   (4)

The parameters P(j) are called the mixing coefficients. The component density function p(x \mid j) is Gaussian with a spherical covariance matrix and data dimension d:

p(x \mid j) = \frac{1}{(2\pi\sigma_j^2)^{d/2}} \exp\left\{ -\frac{\lVert x - \mu_j \rVert^2}{2\sigma_j^2} \right\}.   (5)

Suppose that the data set contains projections from different printers A_1, ..., A_k. The classification can be done by using the posterior probabilities of class membership P(A_k \mid x) [7]. The density model for class A_t is built by training the model only on data from that class, which gives an estimate of p(x \mid A_t). Then, by Bayes' theorem,

P(A_t \mid x) = \frac{p(x \mid A_t)\, P(A_t)}{p(x)}.   (6)

The prior probabilities P(A_t) are determined by the fraction of samples from class A_t in the training set.
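As a small illustration of Eqs. (5) and (6), the sketch below evaluates the spherical Gaussian density and the resulting class posteriors. For brevity it represents each printer class by a single spherical Gaussian rather than a mixture trained on that class's data; that simplification, and the function names, are assumptions made for the example.

```python
import numpy as np

def spherical_gaussian(x, mu, sigma2):
    """Density of Eq. (5): spherical Gaussian with variance sigma2 in d dimensions."""
    d = mu.shape[0]
    norm = (2.0 * np.pi * sigma2) ** (d / 2.0)
    return np.exp(-np.sum((x - mu) ** 2) / (2.0 * sigma2)) / norm

def class_posteriors(x, means, variances, priors):
    """Posterior class probabilities P(A_t | x) via Bayes' theorem, Eq. (6)."""
    likelihoods = np.array([spherical_gaussian(x, m, v)
                            for m, v in zip(means, variances)])
    joint = priors * likelihoods        # p(x | A_t) P(A_t)
    return joint / joint.sum()          # dividing by p(x) normalizes the posteriors
```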

Parameter Estimation

The parameters of a Gaussian mixture are determined from a data set by maximizing the data likelihood. The most widely used method is the expectation-maximization (EM) algorithm [8, 9]. The EM algorithm works with the log likelihood of the complete data set, and the maximization is repeatedly performed over the modified likelihood; essentially, the EM algorithm iteratively modifies the GMM parameters to decrease the negative log likelihood of the data set [7]. The source or class of each data point x_i is known during the training process. The maximization steps are explained in detail by Redner [10]. The resulting equations are

P^{k+1}(j) = \frac{1}{N} \sum_{i=1}^{N} P^{k}(j \mid x_i),   (7)

where P^{k+1}(j) is the mixing coefficient at the (k+1)-th iteration, N is the total number of samples, and P^{k}(j \mid x_i) is the posterior probability of component j given sample x_i at the k-th iteration. The mean and the variance are updated using the relations

\mu_j^{k+1} = \frac{\sum_{i=1}^{N} P^{k}(j \mid x_i)\, x_i}{\sum_{i=1}^{N} P^{k}(j \mid x_i)},   (8)

(\sigma_j^{k+1})^2 = \frac{1}{d} \cdot \frac{\sum_{i=1}^{N} P^{k}(j \mid x_i)\, \lVert x_i - \mu_j^{k+1} \rVert^2}{\sum_{i=1}^{N} P^{k}(j \mid x_i)}.   (9)

The initialization of the model is performed by the k-means algorithm. First, a rough clustering is done, with the number of clusters set to the number of printer models in the training set, and each data point is assigned to the closest cluster center. Initial prior probabilities, means, and variances are calculated from these clusters. After that, the iterative part of the algorithm proceeds using Eqs. (7)-(9).

The feature space is obtained by PCA as described in the previous section. The model is based on only the first two principal components (d = 2). Model initialization is performed by seven iterations of the k-means algorithm. The initial parameters are then refined by the iterative EM algorithm to obtain the means and variances of the different classes. The EM algorithm is terminated either after a fixed number of iterations or when the change in the parameters between successive iterations falls below a threshold.

The classification is done by majority vote. For each projection from the unknown printer, the posterior probabilities of all the classes are determined, and the projection is assigned to the class with the highest probability. This is repeated for all the projections from the unknown printer, and the class with the highest number of votes identifies the model of the unknown printer.

In Table 1, the classification results for five different printer models are presented. In the printer bank, we have two printers of each model [1]; one printer is used for training and the other for testing. The initial training data set is created from forty projections of each printer model, and the sixth printer is then added as an unknown printer to check the performance of the classifier. Forty projections from each printer are used during both training and testing. For example, when the Laserjet 4050 is tested, the classifier predicts that all 40 projections are from the Laserjet 4050 class, i.e. the classification is 100% correct. The Samsung ML-1450 and Okipage 14e are also identified correctly. Due to the close banding characteristics of the Laserjet 1200 and Laserjet 1000, their correct classification rates are lower. This result confirms the outcome of the banding analysis [1] and PCA.

Table 1: Classification of five printers using GMM (rows: tested printer; columns: predicted class).

Test     | LJ4050 | LJ1000 | LJ1200 | Oki | SS | CCR (%)
LJ4050   |   40   |    0   |    0   |  0  |  0 |  100
LJ1000   |    0   |   25   |   15   |  0  |  0 |   62.5
LJ1200   |    0   |   35   |    5   |  0  |  0 |   12.5
Oki      |    0   |    0   |    0   | 40  |  0 |  100
SS1450   |    0   |    0   |    0   |  0  | 40 |  100

LJ = Laserjet, Oki = Okipage 14e, SS = Samsung ML-1450, and CCR = Correct Classification Rate.
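The parameter estimation just described can be sketched as follows: a minimal NumPy implementation of the EM updates of Eqs. (7)-(9) under the spherical-covariance model of Eq. (5), initialized with a short k-means run as in the paper. It is an illustration rather than the authors' code; the use of scikit-learn's KMeans, the iteration counts, and the function name are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_spherical_gmm(X, m, n_iter=50):
    """EM for a GMM with spherical covariances, following Eqs. (7)-(9).
    Initialization uses a short run of k-means, as described in the paper."""
    N, d = X.shape
    km = KMeans(n_clusters=m, n_init=1, max_iter=7, random_state=0).fit(X)
    means = km.cluster_centers_.copy()
    priors = np.bincount(km.labels_, minlength=m) / N
    variances = np.array([
        np.mean(np.sum((X[km.labels_ == j] - means[j]) ** 2, axis=1)) / d
        for j in range(m)
    ])
    for _ in range(n_iter):
        # E-step: responsibilities P^k(j | x_i)
        dist2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)   # (N, m)
        log_p = (np.log(priors) - 0.5 * d * np.log(2 * np.pi * variances)
                 - 0.5 * dist2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)                        # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: Eqs. (7), (8), (9)
        Nj = resp.sum(axis=0)
        priors = Nj / N
        means = (resp.T @ X) / Nj[:, None]
        dist2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * dist2).sum(axis=0) / (d * Nj)
    return priors, means, variances
```

In this sketch one model covers all training classes, with one component per printer model; the responsibilities computed in the E-step play the role of the posteriors used in the majority vote described above.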
Classification and Regression Tree (CART)

A classification tree is a multistage decision process that uses subsets of features at different levels of the tree. The construction of the tree involves three steps:

1. Splitting rule selection for each internal node.
2. Terminal node determination.
3. Class label assignment to terminal nodes.

The approach we have used in our experiments is based on CART [11]. The Gini criterion is used as the splitting rule, and a node is declared terminal when it has pure class membership.

In the printer identification problem, the class labels of the training printers are known, and the unknown printer is given a temporary label; after the classification process, the class label of the unknown printer can be determined. The CART algorithm also works in the reduced dimension feature space generated by PCA. At each node, the impurity function is defined by the Gini criterion, and the algorithm selects the split that minimizes the node impurity. When the node impurity is zero, a terminal node is reached. A complete tree grown in this manner is overfitted, so the tree is pruned by cross-validation.
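For readers who want to reproduce a comparable tree, the sketch below grows a Gini-criterion tree and chooses a pruning level by cross-validation with scikit-learn. It is a stand-in for the CART implementation of Breiman et al. [11] used in the paper, not a reproduction of it; cost-complexity pruning and the 5-fold split are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def train_pruned_cart(features, labels):
    """Grow a Gini-criterion tree, then pick a pruning level by cross-validation."""
    full_tree = DecisionTreeClassifier(criterion="gini", random_state=0)
    path = full_tree.cost_complexity_pruning_path(features, labels)
    best_alpha, best_score = 0.0, -np.inf
    for alpha in path.ccp_alphas:
        tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=alpha, random_state=0)
        score = cross_val_score(tree, features, labels, cv=5).mean()
        if score > best_score:
            best_alpha, best_score = alpha, score
    # Refit the pruned tree on all training data with the selected alpha
    return DecisionTreeClassifier(criterion="gini", ccp_alpha=best_alpha,
                                  random_state=0).fit(features, labels)
```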

Figure 4: Splitting of the projected signal in the principal component domain by the Gini criterion.

Figure 5: Binary tree structure for classifying an unknown printer. Five printer models are used in the training process.

The experimental setup is similar to that for the GMM classification. Five different printer models are used for training, and a sixth printer is added as a test printer. Figure 4 shows the first two splits of the data set. The first split is made at the value -0.8473 of the second principal component; it separates the Laserjet 1000 and Laserjet 1200 from the other printers. Similarly, a second split at -1.15 of the first principal component separates the Laserjet 4050 from the Okipage 14e and Samsung ML-1450. Figure 5 shows the completely grown and pruned tree. Each terminal node represents an individual printer. The Laserjet 1000 and Laserjet 1200 have the same parent node because of the similarities in their banding characteristics. The training and the test Laserjet 4050 share a parent node, so the unknown Laserjet 4050 is identified properly by the tree classifier. The Samsung ML-1450 and the Okipage 14e are represented by separate terminal nodes.

Conclusion

Printer identification from a text-only document requires sophisticated techniques based on feature extraction and pattern recognition. In this work, we presented a method for reducing the dimension of the data set; the reduced dimension data set serves as the feature space for the classifier. We developed two different classifiers, based on the Gaussian mixture model and on a binary tree. If the classes are distinct in the feature space, both classifiers can identify the unknown printer properly.

References

1. G. N. Ali, P.-J. Chiang, A. K. Mikkilineni, J. P. Allebach, G. T. Chiu, and E. J. Delp, "Intrinsic and extrinsic signatures for information hiding and secure printing with electrophotographic devices," in Proceedings of the IS&T's NIP19: International Conference on Digital Printing Technologies, 2003, pp. 511-515.
2. I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, Second Edition, 2002, pp. 199-231.
3. K. Fukunaga, Statistical Pattern Recognition, Morgan Kaufmann, Second Edition, 1990, pp. 400-407.
4. A. Webb, Statistical Pattern Recognition, John Wiley & Sons, Second Edition, 2002, pp. 319-334.
5. I. T. Nabney, Netlab: Algorithms for Pattern Recognition, Springer, pp. 73-75.
6. K. Fukunaga, Statistical Pattern Recognition, Morgan Kaufmann, Second Edition, 1990, pp. 24-33.
7. I. T. Nabney, Netlab: Algorithms for Pattern Recognition, Springer, pp. 79-113.
8. A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Stat. Society B, vol. 39, 1977, pp. 1-38.
9. C. F. J. Wu, "On the convergence properties of the EM algorithm," The Annals of Statistics, vol. 11, no. 1, 1983, pp. 95-103.
10. R. A. Redner and H. F. Walker, "Mixture densities, maximum likelihood, and the EM algorithm," SIAM Review, vol. 26, no. 2, April 1984, pp. 195-239.
11. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks, 1984, pp. 18-36.

Biography

Gazi Naser Ali received the B.Sc.
degree in electrical engineering from Bangladesh University of Engineering and Technology in 1998. He is currently pursuing the Ph.D. degree in the School of Electrical and Computer Engineering, Purdue University, West Lafayette. He is a student member of the IEEE and IS&T.