Pattern recognition systems Lab 8 Bayesian Classifier: Simple Digit Recognition Application


Henry Rice
5 years ago

1. Objectives

In this lab session we will study the naïve Bayes algorithm and apply it to a simple recognition problem: distinguishing between two classes of digits, one and two.

2. Theoretical Background

2.1. Naïve Bayes

Naïve Bayes classifiers can handle an arbitrary number of independent variables, whether continuous or categorical. Given a set of variables $X = \{x_1, x_2, \dots, x_d\}$, we want to construct the posterior probability for the event $C_j$ among a set of possible outcomes $C = \{C_1, C_2, \dots, C_m\}$. In more familiar language, $X$ is the set of predictors and $C$ is the set of categorical levels present in the dependent variable. Using Bayes' rule:

$$p(C_j \mid x_1, x_2, \dots, x_d) \propto p(x_1, x_2, \dots, x_d \mid C_j)\, p(C_j)$$

where $p(C_j \mid x_1, x_2, \dots, x_d)$ is the posterior probability of class membership, i.e., the probability that $X$ belongs to $C_j$. Since Naïve Bayes assumes that the predictors are conditionally independent given the class, we can decompose the likelihood into a product of terms:

$$p(X \mid C_j) = \prod_{k=1}^{d} p(x_k \mid C_j)$$

and rewrite the posterior probability as:

$$p(C_j \mid X) \propto p(C_j) \prod_{k=1}^{d} p(x_k \mid C_j)$$

Using Bayes' rule above, we label a new case $X$ with the class level $C_j$ that achieves the highest posterior probability.

Although the assumption that the predictor (independent) variables are independent is not always accurate, it does simplify the classification task dramatically, since it allows the class conditional densities $p(x_k \mid C_j)$ to be calculated separately for each variable, i.e., it reduces a multidimensional task to a number of one-dimensional ones. In effect, Naïve Bayes reduces a high-dimensional density estimation task to one-dimensional kernel density estimation. Furthermore, the assumption does not seem to greatly affect the posterior probabilities, especially in regions near decision boundaries, thus leaving the classification task unaffected.

Naïve Bayes can be modeled in several different ways, including normal, log-normal, gamma and Poisson density functions.

2.2. Naïve Bayes Example

The Naïve Bayes Classifier technique is based on Bayes' theorem. Despite its simplicity, Naïve Bayes can often outperform more sophisticated classification methods.

To demonstrate the concept of Naïve Bayes classification, consider the example displayed in the illustration above. As indicated, the objects can be classified as either GREEN or RED. Our task is to classify new cases as they arrive, i.e., to decide to which class label they belong, based on the currently existing objects.

Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case (which hasn't been observed yet) is twice as likely to have membership GREEN rather than RED. In Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous experience, in this case the percentage of GREEN and RED objects, and are often used to predict outcomes before they actually happen. Thus, we can write:

$$\text{Prior probability for GREEN} \propto \frac{\text{Number of GREEN objects}}{\text{Total number of objects}}$$

$$\text{Prior probability for RED} \propto \frac{\text{Number of RED objects}}{\text{Total number of objects}}$$

Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for class membership are:

$$\text{Prior probability for GREEN} \propto \frac{40}{60}, \qquad \text{Prior probability for RED} \propto \frac{20}{60}$$

Having formulated our prior probability, we are now ready to classify a new object X (the WHITE circle). Since the objects are well clustered, it is reasonable to assume that the more GREEN (or RED) objects there are in the vicinity of X, the more likely it is that the new case belongs to that particular color. To estimate the likelihood we use a nonparametric density estimation method called fixed-radius near neighbors; that is, we analyze the distribution of the data in the vicinity of X. More specifically, to measure this likelihood we draw a circle around X which encompasses a number (to be chosen a priori) of points irrespective of their class labels. Then we count the points in the circle belonging to each class label. From this we calculate the likelihood:

$$\text{Likelihood of } X \text{ given GREEN} \propto \frac{\text{Number of GREEN in the vicinity of } X}{\text{Total number of GREEN cases}}$$

$$\text{Likelihood of } X \text{ given RED} \propto \frac{\text{Number of RED in the vicinity of } X}{\text{Total number of RED cases}}$$

From the illustration above, it is clear that the likelihood of X given GREEN is smaller than the likelihood of X given RED, since the circle encompasses 1 GREEN object and 3 RED ones. Thus:

$$\text{Likelihood of } X \text{ given GREEN} \propto \frac{1}{40}, \qquad \text{Likelihood of } X \text{ given RED} \propto \frac{3}{20}$$

Although the prior probabilities indicate that X may belong to GREEN (given that there are twice as many GREEN objects as RED), the likelihood indicates otherwise: that the class membership of X is RED (given that there are more RED objects than GREEN in the vicinity of X). In Bayesian analysis, the final classification is produced by combining both sources of information, i.e., the prior and the likelihood, to form a posterior probability using the so-called Bayes' rule (named after Rev. Thomas Bayes):

$$\text{Posterior probability of } X \text{ being GREEN} \propto \frac{40}{60} \times \frac{1}{40} = \frac{1}{60}$$

$$\text{Posterior probability of } X \text{ being RED} \propto \frac{20}{60} \times \frac{3}{20} = \frac{1}{20}$$
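The arithmetic of this example can be checked with a short C++ sketch. It only uses the counts stated in the text (60 objects, 40 GREEN, 20 RED, and 1 GREEN and 3 RED inside the circle around X); the function names `unnormalizedPosterior` and `classifyExample` are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Unnormalized posterior for one class: prior * likelihood, where
//   prior      = classCount / totalCount
//   likelihood = classInVicinity / classCount
double unnormalizedPosterior(int classCount, int totalCount, int classInVicinity) {
    assert(classCount > 0 && totalCount >= classCount);
    const double prior = static_cast<double>(classCount) / totalCount;
    const double likelihood = static_cast<double>(classInVicinity) / classCount;
    return prior * likelihood;
}

// Returns the label with the larger unnormalized posterior for the example's counts.
std::string classifyExample() {
    const int total = 60, green = 40, red = 20; // object counts from the text
    const int greenNear = 1, redNear = 3;       // objects inside the circle around X
    const double pGreen = unnormalizedPosterior(green, total, greenNear); // 40/60 * 1/40
    const double pRed   = unnormalizedPosterior(red, total, redNear);     // 20/60 * 3/20
    return (pRed > pGreen) ? "RED" : "GREEN";
}
```

Because both values share the same normalizing constant, comparing the unnormalized products is enough to pick the class.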

Finally, we classify X as RED since its class membership achieves the largest posterior probability. The above probabilities are not normalized. However, this does not affect the classification outcome, since their normalizing constants are the same.

3. Implementation details and practical work

In this lab session we will implement a simple digit recognition system.

3.1. We have two classes of digits, ONE and TWO (grayscale images of 28x28 pixels). Given an unknown digit (test image T) that can be either ONE or TWO, we need to find the class to which it belongs. Use all the pixels of the images as features (values from 0 to 255, images with 8 bits/pixel). We may use the following schematic algorithm:

- Load the test image (which will be classified as either ONE or TWO).
- Load the templates by selecting a folder path. This folder contains all the templates from both classes. The files belonging to class ONE are named template1*.bmp and those belonging to class TWO are named template2*.bmp.
- Compute the prior probability of each class:
  o P(ONE) = NrTemplatesInClassONE / TotalNumberOfTemplates
  o P(TWO) = NrTemplatesInClassTWO / TotalNumberOfTemplates
- Compute the likelihood of the test image belonging to class ONE and the likelihood of the test image belonging to class TWO. Consider the space that contains a set of digits of known class and a test image T. To measure the likelihoods P(T | ONE) and P(T | TWO), we define a neighborhood around T that encompasses all the images at a distance lower than a predefined threshold dthreshold (this threshold should be large enough for the neighborhood to contain at least one image).

In the implementation you can consider dthreshold = 35 gray levels. The distance (in gray levels) between two images A and B (of the same width and height) is computed as the mean of the absolute differences over all pixels. The result is a real number between 0 and 255 representing the average gray-level difference between the images:

$$d(A, B) = \frac{\sum_{0 \le i < \mathit{height}} \; \sum_{0 \le j < \mathit{width}} |A_{ij} - B_{ij}|}{\mathit{height} \cdot \mathit{width}}$$

where A and B are the images, and $A_{ij}$ and $B_{ij}$ are the intensity values of the pixels at position (i, j).

- Compute the number of images (NrNeighborhoodClassONE) of class ONE that are inside the neighborhood (d(T, ImageONE) < dthreshold).
- Compute the number of images (NrNeighborhoodClassTWO) of class TWO that are inside the neighborhood (d(T, ImageTWO) < dthreshold).
- The likelihood of the test image T to be a ONE is:
  o P(T | ONE) ∝ NrNeighborhoodClassONE / NrTemplatesInClassONE
- The likelihood of the test image T to be a TWO is:
  o P(T | TWO) ∝ NrNeighborhoodClassTWO / NrTemplatesInClassTWO
- Compute the posterior probabilities and assign the test image to the class that has the largest posterior probability:
  o P(ONE | T) ∝ P(T | ONE) * P(ONE)
  o P(TWO | T) ∝ P(T | TWO) * P(TWO)
- Display in a message box: P(ONE), P(TWO), P(T | ONE), P(T | TWO), P(ONE | T), P(TWO | T) and the classification result (class ONE or class TWO).

Implement a function that loads a test image and the templates. For loading the templates and classifying the test image, add a new function to the processing menu and use the following code (starting from this code you will develop the entire classification algorithm!):

    // In the include section please add:
    // #include <afxdisp.h>

    BEGIN_PROCESSING();

    BYTE *lpSI, *lpSrcI;
    DWORD dwWidthI, dwHeightI, wI;
    HDIB hBmpSrcI;
    CFile fileIn;
    CFileException fe;

    AfxEnableControlContainer();

    // Ask the user for the folder that contains the templates
    char buffer[MAX_PATH] = "";
    BROWSEINFO bi;
    ZeroMemory(&bi, sizeof(bi));
    SHGetPathFromIDList(SHBrowseForFolder(&bi), buffer);
    if (strcmp(buffer, "") == 0) return;

    char directoryPath[MAX_PATH];
    CFileFind fFind;
    int nextFile;
    CString msg;

    // TEMPLATE class 1
    strcpy(directoryPath, buffer);
    strcat(directoryPath, "\\template1*.bmp");
    nextFile = fFind.FindFile(directoryPath);
    int nrImages1 = 0;
    while (nextFile)
    {
        nrImages1++;
        nextFile = fFind.FindNextFile();
        CString fnIn = fFind.GetFilePath();
        fileIn.Open(fnIn, CFile::modeRead | CFile::shareDenyWrite, &fe);
        hBmpSrcI = (HDIB)::ReadDIBFile(fileIn);
        fileIn.Close();
        lpSI = (BYTE*)::GlobalLock((HGLOBAL)hBmpSrcI);
        dwWidthI = ::DIBWidth((LPSTR)lpSI);
        dwHeightI = ::DIBHeight((LPSTR)lpSI);
        lpSrcI = (BYTE*)::FindDIBBits((LPSTR)lpSI);
        wI = WIDTHBYTES(dwWidthI * 8);   // bytes per image row
        ///////////// DO THE PROCESSING WITH THE CURRENT IMAGE IN CLASS 1
        ::GlobalUnlock((HGLOBAL)hBmpSrcI);
    }
    msg.Format("found and processed %d images in class 1", nrImages1);
    AfxMessageBox(msg);

    // TEMPLATE class 2
    strcpy(directoryPath, buffer);
    strcat(directoryPath, "\\template2*.bmp");
    nextFile = fFind.FindFile(directoryPath);
    int nrImages2 = 0;
    while (nextFile)
    {
        nrImages2++;
        nextFile = fFind.FindNextFile();
        CString fnIn = fFind.GetFilePath();
        fileIn.Open(fnIn, CFile::modeRead | CFile::shareDenyWrite, &fe);
        hBmpSrcI = (HDIB)::ReadDIBFile(fileIn);
        fileIn.Close();
        lpSI = (BYTE*)::GlobalLock((HGLOBAL)hBmpSrcI);
        dwWidthI = ::DIBWidth((LPSTR)lpSI);
        dwHeightI = ::DIBHeight((LPSTR)lpSI);
        lpSrcI = (BYTE*)::FindDIBBits((LPSTR)lpSI);
        wI = WIDTHBYTES(dwWidthI * 8);   // bytes per image row
        ///////////// DO THE PROCESSING WITH THE CURRENT IMAGE IN CLASS 2
        ::GlobalUnlock((HGLOBAL)hBmpSrcI);
    }
    msg.Format("found and processed %d images in class 2", nrImages2);
    AfxMessageBox(msg);

    END_PROCESSING("Bayes Classification");

3.2. Modify the code implemented at 3.1 to recognize the test image as belonging to one of three classes of digits: ONE, TWO and THREE. Display in a message box: P(ONE), P(TWO), P(THREE), P(T | ONE), P(T | TWO), P(T | THREE), P(ONE | T), P(TWO | T), P(THREE | T) and the classification result (class ONE, class TWO or class THREE). The test image T will be assigned to the class having the greatest posterior probability.

4. References

[1] Electronic Statistics Textbook
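
The schematic algorithm of Section 3 (distance, neighborhood counts, priors, likelihoods and posteriors) can be sketched in portable C++, independently of the MFC loading code. This is only a minimal sketch under stated assumptions: the images are assumed to be already loaded into flat arrays of gray levels of equal size, the names `Image`, `imageDistance`, `countNeighbors`, `BayesResult` and `classify` are hypothetical, and ties (including the case where no template falls inside the neighborhood) are resolved in favor of the first class, which is an arbitrary choice. It accepts any number of classes, so the same function covers both the two-class and the three-class exercise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Assumption: an image is stored as height*width gray levels (0..255) in one flat array.
using Image = std::vector<unsigned char>;

// Mean absolute gray-level difference d(A, B); the result lies in [0, 255].
double imageDistance(const Image& a, const Image& b) {
    assert(!a.empty() && a.size() == b.size());
    long long sum = 0;
    for (std::size_t k = 0; k < a.size(); ++k)
        sum += std::abs(static_cast<int>(a[k]) - static_cast<int>(b[k]));
    return static_cast<double>(sum) / a.size();
}

// Number of templates whose distance to the test image is below the threshold.
int countNeighbors(const Image& test, const std::vector<Image>& templates, double dThreshold) {
    int count = 0;
    for (const Image& t : templates)
        if (imageDistance(test, t) < dThreshold) ++count;
    return count;
}

struct BayesResult {
    std::vector<double> prior, likelihood, posterior; // one entry per class
    int bestClass = 0;                                // index of the largest posterior
};

// classes[j] holds the templates of class j (e.g. index 0 = ONE, index 1 = TWO).
BayesResult classify(const Image& test,
                     const std::vector<std::vector<Image>>& classes,
                     double dThreshold = 35.0) {
    std::size_t total = 0;
    for (const auto& c : classes) total += c.size();
    assert(total > 0);

    BayesResult r;
    for (std::size_t j = 0; j < classes.size(); ++j) {
        const std::size_t n = classes[j].size();
        const double prior = static_cast<double>(n) / total;
        const double likelihood =
            (n == 0) ? 0.0
                     : static_cast<double>(countNeighbors(test, classes[j], dThreshold)) / n;
        r.prior.push_back(prior);
        r.likelihood.push_back(likelihood);
        r.posterior.push_back(prior * likelihood); // unnormalized posterior
        if (r.posterior[j] > r.posterior[r.bestClass]) r.bestClass = static_cast<int>(j);
    }
    return r;
}
```

In the lab application, the pixel data obtained for each template inside the two loading loops would have to be copied into such arrays before calling `classify`; how the row padding of the bitmap is handled at that point is left to the implementation.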
