Enhancing Forestry Object Detection using Multiple Features


A THESIS
Submitted in partial fulfillment of the requirements for the degree of Master of Computing Science

By: Ahmad Ostovar (ahos0003@student.umu.se)

Supervisors: Thomas Hellström, Mostafa Pordel
Umeå University, Umeå, Sweden
{thomash,pordel}@cs.umu.se

December 9, 2011

I. Abstract

This Master's project studies how to increase the performance of object detection in forestry environments based on extracted features. Several object detection projects for robots are based on feature calculation and extraction; one example is the sugar beet project [3], which inspired the feature selection and calculation parts presented in this report. The extracted feature sets are given to several classifiers, and their results are merged and fused so that the overall performance of forestry object detection increases. Furthermore, different supervised and unsupervised dimensionality reduction methods are applied to the feature set as an approach to improving classification accuracy. A comparison of the classification performance of the dimensionality reduction methods shows that applying supervised methods improves classification performance by about 12 percent.

II. Acknowledgement

Preparation of this master thesis took about six months. I am thankful to all the people who helped me during the time I was working on my thesis project. I would like to thank my supervisors, Thomas Hellström and Mostafa Pordel, who both helped me throughout the project: Thomas Hellström for his advice on this thesis project, and Mostafa Pordel, who spent a lot of time answering my questions and helped me find my way toward my goal. Finally, I want to express my gratitude to my parents, who supported me whenever and wherever I needed it.

Ahmad Ostovar, December 2011

Contents

1 I. Abstract
2 Introduction
3 Sensors
  3.1 SR-3000 SwissRanger
  3.2 Microsoft Kinect
4 Image Collection
5 Background Elimination
6 Feature Extraction
  6.1 Color Based Features
    6.1.1 R&G&B Mean and Standard Deviation
    6.1.2 Mean and Variance of R&G&B Color Images
    6.1.3 Skewness of R&G&B Color Images
    6.1.4 Depth Mean & Variance
  6.2 Shape Based Features
    6.2.1 Area
    6.2.2 Perimeter
    6.2.3 Elongation
    6.2.4 Compactness
    6.2.5 Solidity
    6.2.6 Form Factor
    6.2.7 Convexity
    6.2.8 Number of Corners
  6.3 Edge and Corner
    6.3.1 HOG
    6.3.2 Harris
7 Classifiers
  7.1 KNN (K-Nearest Neighbor)
  7.2 Decision Tree
  7.3 Naive Bayes
  7.4 LDA
  7.5 MLP
  7.6 WMV
8 Dimensionality Reduction Methods
  8.1 Unsupervised dimensionality reduction methods
    8.1.1 PCA
    8.1.2 GPLVM
  8.2 Dimensionality reduction using K-mean
    8.2.1 K-mean
    8.2.2 Applying K-mean to Extracted Features
    8.2.3 K-mean for Supervised Dimensionality Reduction (KSDR)
  8.3 Supervised feature selection and reduction
    8.3.1 Feed Forward
    8.3.2 Backward Reduction
9 An example for comparison between Feed Forward and Backward Reduction methods
  9.1 Applying Feed Forward
  9.2 Applying Backward Reduction
10 Results
11 Conclusion and Future Work

2 Introduction

Increasing the performance of object detection in images from forest environments is the goal of this Master's project. The programming is done in MATLAB, and a Microsoft Kinect camera is used for collecting images. In total, 519 images have been collected and classified as Tree, Bush, Stone and Human. Features are extracted from the images after detecting and eliminating the image background, which is done using the depth images of the Kinect camera. The extracted features are divided into three categories: color based and shape based features, which cover the 25 extracted features motivated by the sugar beet project [3], and edge and corner features, which are extracted from object edges and corners. To classify objects, a number of classifiers are implemented, including K-Nearest Neighbor (KNN), Multi Layer Perceptron (MLP), Linear Discriminant Analysis (LDA), Decision Tree (DT), Naive Bayes (NB) and Weighted Majority Vote (WMV). Classifying based on the different categories helps to determine which category of features, and which combinations of them, result in higher classification performance. Since the number of extracted features is large, some feature combinations have destructive effects on classification performance; to reduce the number of such combinations, several dimensionality reduction methods have been applied. Dimensionality reduction methods reduce the data dimensionality and improve classification performance in terms of speed, accuracy and simplicity [8]. The implemented dimensionality reduction methods are PCA, GPLVM, KSDR, Feed Forward and Backward Reduction. Figure 1 shows the steps accomplished in this project. The thesis is organized as follows: Section 3 describes the selection of a proper camera for image collection. Section 4 describes the image collection and a method for retrieving images in MATLAB. Section 5 discusses background elimination. Sections 6 and 7 describe feature extraction, feature selection and the classifiers. Section 8 explains the dimensionality reduction methods. Sections 10 and 11 review the results of the project, the conclusion and future work.

Figure 1: Steps accomplished in this project

3 Sensors

To collect images, two different cameras are studied. The first is the SR-3000 SwissRanger, a product of MESA Imaging AG [1]. The second is a product of Microsoft, the Kinect camera. Both cameras are equipped with infrared sensors that create depth images, which are used to produce 3-D images. The Kinect is additionally equipped with an RGB camera, while the SR-3000 has a monochrome camera.

3.1 SR-3000 SwissRanger

This camera uses amplitude-modulated NIR (Near-Infrared) light emitted from LEDs on the camera to illuminate the scene to be measured. The camera's lens focuses the light backscattered from the object onto a solid-state detector, and the return signal is demodulated to determine the range to the target [36]. As stated on the manufacturer's website, the SR-3000 (Figure 2) is a general purpose range imaging camera for measuring real-time depth maps, and it operates under indoor lighting conditions [1], [50], [27]. Our tests show that the effective range of the camera is about 1 meter in outdoor environments. As the camera is to be used in a forestry environment, this effective range is not satisfactory, because the camera fails to capture full images of large objects such as trees even when the tree is at a proper distance from the camera. Furthermore, sunlight has a very destructive effect on the performance of the camera, as it is designed to work under indoor lighting conditions [1]. The poor test results regarding these issues (Figures 3 and 4) are reason enough not to use the SR-3000 SwissRanger for image collection.

Figure 2: SR-3000 Camera [1]

Figure 3: A human object located 1 meter from the camera. The sun is behind the camera.

Figure 4: A human object located 1 meter from the camera. The sun is in front of the camera.

3.2 Microsoft Kinect

Kinect is a 3-D camera produced by Microsoft. It contains a 3-D depth sensor, an RGB camera and a multi-array microphone (Figure 5). The depth sensor combines an infrared projector with a monochrome CMOS sensor; the effective working range of the Kinect is given in [32]. Since the Kinect has a wider effective range and a higher output performance in different lighting situations than the SR-3000, and since it also provides RGB images, the Kinect camera is selected for image acquisition in this project.

4 Image Collection

A requirement in this thesis project is the existence of a proper set of object images. In total, 519 images of trees, bushes, humans and stones are collected using the Kinect camera. Table 1 shows the number of images of each object. Each shot includes an RGB image (Figure 6) and a depth image (Figure 7). RGB and depth images provide different kinds of information about the objects, which are used to extract color and shape based features. To use the images, a proper method for loading them in MATLAB is required.

Figure 5: Kinect 3-D camera sensor [2]

Figure 6: RGB image from the Kinect camera

Figure 7: Depth image associated with Figure 6

Table 1: Number of image shots of each object of interest

  Object   # of Image Shots
  Tree     127
  Bush     193
  Stone     37
  Human    162
  Total    519

To do so, XML (Extensible Markup Language) is chosen. XML is a markup language whose distinguishing characteristic is its tree pattern [49]. Due to its tree structure, XML is suitable for storing, querying and manipulating complex data [10]. Its flexible, semi-structured nature has made XML a popular format for the exchange and storage of data, and these characteristics allow a wide variety of databases to be represented [52]. An advantage of using XML is that making changes to the image collection and its tags is less time consuming and easier than making changes in the folder of images and their related tags, since changes in the image folder affect the order of the image tags (a MATLAB sketch of reading such an index is given at the end of this section). An image is a composition of different objects rather than a single one. For example, an image of a bush may also include other objects, such as a part of the ground, stones, trees or tree leaves. These extra objects are considered the background of the image and must be removed to make it possible to extract features of the object of interest. There are several methods for selecting the object in the image, such as choosing the object manually, applying a method that selects connected pixels, using morphological properties, or eliminating the object background using depth information. The first alternative is not selected due to its high computation time per image, and the performance of the morphological methods applied for background elimination was not satisfactory. It was therefore decided to use the depth information for background elimination. Figures 8 and 11 show an example of an object before and after background elimination using depth information.
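As an illustration of the XML-based indexing mentioned above, the following minimal MATLAB sketch reads such an index with the built-in xmlread function. The file name images.xml, the tag name image and the attributes file and class are hypothetical and only stand in for whatever schema is actually used:

    % Read a hypothetical XML index of the image collection.
    dom = xmlread('images.xml');                     % parse XML into a DOM document
    nodes = dom.getElementsByTagName('image');       % all <image .../> entries
    n = double(nodes.getLength());
    files = cell(n, 1);
    tags  = cell(n, 1);
    for i = 1:n
        node = nodes.item(i-1);                      % DOM lists are zero-based
        files{i} = char(node.getAttribute('file'));  % path to the RGB/depth shot
        tags{i}  = char(node.getAttribute('class')); % Tree, Bush, Stone or Human
    end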

Figure 8: RGB image from the Kinect camera

Figure 9: Depth image from the depth sensor of the Kinect camera

Figure 10: Background elimination steps

5 Background Elimination

In this project, the depth information of the image is used to eliminate the background. Figure 10 shows the steps involved. The depth image is taken by the depth sensor of the Kinect camera (Figure 9). To eliminate the background, the mean value of the pixels in the depth image is calculated; pixels with a value smaller than the mean are set to 0 and the rest to 1. The resulting binary image (Figure 20) shows how the background of the image is removed while the object of interest remains as the foreground. As an alternative, it is possible to define an interval for converting the depth pixel values, such that a pixel is set to 0 or 1 depending on whether its value lies in the interval; with this alternative it is possible to eliminate the pixels of all extra objects regardless of their distance to the camera or their location in the image. By combining the binary image with the RGB image, the background of the RGB image is also removed. As the output image of the depth sensor has a different size than the RGB image, the binary image is resized from 240x320 (Figure 9) to 480x640 so that the depth and RGB images have the same size and can be combined. When the binary and RGB images are combined, every pixel with value 0 in the binary image sets the corresponding RGB pixel to 0, while the remaining RGB pixels keep their original values. After merging the binary image with the RGB image, only the object of interest in the foreground remains. Figure 11 shows the resulting image, which is ready for feature extraction without any destructive effects from objects in the background.
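A minimal MATLAB sketch of this procedure, assuming the shots are already loaded as a 240x320 depth matrix depthImg and a 480x640x3 uint8 RGB image rgbImg (both hypothetical variable names):

    % Threshold the depth image at its mean value, as described above.
    depthD = double(depthImg);                         % 240x320 depth image
    mask = depthD >= mean(depthD(:));                  % 0 = background, 1 = object
    mask = imresize(double(mask), [480 640]) > 0.5;    % match the RGB resolution
    % Zero out the background pixels in all three color channels.
    fg = rgbImg;
    fg(repmat(~mask, [1 1 3])) = 0;                    % background-eliminated RGB image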

Figure 11: RGB image after background elimination using depth information

6 Feature Extraction

Distinguishing the characteristics of different objects is a requirement in object detection, so it is helpful to extract precise characteristics of the objects to make detection more accurate. Selecting a proper set of characteristics is important, since a poor selection can result in misclassification of objects and decrease detection performance. A set of such characteristics is called a feature set. To obtain an appropriate feature set, and motivated by the sugar beet project [3], 25 features are extracted from the four types of objects of interest: trees, bushes, humans and stones. To learn more about the features and their effects on classification performance, the extracted features are divided into three categories: the first contains color based features, the second shape based features, and the third the edges and corners of the object. All categories of features are extracted after background elimination. Color and shape based features are extracted from the RGB or depth images: color based features are based on the object's colors, while shape based features are independent of color and are instead based on the shape of the object. RGB images are taken by the RGB camera of the Kinect and depth images are collected by its depth sensor. The third category is not based on the previously extracted features; instead, it captures the edges and corners of objects, extracted by applying two methods, HOG and Harris, to the images. Table 2 shows the list of features and their categories.

6.1 Color Based Features

6.1.1 R&G&B Mean and Standard Deviation

The mean value is calculated over the whole object after normalizing the R, G and B color images. To extract these three features, the raw image is split into its three dimensions, the red, green and blue color images, each with the same size as the raw image.

Table 2: Extracted Features

   #  Extracted Feature    Base
   1  Depth Mean           Color
   2  Depth Variance       Color
   3  Red Variance         Color
   4  Green Variance       Color
   5  Blue Variance        Color
   6  Green Mean*          Color
   7  Form Factor          Shape
   8  Solidity             Shape
   9  Red Skewness         Color
  10  Green Skewness       Color
  11  Blue Skewness        Color
  12  Red Std*             Color
  13  Red Mean*            Color
  14  Perimeter            Shape
  15  Red Mean             Color
  16  Green Mean           Color
  17  Blue Mean            Color
  18  Green Std*           Color
  19  Elongation           Shape
  20  Compactness          Shape
  21  Blue Std*            Color
  22  Blue Mean*           Color
  23  Area                 Shape
  24  Convexity            Shape
  25  Number of Corners    Shape
  26  HOG                  Edge
  27  Harris               Corner

  * on normalized RGB image

The mean value of each color is calculated with the following formulas after the image has been normalized. For example, to normalize the red image and calculate its mean value:

  NormalizedRed = Red / sqrt(Red^2 + Green^2 + Blue^2)

  RedMean = (Σ Red) / (number of red pixels)

where Red, Green and Blue stand for the pixel values of the R, G and B images of the RGB image. The standard deviation is the variation of each color from its mean value; to calculate it over an R, G or B color image, the raw image is likewise split into its three dimensions and normalized. As part of the color based features, three statistical moments are calculated. Statistical moment functions are applicable to many different aspects of image processing, ranging from invariant pattern recognition and image encoding to pose estimation. Statistical moments describe the image content (or distribution) with respect to its axes, and they are designed to capture global and detailed geometric information about the image [35]. The three statistical moments are the mean, variance and skewness of the R, G and B color images.

6.1.2 Mean and Variance of R&G&B Color Images

The first two statistical moments are the mean and the variance. They represent how the R, G and B colors are statistically distributed in each color dimension of the raw image. As the objects of interest have distinguishable colors, calculating the mean and variance of the R, G and B colors of the objects is beneficial for classification.

6.1.3 Skewness of R&G&B Color Images

Skewness is the third statistical moment and describes the spread of the data around the mean value. Calculating skewness as a feature helps to define how the values of each color expand about the mean. The skewness of a distribution is defined as

  Skewness = E[(x − µ)^3] / σ^3

where µ is the mean of x, σ is the standard deviation of x, and E(t) represents the expected value of the quantity t.
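A minimal MATLAB sketch of these channel statistics, assuming fg is the background-eliminated RGB image and mask marks the object pixels (both hypothetical names from the earlier sketch):

    R = double(fg(:,:,1)); G = double(fg(:,:,2)); B = double(fg(:,:,3));
    normR = R ./ sqrt(R.^2 + G.^2 + B.^2 + eps);            % normalized red channel
    r = normR(mask);                                        % object pixels only
    redMean = mean(r);                                      % first moment
    redVar  = var(r);                                       % second moment
    redStd  = std(r);
    redSkew = mean(((r - redMean) ./ (redStd + eps)).^3);   % third moment (skewness)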

Figure 12: Area region of a Tree object

6.1.4 Depth Mean & Variance

As the depth image is also constructed from values that encode the depth of objects, the mean and variance of the object's depth are extracted as two features, describing the distribution of depth over the object of interest.

6.2 Shape Based Features

The following features are extracted based on the shape of the image after background elimination.

6.2.1 Area

This feature estimates the area of an object. To calculate the area, the related MATLAB function counts the "on" (1) pixels of the binary image. Figure 12 shows the area region of a Tree object.

6.2.2 Perimeter

This feature is the perimeter of the object. The MATLAB perimeter function finds the boundary pixels of the object: a pixel is part of the perimeter if it is nonzero and connected to at least one zero-valued pixel.

Figure 13: Perimeter region of a Tree object

Figure 13 shows the region of a detected Tree object used for computing the perimeter feature.

6.2.3 Elongation

Elongation is defined as Area/(Thickness)^2. The area of the object has been calculated previously; several methods exist for calculating the thickness, and the method selected here is the one advised by Patric and Alder [28], in which the thickness of an object is Area/(Perimeter/2).

6.2.4 Compactness

Compactness defines how compact the object is. Area and perimeter directly determine its value: Compactness = Area/(Perimeter)^2, where Area and Perimeter are the features calculated in the previous sections.

6.2.5 Solidity

Solidity is defined as Area/(Convex Area). Convex Area is a scalar that specifies the number of pixels in the convex image. The convex image is a binary image that specifies the convex hull, with all pixels within the hull filled in; the image has the size of the bounding box of the region.

6.2.6 Form Factor

Form Factor is a measure of how much object mass lies in the center in relation to how much lies in the periphery. The formula is

  Form Factor = sqrt(4π · Area) / Perimeter

If the Form Factor is close to 1, the shape is similar to a circle.

6.2.7 Convexity

Convexity is defined as the ratio between the convex perimeter of an object and its perimeter, (Convex Perimeter)/Perimeter. The convex perimeter is the perimeter of the convex hull of the object, which is defined as the smallest convex shape that contains the object. Convexity takes the value 1 for convex objects and smaller values for objects with irregular boundaries. It also provides a measure of roundness, defined as the ratio between the area of the object and the area of a circle with the same convex perimeter [6].

6.2.8 Number of Corners

This feature is the number of corners in an object. To extract it, the curves of the image are extracted, and the number of corners is calculated from these curves. For example, a tree has more corners than a stone because of the distributed shape of the tree and the curvy shape of the stone. Figure 16 shows the detected corners of a Tree object.
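A minimal MATLAB sketch of the shape measures above, using regionprops on the binary object mask and the formulas of Sections 6.2.3 to 6.2.6; it assumes the mask contains a single connected object region:

    % Shape measures from the binary mask of the object.
    stats = regionprops(mask, 'Area', 'Perimeter', 'ConvexArea');
    A = stats(1).Area;
    P = stats(1).Perimeter;
    thickness   = A / (P / 2);                 % thickness per Patric and Alder [28]
    elongation  = A / thickness^2;
    compactness = A / P^2;
    solidity    = A / stats(1).ConvexArea;
    formFactor  = sqrt(4 * pi * A) / P;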

6.3 Edge and Corner

6.3.1 HOG

Histograms of Oriented Gradients (HOG) was first introduced for human detection [15]. As defined by Navneet Dalal and Bill Triggs [15], HOG uses the distribution of local intensity gradients or edge directions to characterize object appearance and shape. It divides an image into small cells, and for each cell a local 1-D histogram of gradient directions or edge orientations is calculated over the pixels in the cell. Changing the cell size changes the behavior of the HOG function: decreasing the cell size increases the size of the output vector that stores the gradient information of the pixels in the cells, and vice versa. Since cell size and output vector size are inversely related, decreasing the cell size results in more time consuming computation and a larger output vector, which in turn decreases classification speed. It is therefore important to assign an appropriate size to the cells and to use a dimensionality reduction method to reduce the size of the output vector, preventing the issues mentioned. Here the input is the object image after background elimination, with size 480-by-640, and the output is a vector storing the gradient information of the cells' pixels; in the implemented HOG function, with a cell size of 20-by-20, the output is one long row vector per image. As the dimension of the extracted HOG feature vector is high, it is beneficial to reduce its dimensionality before using it for classification. PCA is applied to the HOG feature vector [41] for this purpose; trying different target sizes shows that reducing the HOG vector to 300 dimensions results in the highest classification performance. Figure 14 shows the HOG output, which indicates the gradients of the object pixels in the defined cells.

Figure 14: Edges of a Tree object found by applying HOG (axes show the size of the image)
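A sketch of this step using the Computer Vision Toolbox function extractHOGFeatures, under the assumption that all background-eliminated object images have been collected in a cell array imgs (a hypothetical name); the PCA reduction to 300 dimensions mirrors the choice described above:

    n = numel(imgs);
    h1 = extractHOGFeatures(imgs{1}, 'CellSize', [20 20]);
    hog = zeros(n, numel(h1));
    for i = 1:n
        hog(i, :) = extractHOGFeatures(imgs{i}, 'CellSize', [20 20]);
    end
    % Reduce the HOG vectors to 300 principal components.
    [~, score] = pca(hog);
    hogReduced = score(:, 1:300);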

6.3.2 Harris

The Harris corner detector [26] is widely used in computer vision [44]. Its main idea is to find corners by shifting a sliding window over the image: corners are defined by how the intensity within the sliding window changes as it is shifted in different directions. Three kinds of regions are distinguishable from these intensity changes: a flat region, where the intensity does not change; an edge, where the intensity changes strongly in only one direction of shifting; and a corner, where the intensity changes strongly in all directions. Since an edge looks like a straight line, shifting the sliding window in a direction perpendicular to the line results in strong intensity changes; and since a corner is the conjunction point of two edges, shifting the sliding window in any direction changes the intensity. The input to this function is the object image after background elimination. To discover the Harris corners of the object, a starting position is defined for the sliding window, the window is moved in different directions, and the Harris function compares the intensity of the object at the new position with that at the starting point. If the intensity of the sliding window does not change, the region is flat; if it changes strongly in only one direction, it is an edge; otherwise it is a corner. In the implemented Harris corner detector, the output is the number of corners that exist in the object. For example, the number of corners in trees and bushes is much higher than in humans and stones, due to the distributed shape and larger number of sides of trees and bushes compared to the other objects. Figure 15 shows 50 discovered corners of a Tree object after applying the Harris corner detector.

Figure 15: Detected corners of a Tree object after applying the Harris corner detector
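A minimal sketch of this feature using the Image Processing Toolbox corner function, whose default method is Harris; the cap of 50 corners matches the number shown in Figure 15, and the grayscale conversion is an assumption about the preprocessing:

    gray = rgb2gray(fg);                 % background-eliminated image, grayscale
    C = corner(gray, 'Harris', 50);      % up to 50 strongest Harris corners (x,y)
    numCorners = size(C, 1);             % the "number of corners" feature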

Table 3: Color and shape based feature values calculated for the object in Figure 8 (the 25 features listed in Table 2; * on normalized RGB image)

Table 4: Extracted feature categories

  Color Based: Mean, Std, Variance and Skewness of R, G & B; Mean of normalized R, G & B
  Shape Based: Area, Perimeter, Elongation, Form Factor, Compactness, Solidity, Convexity, Depth Mean & Variance, Number of Corners

Figure 16: Detected corners

7 Classifiers

Classification involves a broad range of decision making, using classifiers to categorize the objects of interest. Classifiers take the extracted features as input to determine the class of each object. As all the implemented classifiers are supervised, they have two phases, training and testing. In the training phase, a part of the data points together with their predefined classes is given to the classifier for training; in this phase the classifier learns how to classify objects based on the input weights. In the testing phase, data points are categorized according to the results of the training phase and the classification method learned there. To achieve high classification performance, it is essential to divide the data points between training and testing with an accurate ratio, as different ratios lead to different outputs. To test the possible partitions of the data points, cross-validation is applied: the data points are divided into training and testing sets in 10 folds, with 361 images used for training and 158 for testing in each fold. Cross-validation results in more accurate training and testing, which increases the performance of object classification. The following sections describe the classifiers implemented in this project.
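A minimal sketch of the cross-validated training loop just described, using KNN with K = 5 (the value selected in Section 7.1) as the example classifier. It assumes the 519-by-25 feature matrix X and a cell array of class names Y are already built; cvpartition and fitcknn are Statistics and Machine Learning Toolbox functions:

    cv = cvpartition(Y, 'KFold', 10);          % 10-fold cross-validation
    acc = zeros(cv.NumTestSets, 1);
    for f = 1:cv.NumTestSets
        tr = training(cv, f);  te = test(cv, f);
        mdl = fitcknn(X(tr, :), Y(tr), 'NumNeighbors', 5);  % train on the fold
        pred = predict(mdl, X(te, :));                      % classify test points
        acc(f) = mean(strcmp(pred, Y(te)));                 % fold accuracy
    end
    meanAccuracy = mean(acc);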

7.1 KNN (K-Nearest Neighbor)

K-Nearest Neighbor is a classification algorithm that labels a new sample by considering the labels of the K most similar examples in a training set [21]. Trying several values for K shows that the highest performance is achieved with K = 5; the same value is used in both the training and testing steps. KNN uses neighborhood classification to predict the value of a new query instance, and determines the K nearest neighbors by calculating the smallest distances from the query instance to the training samples. An unclassified sample is assigned to the class represented by the majority of its K nearest neighbors in the training set [17]. There are several methods for calculating nearest neighbor distances, but the highest classification performance here is reached with the Euclidean distance. The steps in KNN are:

  - Determine the parameter K (the number of nearest neighbors).
  - Compute the distance between the new query instance and all the training samples.
  - Sort the calculated distances and determine the nearest neighbors based on the value assigned to K.

7.2 Decision Tree

The decision tree is a divide and conquer approach to classification which, in large databases, determines the features and extracts the patterns that are important for predictive modeling and discrimination [40]. This classifier uses a tree with subtrees and nodes to decide which classes to assign to the input values, and uses a regression method to finalize the output. It is a predictive model that maps the training set to target values, and the interpretability of the model constructed by the decision tree is an advantage over other pattern recognition techniques [40].

7.3 Naive Bayes

This classifier works with a set of discriminant functions, estimating the relevant probabilities from the training set [20].

Naive Bayes is a probabilistic classifier based on applying Bayes' theorem with naive independence assumptions. It learns from the training data the conditional probability of each attribute A_i given the class label C. Classification is done by applying Bayes' rule to compute the probability of C given the particular instance of {A_1, ..., A_n}, and predicting the class with the highest posterior probability [24].

7.4 LDA

Linear Discriminant Analysis, originally known as the Fisher mapping [46], classifies by calculating a linear function. It is a technique that maximizes the linear separability between inputs that belong to different classes [46].

7.5 MLP

The Multi Layer Perceptron is used to train neural networks consisting of three layers: an input, a hidden and an output layer. The input layer is a vector of predictor variable values and distributes these values to each of the neurons in the hidden layer; a constant input, called the bias, is also fed to each of the hidden neurons. For classification problems with categorical target variables, there are N neurons in the output layer producing N values, one for each of the N categories of the target variable. All neural networks have an input and an output layer, but the number of hidden layers may vary. Testing the MLP classifier on different cases shows that it is not reliable for this project: it is removed from the list of classifiers due to its low execution speed [9] and the random initialization of its weights [13], which produces different outputs on every execution with the same input.

7.6 WMV

Weighted Majority Vote is a classification method for the fusion of classifiers. As the performance of a single classifier might not be satisfactory, WMV gives more power to the more competent classifiers in making the final decision [34]. WMV compares the classification results of the classifiers and selects the output class based on two concerns: first, the class assigned by the more powerful classifiers (those with higher accuracy), and second, the number of classifiers that assign the same class to the output.
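A minimal sketch of such a fusion rule, assuming preds is an n-by-m cell array of the class labels predicted by m classifiers and w is a 1-by-m vector of their accuracies used as weights (all names hypothetical):

    classes = unique(preds(:));
    n = size(preds, 1);
    fused = cell(n, 1);
    for i = 1:n
        score = zeros(numel(classes), 1);
        for j = 1:size(preds, 2)
            k = find(strcmp(classes, preds{i, j}));
            score(k) = score(k) + w(j);        % accumulate the classifier's weight
        end
        [~, best] = max(score);                % class with the largest weighted vote
        fused{i} = classes{best};
    end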

Table 5: Classification performance based on different categories of features

  Classifier   Color Based   Shape Based   Color and Shape Based   Edge (HOG)   Corner (Harris)   Object Pixels
  KNN          78%           71%           76%                     72%          68%               20%
  DT           84%           66%           80%                     66%          65%               18%
  LDA          77%           71%           72%                     -            72%               35%
  NB           76%           64%           84%                     75%          73%               42%
  WMV          84%           71%           82%                     80%          75%               40%

As the aim of this project is to increase the classification performance of object detection, several dimensionality reduction methods are applied to the feature sets to reduce their dimensionality and determine whether they lead to better object classification.

8 Dimensionality Reduction Methods

To classify the objects of interest it is necessary to extract features, but it has been observed in practice that adding features to the feature set may actually reduce classifier performance [29]. Therefore, several dimensionality reduction methods are applied to the feature sets to reduce the dimensionality of the data points and increase classification performance. Reducing the dimensionality of data points can be achieved by feature selection or feature transformation. Feature selection is an NP-hard optimization problem over a discrete space [33]; it focuses on selecting a subset of the original feature set [39] and does not change any individual feature. Feature transformation, on the other hand, works in a continuous space and is divided into linear, non-linear, probabilistic and non-probabilistic methods; it creates a new feature set from the original one [45]. The selected dimensionality reduction methods are PCA, GPLVM, Feed Forward, Backward Reduction, and a new method based on K-mean (KSDR). They are divided into the two categories of supervised and unsupervised methods (Table 6).

8.1 Unsupervised dimensionality reduction methods

The input to the unsupervised dimensionality reduction methods is the object's pixels.

Table 6: Dimensionality reduction methods except KSDR

  Supervised            Unsupervised
  Feed Forward          PCA
  Backward Reduction    GPLVM

To have an appropriate input for these methods it is necessary to extract the object's pixels, but different objects have different sizes, which would result in inputs of various sizes. To avoid this problem, a boundary box covering all parts of the object is defined around it. Figure 21 shows an example of a boundary box covering a Tree object. To define the boundary box, the coordinates of four points of the object are extracted: the highest and lowest left and right points. Connecting these points forms a boundary box around the object. To give all the objects covered by boundary boxes the same size, the object region inside the boundary box is converted to 340-by-420. Figure 18 shows the input image (the image covered by the bounding box) after resizing to 340-by-420. The input to PCA and GPLVM is therefore the same for all types of objects.

8.1.1 PCA

Principal Component Analysis (PCA) is a popular linear, non-probabilistic method for multidimensional representations that is used for data compression and dimensionality reduction [51]. PCA uses several intercorrelated quantitative dependent variables to describe the observations [4], and the output of PCA can at most be the same size as the input data [53]. The idea of PCA is to apply a linear transformation to the input data to reduce its dimensionality in such a way that the data can be retrieved from the output data points with maximum similarity to the original data points. An everyday analogy for PCA is taking a picture with a camera: objects are 3-D, and since it is not possible to show all their parts in a 2-D image, photographers try to capture the most informative view, one that lets the observer infer even the parts of the object that are not visible. The photographer therefore turns the object in different directions to find a view that shows the maximum attributes of the object. PCA is widely used in image classification to reduce the dimensionality of data points (see [18], [14] and [12]).

PCA works as follows:

  - Extract the pixels of the input image that are most important for retrieving the original image after dimensionality reduction; these are the object's pixels after background elimination.
  - Compress the extracted pixels.
  - Sort the extracted pixels based on the level of similarity required for retrieving the original data.

To reduce the dimensionality of the data points, PCA applies the following steps to the input image:

  - Determine the size of the input image.
  - Standardize the data set by shifting the mean value of the data points to zero and scaling the standard deviation to one, thereby centering and scaling the data.
  - Compute the covariance matrix of the standardized data set.
  - Calculate the eigenvalues and eigenvectors of the covariance matrix to obtain the coefficients of the principal components and their respective variances: with [V, D] = eig(cov(dataset)), the matrix V contains the coefficients of the principal components, and the diagonal elements of matrix D store the variances of the respective PCs.
  - Extract and normalize the diagonal of matrix D. The purpose of this step is to find the proportion of the total variance represented by each principal component; its output indicates how many PCs should be selected to recreate values most similar to the original data set.
  - Multiply the standardized data by the coefficient matrix V to calculate the principal components: PCs = dataset × V.
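A direct MATLAB transcription of these steps, keeping the first 60 principal components as chosen below; dataset stands for the 340-by-420 object pixel matrix:

    Z = (dataset - mean(dataset)) ./ (std(dataset) + eps);  % standardize columns
    [V, D] = eig(cov(Z));                                   % eigenvectors/eigenvalues
    [lambda, order] = sort(diag(D), 'descend');             % sort PCs by variance
    V = V(:, order);
    explained = lambda / sum(lambda);                       % proportion of total variance
    PCs = Z * V(:, 1:60);                                   % 340-by-60 reduced data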

Figure 17: Background elimination applied to the object inside the boundary area defined in Figure 21

Figures 17 and 18 show the different functions applied to the input image (Figure 21). For forestry object classification, PCA is fed the object's image pixels with size 340-by-420. It reduces the number of columns from 420 to 60 PCs, a compression of about 86 percent, while holding 99 percent of the original image data. In other words, after decreasing the dimensionality of the input image from 420 to 60, it is possible to recreate the data with 99 percent similarity to the original data set. It is also possible to select a different number of PCs instead of 60, but the similarity between the recreated data and the original set will differ, and the classification performance will change accordingly. In this project the number of PCs is set to 60, based on the idea that feeding the classifiers data more similar to the original data set results in more accurate classification; selecting 40 PCs, for example, decreases the similarity between the recreated data and the original data set to 70%, and the classification accuracy is reduced as well. The output of PCA, the object's image pixels after dimensionality reduction with size 340-by-60, is fed to the classifiers to classify the objects and determine the accuracy of this method. Figure 19 shows the data recreated after applying PCA to the input image (Figure 18).

Figure 18: Figure 17 after resizing. Due to the different sizes of objects, the boundary area is set to 340-by-420 to cover all parts of the objects. This image is the input for both PCA and GPLVM.

Figure 19: Retrieved image with 99% similarity to the input image (Figure 18) after applying PCA

8.1.2 GPLVM

The concept of using Gaussian processes for dimensionality reduction, through Gaussian Process Latent Variable Models (GPLVM), was introduced by Lawrence in 2003 [37]. The main idea behind GPLVMs is to find a non-linear function that smoothly maps low-dimensional latent space vectors to a high-dimensional observation space [43]. GPLVM is a probabilistic non-linear PCA that applies non-linear methods instead of a linear transformation. It combines a Gaussian process with a linear transformation method such as PCA, Kernel PCA, Probabilistic PCA or Dual Probabilistic PCA to reduce the data dimensionality; in this project, PCA is the linear transformation combined with the Gaussian process. Applying GPLVM and GP to data sets to reduce dimensionality and increase classification performance has also been done by other researchers (see [31], [47] and [5]).

To build a GPLVM, both the latent variable model and the Gaussian process are needed, and calculating the linear latent variable model is a prerequisite for computing the non-linear transformation. For the latent variable model, assume that q is the dimension of the latent space, d is the dimension of the data space, n is the number of data points, and Y is the centered data. X is the latent variable and W is the mapping matrix used to map between the centered data and the latent variable (W ∈ R^(d×q)). The linear latent variable model represents the data Y by a lower-dimensional set of latent variables X through the linear relationship

  Y_i,: = W X_i,: + η

where η is a noise term with η ~ N(0, σ²I); a small generative sketch of this model is given below. In a Gaussian Process (GP), the focus is on a probability distribution over functions and the variance of the posterior. A Gaussian process likelihood has the form P(Y | X) = N(Y | 0, K), where K is the covariance function, or kernel. Formally, a Gaussian process generates data located throughout some domain such that any finite subset of the range follows a multivariate Gaussian distribution [22]; it is often used for regression and for calculating a posterior distribution over functions. By combining the latent variable model, the Gaussian process, and a linear transformation (PCA), GPLVM is obtained as a non-linear dimensionality reduction method.
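A minimal generative sketch of that linear latent variable model (not a full GPLVM, which would also require optimizing the latent positions under the GP likelihood); the dimensions are arbitrary illustration values:

    q = 2; d = 10; n = 100;
    X = randn(n, q);                    % latent variables, one row per data point
    W = randn(d, q);                    % mapping matrix, W in R^(d x q)
    sigma = 0.1;                        % noise standard deviation
    Y = X * W' + sigma * randn(n, d);   % observed data: Y(i,:) = (W*X(i,:)')' + noise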

Figure 20: Binary image after background elimination using depth information

The input image to GPLVM is Figure 21; Figures 17 and 18 show the modifications applied to it before it is fed to GPLVM. For both PCA and GPLVM the input is the pixels of the image after background elimination. When PCA and GPLVM are applied, the number of columns is reduced, as the columns are treated as features, while the number of rows stays unchanged, as they are treated as observations of the feature sets. The input to GPLVM is the object's image pixels with size 340-by-420. As with PCA, the number of principal components is set to 60 to retrieve data with the highest similarity to the original data points; with this value the retrieved data has 99 percent similarity to the original data set. The output of GPLVM, with size 340-by-60, is fed to the classifiers for object detection and classification. Figure 22 shows the output of GPLVM after dimensionality reduction: applying GPLVM to the 340-by-420 input image reduces its dimensions to 340-by-60. Classifying objects based on the output of PCA and GPLVM means that the classifiers are fed the output of PCA and GPLVM and classify the objects based on the pixels of the image, without any of the feature extraction described earlier. In these methods the features are the image's pixels, and the newly created feature set consists of the new pixel values of the output image, which in the applied methods have 99 percent similarity to the original. Table 7 shows the performance of the applied classifiers on the outputs of PCA and GPLVM.

Figure 21: A boundary drawn around the detected object using the binary image

Figure 22: Image after applying GPLVM to the input image (Figure 18)

Table 7: Comparison of object detection performance before and after applying PCA and GPLVM

  Classifier   On PCA   On GPLVM   On Image Pixels
  KNN          48%      34%        30%
  DT           51%      53%        34%
  LDA          -        -          -
  NB           72%      69%        45%
  WMV          69%      63%        40%

8.2 Dimensionality reduction using K-mean

8.2.1 K-mean

K-mean is an unsupervised method originally introduced by Forgy [23] and McQueen [38]. K-mean is simple and can be used for a wide variety of data types [16], but the algorithm suffers from the effects of its initial starting conditions (initial clustering and instance order) [42]. It categorizes input data by clustering, with the number of clusters K defined by the user. The method minimizes the squared error function to define clusters, which are represented by their centroids. Applying K-mean converts the wide continuous range of feature values to discrete numbers. Data points are clustered based on the mean value calculated for each cluster: to decide whether a data point belongs to a cluster, the distance from the data point to the mean value of the cluster is calculated. If the data point is closest to that cluster's centroid, it belongs to the cluster and is in the correct group; otherwise its group is changed to the cluster whose centroid is nearest. K-mean thus puts data points with more similarity (similar distance to the cluster centroid) in the same group. An advantage of doing so is that the dimensionality of the data is reduced by replacing the feature value with the cluster number. Figures 23 and 24 show two examples of K-mean clustering for the same data points but with different values of K. Applying K-mean to the extracted feature values with a proper value of K reduces the dimensionality of the features, which in turn results in faster classification and higher object classification performance (see [48], [7] and [19]).

8.2.2 Applying K-mean to Extracted Features

In total, 519 image shots of the forest environment, containing four types of objects, were taken, and as Table 2 shows, 25 features are extracted from these images. The applied MATLAB function is kmeans, which takes the data points (extracted features) and the value of K as input, and outputs the number of the group to which each data point belongs. In the applied K-mean function, the inputs are the extracted features of the 519 images (a 519-by-25 matrix) and the K value; the output is a matrix of the same size but with a different kind of data, whose members are the numbers of the groups the data points belong to. The extracted feature values are real numbers (∈ R), while the cluster numbers are natural numbers (∈ N), and the two matrices have the same size.
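A minimal sketch of this per-feature clustering with the Statistics Toolbox kmeans function, assuming X is the 519-by-25 feature matrix; each column is clustered separately, as described above:

    K = 5;                                   % number of clusters for each feature
    idx = zeros(size(X));                    % cluster numbers, same size as X
    for j = 1:size(X, 2)
        idx(:, j) = kmeans(X(:, j), K, 'Distance', 'sqeuclidean');
    end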

Figure 23: K-mean clustering of 20,000 random data points, K = 2

Figure 24: K-mean clustering of the same data points as in Figure 23, but with a different value of K (K = 5)

Figure 25: The K-mean algorithm

K-mean clusters the features into groups based on their similarity in terms of distance to the clusters' mean values; the similarity of the values in each group is based on their centroid (mean value), obtained by minimizing the squared error function [16]. To make the clustering of the data points more efficient, the squared Euclidean distance is selected for measuring the distance of data points to the cluster centroids. A weak point of K-mean is that the correct number of clusters is often not obvious, and choosing K automatically is a hard algorithmic problem [25]; the focus here is therefore on finding a method that assigns the best K value to each extracted feature so as to obtain the highest classification performance.

8.2.3 K-mean for Supervised Dimensionality Reduction (KSDR)

The motivation for KSDR is to combine the FF and Backward Reduction methods so as to obtain an intermediate solution instead of starting from an empty set or a complete set. What K-mean does is group the data set into K clusters. As the first step, KSDR uses K-mean to categorize each feature into K classes. It then runs, similarly to FF and BR, for N iterations (N is the size of the feature set). In each iteration, KSDR selects one feature, clusters it into K, K−1, ..., 1 clusters, and evaluates the classification accuracy given the new feature sets. The number of clusters that results in the highest classification accuracy is then set as the K value for the selected feature. Having assigned a proper value of K to every feature, KSDR eliminates the features whose K value is less than or equal to a threshold T. This means that KSDR removes the features that contribute the least to classification accuracy because they have a single constant value (for K = 1) or too few clusters (for T > 1). The main drawback of KSDR is its two parameters K and T, for which assigning proper values is an optimization problem. In KSDR the predefined value of K is initialized to 1, and the range of values checked for K is {1, ..., 10} ⊂ N; different ranges were examined to find the best K for each feature. Checking all possible permutations requires examining n = 25 features, each with a K in {1, ..., 10}; the cost of this computation is O(K^n), i.e. 10^25 possible permutations to check, which is far too high. Due to this cost, an optimization method is required to decrease the cost of finding the best K value for each feature with the best classification performance. The following method

is therefore implemented. In the optimization method the range of K is divided into two parts: in the first it belongs to the set {1, ..., 10} ⊂ N, and in the second to {1, ..., 5} ⊂ N. The reason for starting the intervals at 1 is that assigning K = 1 as the cluster number to data points means they have the least importance compared to data points with higher K values, which gives the opportunity to remove them from the feature set and reduce the dimensionality. To decrease the dimensionality as much as possible, data points with K = 1, K = 1, 2 and K = 1, 2, 3 are selected for removal, in that order. Choosing different sets of numbers for removing features reduces the dimensionality further, but the issue to consider is that in this project reducing the dimensionality is only valuable when it also increases the classification performance. The following steps are accomplished in the optimization method (sketched in code below):

  - Set K to 1 for all features.
  - Change the K value of the first feature from 1 to 10 while K is fixed to 1 for the others (if a K value has already been set for a feature in another step, do not change it).
  - At every iteration in which the K value assigned to the feature changes, give the feature set to the classifiers (with K set to 1 for the remaining features).
  - Find and set the best K value, the one that results in the highest classification performance.
  - Repeat the last three steps for all features.
  - Remove the features whose best K lies within the set {1}, {1, 2} or {1, 2, 3}; at each removal iteration, calculate the classification performance of the remaining set.
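A sketch of the per-feature K search under these rules, using the range {1, ..., 5} that gave the best results; evalAccuracy is a hypothetical helper that runs the cross-validated classifiers of Section 7 on a candidate feature matrix and returns their accuracy:

    Kmax = 5;
    n = size(X, 2);
    bestK = ones(1, n);                        % start with K = 1 for all features
    for j = 1:n
        Xc = zeros(size(X));
        for col = 1:n                          % cluster each feature with its current K
            Xc(:, col) = kmeans(X(:, col), bestK(col));
        end
        accs = zeros(1, Kmax);
        for K = 1:Kmax
            Xc(:, j) = kmeans(X(:, j), K);     % try K clusters for feature j
            accs(K) = evalAccuracy(Xc);        % hypothetical cross-validation helper
        end
        [~, bestK(j)] = max(accs);
    end
    reduced = X(:, bestK > 1);                 % remove features whose best K is 1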

Table 8: Remaining features and their classification performance. The range of K is 1 to 10, and in each iteration the features with K = 1, K = 1,2 or K = 1,2,3 are taken out.

  Taken Out    Remaining Features             # of Features   NB    KNN   LDA   DT    WMV
  K = 1        1,2,4,6,9,13..15,18..19,...    -               -     80%   78%   85%   85%
  K = 1,2      1,2,6,...                      -               -     81%   77%   84%   82%
  K = 1,2,3    ...                            -               -     75%   76%   74%   78%

Table 9: Remaining features at each step and the related classification performance. The range of K is 1 to 5, and in each iteration the features with K = 1, K = 1,2 or K = 1,2,3 are taken out.

  Taken Out    Remaining Features             # of Features   NB    KNN   LDA   DT    WMV
  K = 1        1,2,6,10,...                   5               -     91%   77%   88%   90%
  K = 1,2      1,2,6,11,17,18,...             -               -     77%   74%   85%   85%
  K = 1,2,3    ...                            -               -     75%   76%   74%   78%

Tables 8 and 9 show the new feature sets and their classification performance after applying the optimization method. As Table 9 shows, the best object detection performance is achieved when K belongs to {1, ..., 5} and features are removed if their K value equals 1. The highest attained performance is 90%, on only 5 features. These outcomes show that applying KSDR for dimensionality reduction increases the object detection performance by 6% compared to the highest classification performance (84%) achieved without any feature selection or reduction method; additionally, this performance is achieved on only 5 features instead of 25. These advantages indicate that applying K-mean in on-line classification has benefits such as decreased computation time and increased classification performance.

8.3 Supervised feature selection and reduction

8.3.1 Feed Forward

Feed Forward (FF) performs dimensionality reduction by feature selection: it extracts a subset of the feature set that provides higher classification performance than the original set of extracted features. FF does not guarantee finding the optimal solution, as it does not check all possible subsets of the feature set [30]. FF begins with an empty subset and adds features to it step by step.

In feed forward [11], the extracted features are fed to the classifiers one by one and the one with the highest classification performance is selected; then the remaining features are added in turn to the selected feature(s) to find the feature set with the highest classification performance. The highest classification performance is measured on the testing set using 10-fold cross-validation on the input data points. This process continues at each execution step over the extracted features minus the feature(s) already assigned to the subset. The computational cost of this method is O((N^2 + N)/2), which is high due to computing all possible combinations of the remaining features at each step. Four steps are accomplished in this method:

  - Find the feature with the highest classification performance.
  - Add one feature at a time to the selected feature(s).
  - Find the set with the highest classification performance.
  - Repeat the last two steps for the length of the feature set.

Table 11 shows which features are selected at each step and the classification performance of the selected set of feature(s).
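A sketch of this greedy selection loop, reusing the hypothetical evalAccuracy helper from the KSDR sketch:

    n = size(X, 2);
    selected = [];
    remaining = 1:n;
    bestAcc = zeros(1, n);
    for step = 1:n
        accs = zeros(size(remaining));
        for i = 1:numel(remaining)             % try adding each remaining feature
            accs(i) = evalAccuracy(X(:, [selected, remaining(i)]));
        end
        [bestAcc(step), i] = max(accs);        % keep the best addition
        selected = [selected, remaining(i)];
        remaining(i) = [];
    end
    % The reported subset is the prefix of 'selected' with the highest accuracy.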

Table 10: Extracted features and their calculated values (the 25 features listed in Table 2; * on normalized RGB image)

Table 11: Selected features and their performance after applying Feed Forward to the 25 extracted features

  Features                  Performance
  6                         78%
  6,9                       88%
  6,9,2                     90%
  6,9,2,14                  92%
  6,9,2,14,13               92%
  6,9,2,14,13,1             92%
  6,9,2,14,13,1,8           93%
  6,9,2,14,13,1,8,7         93%
  6,9,2,14,13,1,8,7,22      94%

8.3.2 Backward Reduction

Backward Reduction (BR) is chosen because it is one of the most commonly used methods for feature selection [30]. Backward Reduction removes the features whose combination with other features reduces the classification performance. BR starts from the original feature set, instead of the empty set used in FF, and eliminates features one by one to increase the classification performance; removing features decreases the dimensionality of the extracted feature set. Two steps are accomplished in Backward Reduction: first, the features are taken out one by one, and the classification performance of the remaining feature set is calculated by giving it to the classifiers; then, the feature whose absence yields the highest performance is removed from the feature set. These steps are repeated for the length of the feature set, selecting a subset with the highest classification performance and eliminating the features whose combination with the others has a destructive effect on the total classification performance. The steps in Backward Reduction are (a code sketch is given after the tables below):

  - Take out the features one by one and calculate the classification performance of the remaining set.
  - Remove the feature whose absence results in the highest classification performance.

Table 12 shows the order of removed and remaining features and their performance.

Table 12: Order of removed features and the performance of the remaining feature set, after applying Backward Reduction to the 25 extracted features

  Removed Features                                     Remaining Features                          Performance
  13                                                   1...12, 14...25                             -
  13,20                                                1...12, 14...19, 21...25                    -
  13,20,7                                              1...6, 8...12, 14...19, 21...25             -
  13,20,7,18                                           1...6, 8...12, 14...17, 19, 21...25         -
  13,20,7,18,14                                        1...6, 8...12, 15...17, 19, 21...25         -
  13,20,7,18,14,25                                     1...6, 8...12, 15...17, 19, 21...24         -
  13,20,7,18,14,25,21                                  1...6, 8...12, 15...17, 19, 22...24         88%
  13,20,7,18,14,25,21,16                               1...6, 8...12, 15, 17, 19, 22...24          88%
  13,20,7,18,14,25,21,16,9                             1...6, 8, 10...12, 15, 17, 19, 22...24      87%
  13,20,7,18,14,25,21,16,9,8                           1...6, 10...12, 15, 17, 19, 22...24         86%
  13,20,7,18,14,25,21,16,9,8,3                         1, 2, 4...6, 10...12, 15, 17, 19, 22...24   87%
  13,20,7,18,14,25,21,16,9,8,3,12                      1, 2, 4...6, 10, 11, 15, 17, 19, 22...24    88%
  13,20,7,18,14,25,21,16,9,8,3,12,11                   1, 2, 4...6, 10, 15, 17, 19, 22...24        88%
  13,20,7,18,14,25,21,16,9,8,3,12,11,5                 1, 2, 4, 6, 10, 15, 17, 19, 22...24         89%
  13,20,7,18,14,25,21,16,9,8,3,12,11,5,19              1, 2, 4, 6, 10, 15, 17, 22...24             91%
  13,20,7,18,14,25,21,16,9,8,3,12,11,5,19,24           1, 2, 4, 6, 10, 15, 17, 22, 23              91%
  13,20,7,18,14,25,21,16,9,8,3,12,11,5,19,24,22        1, 2, 4, 6, 10, 15, 17, 23                  91%
  13,20,7,18,14,25,21,16,9,8,3,12,11,5,19,24,22,4      1, 2, 6, 10, 15, 17, 23                     92%
  13,20,7,18,14,25,21,16,9,8,3,12,11,5,19,24,22,4,17   1, 2, 6, 10, 15, 23                         94%

Table 13: Performance of the different feature selection methods on the full set of features

  Method               WMV Performance   Selected Feature Set    # of Features
  Feed Forward         94%               6,9,2,14,13,1,8,7,22    9
  Backward Reduction   94%               1,2,6,10,15,23          6
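The mirror-image sketch of Backward Reduction, again with the hypothetical evalAccuracy helper:

    remaining = 1:size(X, 2);
    removed = [];
    while numel(remaining) > 1
        accs = zeros(size(remaining));
        for i = 1:numel(remaining)             % try removing each remaining feature
            trial = remaining;  trial(i) = [];
            accs(i) = evalAccuracy(X(:, trial));
        end
        [~, i] = max(accs);                    % drop the feature whose absence helps most
        removed = [removed, remaining(i)];
        remaining(i) = [];
    end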

Table 14: Applying the first step of FF to the features in Table 19

  Feature   Classification Performance
  A         85%
  B         45%
  C         50%

Table 15: Applying the second step of FF to the features in Table 19

  Feature Set   Classification Performance
  A,B           80%
  A,C           75%

9 An example for comparison between Feed Forward and Backward Reduction methods

For clarification, here is an example comparing the Feed Forward and Backward Reduction methods. Assume that three features exist; the purpose is to apply FF and BR to them and compare the outputs. Tables 19 and 20 show the features, their combinations and their classification performance.

9.1 Applying Feed Forward

In FF the features are taken one by one and their classification performance is calculated. The feature with the highest classification performance is selected, and the remaining features are added to the selected feature(s) one by one; after each addition the classification performance of the set is calculated. Table 14 shows the first step of applying FF to the feature set: the features and their classification performance are calculated to find the feature with the highest performance, and feature A is selected. In the next step, as Table 15 shows, the remaining features are added to the selected feature; the feature set with the highest classification performance is the combination of features A and B. In the final step, the feature remaining from the last step is added to the combination of A and B, which, as Table 16 shows, results in 70 percent classification performance. Comparing the FF outputs of the different steps, the highest classification performance is achieved by feature A alone, so the output of the FF method on this feature set is feature A, with 85% performance in classifying objects.

9.2 Applying Backward Reduction

Table 17 shows the results of applying the first step of Backward Reduction to the features. As the combination of features B and C has the highest performance, i.e. the performance in the absence of feature A, feature A is removed from the feature set.

Table 17: Applying the first step of Backward Reduction to the features in Table 19

Feature set           Performance on Classification
B,C (absence of A)    90%
A,C (absence of B)    75%
A,B (absence of C)    80%

Table 18 shows the result of applying BR to the remaining set of features. At this level, feature C has the highest performance in the absence of feature B, so it is selected as the remaining feature.

Table 18: Applying the second step of Backward Reduction to features B and C

Feature set         Performance on Classification
C (absence of B)    50%
B (absence of C)    45%

Applying BR to the feature set thus yields two candidate sets: the combination of features B and C, and feature C alone. Comparing these two sets selects the set with the highest classification performance, which is the combination of features B and C with 90% classification performance.

Comparing FF with BR in this example reveals a weak point of FF: as mentioned in the description of Feed Forward, the method is not able to explore all possible combinations of features (here, the combination of features B and C). Therefore the output of FF is not the optimal solution. BR, on the other hand, checks more of the possible combinations and produces a more appropriate output.
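To make the comparison concrete, the two sketches above can be run on the toy performances of Tables 19 and 20. The Map-based evaluate below is a hypothetical stand-in for a real classifier:

    % Toy evaluator reproducing Tables 19 and 20 (features named 'A','B','C').
    perf = containers.Map( ...
        {'A','B','C','AB','AC','BC','ABC'}, ...
        {0.85, 0.45, 0.50, 0.80, 0.75, 0.90, 0.70});
    evaluate = @(s) perf(sort(s));    % sort so 'BA' and 'AB' map to the same key

    [ffSet, ffPerf] = feedForward('ABC', evaluate);        % -> 'A',  0.85
    [brSet, brPerf] = backwardReduction('ABC', evaluate);  % -> 'BC', 0.90

As in the worked example, FF stops at feature A (85%) while BR finds the better combination of B and C (90%).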

Table 19: List of features and their performance

Feature    Performance on Classification
A          85%
B          45%
C          50%

Table 20: Different combinations of features and their performance

Feature set    Performance on Classification
A,B            80%
A,C            75%
B,C            90%
A,B,C          70%

10 Results

The extracted features are divided into three categories in order to determine the effect of each feature type on classification performance. The categories are color based and shape based features, which together make up the 25 extracted features, and edge and corner features, which are extracted by applying the HOG and Harris functions to the object images. To extract the features, the image background is eliminated using depth information. As Table 5 shows, using the color based features, which comprise 17 features, results in the highest classification performance, 84%, while the shape based features reach 71% on 8 features. The total classification performance on all 25 extracted features is 82%, while applying HOG and Harris to extract edge and corner features results in 80% and 75% respectively. Comparing the classification performance of the feature categories, the edge features extracted by HOG achieve the highest performance relative to the number of features used (Figure 27). The 80% performance of HOG is achieved on only one feature, the object's edges over the image pixels, whereas the color based features reach 84% only with 17 features. The HOG classification performance is about four percentage points lower than that of the color based features, but since it is achieved with a single feature, the edge features are more efficient than the 17 color based features.

As the number of features extracted from color and shape is large, it is necessary to reduce the dimensionality of the data set and examine the effects of dimensionality reduction methods on the classification performance. Two supervised methods, two unsupervised methods, and a unique method based on K-mean are applied as dimensionality reduction methods to the 25 extracted features.

PCA and GPLVM are applied to the data set as unsupervised methods, which use feature transformation for dimensionality reduction. The input to these methods is the image pixels of the object, with size 340-by-420, and the output on which classification is based is an image of size 340-by-60. As Table 7 shows, applying PCA and GPLVM results in 69% and 63% classification accuracy respectively, so PCA yields a higher classification output than GPLVM. The classification performance of these two methods is not as high as that of the other methods, but it is noticeable that their input is only the object's pixels, without applying any method of feature extraction. The performance achieved by applying PCA and GPLVM is about 20% higher (Table 5) than feeding the classifiers the same input without applying any method of dimensionality reduction.

The supervised methods for dimensionality reduction are Feed Forward and Backward Reduction. Their input is the 25 extracted features, and they use feature selection to find the combination of features with the highest classification performance. As Table 22 shows, FF and BR reach the same classification performance of 94%, but FF attains it with 9 features and BR with only 6. Comparing the outputs of FF and BR shows that the BR method is more efficient, as it reduces the number of features from 25 to only 6, while FF reduces it to 9 at the same classification performance.

The new method based on K-mean is K-Mean Supervised Dimensionality Reduction (KSDR), which combines supervised and unsupervised techniques. K-mean is originally unsupervised, but in this method it is combined with a supervised step to reduce the dimensionality of the data set. The input to KSDR is the 25 extracted features. As Tables 8 and 9 show, assigning different values to the input parameters of KSDR results in different classification accuracies. The highest classification performance achieved by this method is 90% on 5 features. Comparison between KSDR and the other methods shows that its classification performance is higher than that of the original set of features, the color and shape based features, PCA and GPLVM. Applying KSDR thus increases the classification performance while decreasing the data dimensionality.
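As an illustration of the PCA transformation described above, the projection from the 340-by-420 pixel matrix to the 340-by-60 representation can be sketched as follows; X is assumed to hold the pixel values of one object image, and this is a generic PCA sketch rather than the exact thesis implementation:

    % PCA sketch: reduce the 420 pixel columns to 60 principal components.
    Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % center each column
    [~, ~, V] = svd(Xc, 'econ');                  % columns of V: principal directions
    Xreduced  = Xc * V(:, 1:60);                  % 340-by-60 input to the classifiers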

To compare the classification performance on each object class based on the extracted features, tables of precision and recall are computed. Tables 23, 24, 25, 31 and 32 correspond to the classification of objects without applying any method of dimensionality reduction, while Tables 26, 27, 28, 29 and 30 show the classification of objects after applying the dimensionality reduction methods. Note that in these tables the number of objects is selected by the classifiers after applying cross validation to the input data set. To find out which objects are classified most accurately, the highest recall values are compared in Table 33, which shows that Human objects have the highest classification performance; their recall value is 1 in most settings. The highest precision, on the other hand, is reached by Bush (Table 33); precision indicates what percentage of all objects classified as Bush actually belong to that class (precision = TP / (TP + FP), while recall = TP / (TP + FN)).

Applying the dimensionality reduction methods increases the object classification performance compared to the case where they are not applied to the data set (Table 21 and Figure 26). The highest classification performance achieved is 94% on 6 features, obtained by applying Backward Reduction as the dimensionality reduction method.

Figure 26: Classification performance on color and shape based features and number of features, before and after applying dimensionality reduction methods

Figure 27: Classification performance on extracted features

Table 21: Comparison between classification performances of features before and after applying dimensionality reduction methods

                         Classification Performance    # of Features
Color based              84%                           17
Shape based              71%                           8
Color and Shape based    82%                           25
HOG                      80%                           300
Harris                   75%                           1
PCA                      69%                           340-by-60
GPLVM                    63%                           340-by-60
Feed Forward             94%                           9
Backward Reduction       94%                           6
KSDR                     90%                           5

Table 22: Dimensionality reduction methods, their classification performance and number of remaining features

                  PCA          GPLVM        Backward Reduction    Feed Forward    KSDR
Performance       69%          63%          94%                   94%             90%
# of Features     340-by-60    340-by-60    6                     9               5

11 Conclusion and Future work

In total, 519 images of Trees, Bushes, Stones and Humans are taken using the Kinect camera. Based on these images, after eliminating the background, 25 features are extracted, normalized to the range zero to one, and divided into color based and shape based categories.
