
ALBERT-LUDWIGS-UNIVERSITÄT FREIBURG
INSTITUT FÜR INFORMATIK
Lehrstuhl für Mustererkennung und Bildverarbeitung
Prof. Dr. Hans Burkhardt

Comparison of Content-Based Image Retrieval using one-class and two-class SVM

Studienarbeit
Julia Ick
May 2004 - August 2004

Declaration (Erklärung)

I hereby declare that this work was produced by me independently and only with the use of the listed aids.

Freiburg,

Contents

1 Motivation
2 Support Vector Machines
  2.1 Two-Class SVM
  2.2 One-Class SVM
3 Similarity Measures
  3.1 Euclidean Distance
  3.2 Histogram Intersection
  3.3 Image Retrieval with similarity measures
4 Kernel Functions
  4.1 Common Kernel Functions
  4.2 The Histogram Intersection Kernel
5 Relevance Feedback with SVMs
  5.1 Using a Two-Class SVM
  5.2 Using a One-Class SVM
  5.3 Using a Similarity Measure
6 Relevance ranking induced by the histogram intersection kernel
7 Implementation
8 Results
  8.1 Tests with the MPEG-7 Content Set
  8.2 Tests with the Benchathlon image collection
  8.3 Conclusions

Chapter 1

Motivation

Nowadays people are confronted more and more with very large multimedia data sets. For example, the number of images found in the world wide web has increased immensely in the past years and will keep growing in the future. To cope with these tremendous sets, it is necessary to develop good systems for searching multimedia data. In the following we will concentrate on image search systems.

One way to construct an image search system is to label each image with associated data, like the filename, the date of creation or descriptive words. Then it is possible to search for an image by searching in the associated data. A disadvantage of this method is that a lot of labeling work has to be done before the image search database can be used. The main problem with this method, however, is that it is very hard for the user of the search system to describe the image he is looking for in a way that gives the computer a chance to return the desired result. Descriptions of the same image by different users will rarely correspond; image descriptions are very subjective.

Another way to create an image search system is to search on the basis of the content of the image. Search systems that use this method are called Content-Based Image Retrieval (CBIR) systems. The main idea is that the user uploads or selects an image and requests the computer to find the most similar images in the database. To simplify the comparison of images, a small number of numerical, easy-to-detect low-level features are used instead of the raw image data. Low-level features are features like color histograms, textures, shapes and edges. A problem is that the user most likely is not able to express an image with low-level features alone. The user will need the help of high-level features, which cannot be detected by the computer. For example, what if the user wants to search for images with flying or sitting birds with no preference for the color of the feathers? How can this target concept be described with low-level features alone? A major goal for image search techniques would be to find a way to fill the gap between the perception of humans and computers. In the meantime we have to try to capture the user's target image concept with computer-detectable low-level features as well as possible.

Because our CBIR system will be confronted with different users with different search interests, the system has to be able to learn. It is important that the system tries to capture the target image concept quickly and with high accuracy. Learning means that the system finds and weights the features which best describe the user's desired target image.

Before choosing a learning method we have to define how many relevance classes should be distinguished. Here we will treat CBIR as a two-class classification problem. That means that an image will either be relevant or irrelevant.

To achieve a high accuracy, the system has to see a sufficient number of training examples. In the case of image retrieval the examples have to be provided by the user. One way to provide the training examples is that the user uploads them. But this is very time consuming and presupposes that the user has a set of examples at hand. A much easier way, which we will be using here, is to present a small number of images from the database to the user for labeling. A system which improves the search result iteratively by asking the user to label a set of query images is called a relevance feedback system.

The images for the query can be picked randomly or purposely by the system. Systems which pick their training examples randomly are called passive learners and systems which pick their training examples purposely are called active learners. A passive learner might take a long time and a lot of examples to learn the target image concept. No user has the patience to label more than a few dozen images, so it might be better to choose the images on purpose in some way. Now, what are the best images for the query? Training examples are good if they reduce the space of all possible target image concepts. Consequently the most informative images for a query round are those that restrict the space of all possible target image concepts the most.

The strategy used by our relevance feedback system to learn the target image concept is as follows. First the user is presented a small set of images for labeling. It is assumed that at least one image of the set is relevant. An example of an already labeled initial query set of our CBIR system is shown in figure 1.1. The blue frame marks the chosen relevant image and all unmarked images are considered irrelevant. When the user is done with labeling, the system uses this information to compute the image concept which it considers the most probable one. After that, the current best result images and the images of the second query round are presented to the user. The query set will consist of the most informative images. Figure 1.2 shows in the upper part the 20 best result images after the initial query round from figure 1.1. The images of the second query round are presented in the lower part. Based on the user's answers, the representation of the system's image concept can be updated. The best results after the second query round of our example can be viewed in figure 1.3. These query rounds are repeated until the user thinks the result is good enough.

Which machine learning method should we use in the relevance feedback CBIR system? Most easy-to-detect low-level features are numerical, and therefore an image can be represented as a vector of numerical values. Every image belongs to one of two classes: relevant or irrelevant. Therefore a possible learning method to use are Support Vector Machines (SVMs). Two different types of SVMs have already been successfully tested in CBIR relevance feedback systems. One of the two SVM types is the well-known two-class SVM, which tries to separate the training data with a hyperplane. This kind of SVM has been used for image retrieval by S. Tong and E. Chang [5]. The second type tested is the less well-known one-class SVM, which tries to fit a tight hypersphere around the positive training examples. This type of SVM was introduced for image retrieval by Y. Chen [4].

Figure 1.1: Screenshot of our CBIR system after labeling the first set of query images

Figure 1.2: Results after the initial query round

Figure 1.3: Results after the second query round

But which of these two SVM types is the better one for image search? The relevant images are only a very small fraction of the image database. One might assume that the relevant images cluster in the feature space in a certain way. The irrelevant images will certainly not cluster and might lie all around the cluster of relevant images. In this case one might think that a one-class SVM separates the two classes better, because it can easily fit a hypersphere around the cluster. A two-class SVM with a linear kernel will surely fail to separate the two classes, but maybe an rbf kernel will do? In the following these two SVM types will be compared.

Chapter 2

Support Vector Machines

Support Vector Machines were first introduced by Vapnik. They have a good generalization ability and can easily be applied to classification problems of the following form. Assume that each data instance can be represented as a vector $x \in \mathbb{R}^n$ and that each instance belongs to one of two classes. Let us call them the positive and the negative class. The instances of the positive class are labeled with 1 and the instances of the negative class with -1, so $L = \{-1, 1\}$ is the set of possible labels. Given a set of training examples $\{(x_1, y_1), \dots, (x_l, y_l)\}$ with $(x_i, y_i) \in \mathbb{R}^n \times L$, the problem is to find a function that assigns any unlabeled instance to the correct class. In the following it will be shown how the two SVM types, one-class and two-class, handle this problem.

2.1 Two-Class SVM

Figure 2.1: Positive (yellow) and negative (blue) training examples
Figure 2.2: Separating hyperplane of the two-class SVM

Two-class SVMs solve the classification problem by finding a maximal margin hyperplane that separates the positive training instances from the negative ones. All positive training instances will lie on one side of the hyperplane and all negative training instances on the other. The training instances that lie closest to the hyperplane are called support vectors. In most cases the training instances are not linearly separable in the original feature space $\mathbb{R}^n$.

In this case the training instances can be transformed nonlinearly into a higher-dimensional feature space $F$ with a mapping

$$\phi : \mathbb{R}^n \to F, \qquad x \mapsto \phi(x)$$

In the higher-dimensional feature space the instances can be separated much more easily. The classification function is

$$f(x) = \mathrm{sgn}\big(w \cdot \phi(x) + b\big)$$

Instances mapped to $F$ lying on the positive side of the hyperplane $w \cdot \phi(x) + b = 0$ are classified as members of the positive class and the ones on the other side are classified as members of the negative class. It is easy to see that members of $F$ appear only in dot products. Any algorithm that uses dot products can be performed implicitly in $F$ using a kernel function. Instead of mapping an instance to the possibly very high dimensional vector space $F$ and performing a dot product there, a kernel function that only operates on vectors of $\mathbb{R}^n$ can be used. This saves a lot of computational time. Another advantage of using a kernel function is that the decision function can be calculated even when the mapping $\phi$ cannot be described analytically. The decision function can be rewritten as

$$f(x) = \mathrm{sgn}\Big(\sum_{i=1}^{l} y_i \alpha_i k(x_i, x) + b\Big)$$

To calculate $b \in \mathbb{R}$ and $\alpha_1, \dots, \alpha_l \ge 0$ the following quadratic optimization problem has to be solved:

$$\text{maximize } W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j k(x_i, x_j)$$

$$\text{subject to } \sum_{i=1}^{l} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \dots, l$$

The parameter $C$ is called the cost and regulates the trade-off between an SVM that classifies all training instances correctly and an SVM that allows outliers. The smaller the value chosen for $C$, the more classification errors are tolerated on the training set. By tolerating some classification errors the SVM has the chance to choose a simpler decision boundary and to avoid overfitting. Only the instances nearest to the decision boundary, the support vectors, are needed to define the boundary. These instances will have a nonzero $\alpha_i$; all other instances will have $\alpha_i = 0$ and are irrelevant for the classification problem.
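As a concrete illustration of the dual decision function above, the following minimal NumPy sketch evaluates $f(x) = \mathrm{sgn}(\sum_i y_i \alpha_i k(x_i, x) + b)$ for an rbf kernel. The variable names (support_vectors, labels, alphas, b) are illustrative and not taken from the thesis implementation.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def two_class_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    # f(x) = sgn( sum_i y_i * alpha_i * k(x_i, x) + b )
    s = sum(y_i * a_i * kernel(x_i, x)
            for x_i, y_i, a_i in zip(support_vectors, labels, alphas))
    return np.sign(s + b)
```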

2.2 One-Class SVM

One-class SVMs solve the classification problem by finding the smallest hypersphere which contains most of the positive training instances. The information of the negative training instances is completely ignored while calculating the hypersphere. The hypersphere should be as small as possible to minimize the risk of including negative instances. All instances inside the "ball" will be classified as positive and all instances outside the ball will be classified as negative.

Figure 2.3: Positive (yellow) and negative (blue) training examples
Figure 2.4: Separating hypersphere of the one-class SVM

The hypersphere need not contain all positive training instances. Training instances may contain noise, therefore outliers should be detected and singled out. As in the case of the two-class SVM, the training instances $x$ can be projected nonlinearly with a mapping $\phi(x)$ into a higher-dimensional feature space $F$ and the hypersphere can be calculated there. This yields a more complex decision boundary in the original feature space. The goal is to compute a hypersphere which is as small as possible while at the same time containing most of the $l$ positive training instances. This can be formulated in a primal form as:

$$\min_{R \in \mathbb{R},\, \zeta \in \mathbb{R}^l,\, c \in F} \; R^2 + \frac{1}{\nu l} \sum_i \zeta_i$$

$$\text{subject to } \|\phi(x_i) - c\|^2 \le R^2 + \zeta_i, \quad \zeta_i \ge 0, \quad i = 1, \dots, l$$

Figure 2.5: Enclosing hypersphere of a one-class SVM with ν = 0.1
Figure 2.6: Enclosing hypersphere of a one-class SVM with ν = 0.8

The $\zeta_i$ are slack variables that denote the distance of an instance from the ball. They are used to penalize outliers. If $\zeta_i > 0$ then the positive training instance $x_i$ is detected as an outlier and lies outside of the hypersphere with radius $R$. To set the trade-off between the radius of the ball and the number of training instances it encloses, the parameter $\nu \in [0,1]$ is used.

If $\nu$ is chosen to be small, the hypersphere is allowed to grow so that more training instances can be put into the ball. If $\nu$ is chosen to be large, the hypersphere is kept small while a fraction of the training instances is allowed to lie outside.

The primal form of the optimization problem can be transformed into a dual form using Lagrange multipliers. The corresponding Lagrange function is:

$$L(R, \zeta, c, \alpha) = R^2 + \frac{1}{\nu l} \sum_{i=1}^{l} \zeta_i + \sum_{i=1}^{l} \alpha_i \big( \|\phi(x_i) - c\|^2 - R^2 - \zeta_i \big), \qquad \alpha_i \ge 0$$

This function has to be minimized. At the minimum the following conditions have to hold:

$$\frac{\partial L}{\partial R} = 0 \;\Rightarrow\; 2R - 2R \sum_{i=1}^{l} \alpha_i = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i = 1$$

$$\frac{\partial L}{\partial c} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i \big(2c - 2\phi(x_i)\big) = 0 \;\Rightarrow\; c = \frac{\sum_{i=1}^{l} \alpha_i \phi(x_i)}{\sum_{i=1}^{l} \alpha_i} = \sum_{i=1}^{l} \alpha_i \phi(x_i)$$

The center $c$ is completely determined by $\alpha$ alone, and since $\sum_i \alpha_i = 1$ the terms containing $R^2$ cancel, so $R$ drops out as well. The Lagrange function can therefore be written using only the variables $\zeta$ and $\alpha$:

$$L(\zeta, \alpha) = \frac{1}{\nu l} \sum_{i=1}^{l} \zeta_i + \sum_{i=1}^{l} \alpha_i \Big( \Big\| \phi(x_i) - \sum_{j=1}^{l} \alpha_j \phi(x_j) \Big\|^2 - \zeta_i \Big)$$

$$= \frac{1}{\nu l} \sum_{i=1}^{l} \zeta_i + \sum_{i=1}^{l} \alpha_i \Big[ \phi(x_i) \cdot \phi(x_i) + \sum_{j,k=1}^{l} \alpha_j \alpha_k \, \phi(x_j) \cdot \phi(x_k) - 2 \sum_{j=1}^{l} \alpha_j \, \phi(x_j) \cdot \phi(x_i) \Big] - \sum_{i=1}^{l} \alpha_i \zeta_i$$

$$= \frac{1}{\nu l} \sum_{i=1}^{l} \zeta_i + \sum_{i=1}^{l} \alpha_i \, \phi(x_i) \cdot \phi(x_i) - \sum_{i,j=1}^{l} \alpha_i \alpha_j \, \phi(x_i) \cdot \phi(x_j) - \sum_{i=1}^{l} \alpha_i \zeta_i$$

Now $L$ should be minimized with respect to the $\zeta_i$ subject to $\zeta_i \ge 0$. So either $\partial L / \partial \zeta_i = 0$ if such a point exists, or $\zeta_i = 0$ and $\partial L / \partial \zeta_i > 0$:

$$\frac{\partial L(\zeta, \alpha)}{\partial \zeta_i} = \frac{1}{\nu l} - \alpha_i \ge 0 \;\Rightarrow\; \alpha_i \le \frac{1}{\nu l}$$

Now $L$ can be rewritten without the $\zeta_i$:

$$L(\alpha) = \sum_{i=1}^{l} \alpha_i \, \phi(x_i) \cdot \phi(x_i) - \sum_{i,j=1}^{l} \alpha_i \alpha_j \, \phi(x_i) \cdot \phi(x_j)$$

This leads to the following dual form:

$$\min_{\alpha} \; \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j) - \sum_i \alpha_i k(x_i, x_i)$$

$$\text{subject to } 0 \le \alpha_i \le \frac{1}{\nu l}, \quad \sum_i \alpha_i = 1$$

The optimal $\alpha$ can be computed by solving this dual problem with the help of a QP optimization method. After that the center of the hypersphere can be calculated, if the mapping $\phi(x)$ is known:

$$c = \sum_i \alpha_i \phi(x_i)$$

But the mapping $\phi(x)$ will be unknown in most cases. The decision function

$$f(x) = \mathrm{sgn}\big(R^2 - \|\phi(x) - c\|^2\big)$$

can be computed without the center using the corresponding kernel function:

$$f(x) = \mathrm{sgn}\Big(R^2 - \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j) + 2 \sum_i \alpha_i k(x_i, x) - k(x, x)\Big)$$

The support vectors are those instances $x_i$ with $0 < \alpha_i < 1/(\nu l)$, the $x_i$ with $\alpha_i = 1/(\nu l)$ are the outliers, and the $x_i$ with $\alpha_i = 0$ are the instances lying strictly inside the ball. The radius $R$ is computed such that all support vectors lie on the hull of the hypersphere; this is the case if for all support vectors the argument of the sgn is zero. The squared distance of an instance to the center, which will be needed for ranking the images later on, can be calculated in the following way:

$$d(x)^2 = \|\phi(x) - c\|^2 = \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j) - 2 \sum_i \alpha_i k(x_i, x) + k(x, x)$$
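The squared distance above depends only on kernel evaluations and the dual coefficients, so it can be computed without ever knowing $\phi$ or $c$ explicitly. A minimal sketch with assumed variable names (not from the original implementation):

```python
import numpy as np

def center_distance_sq(x, train_points, alphas, kernel):
    # squared distance ||phi(x) - c||^2 to the hypersphere center c = sum_i alpha_i phi(x_i),
    # expressed purely through kernel evaluations
    const = sum(a_i * a_j * kernel(x_i, x_j)
                for x_i, a_i in zip(train_points, alphas)
                for x_j, a_j in zip(train_points, alphas))
    cross = sum(a_i * kernel(x_i, x) for x_i, a_i in zip(train_points, alphas))
    return const - 2.0 * cross + kernel(x, x)

def one_class_decision(x, train_points, alphas, radius, kernel):
    # f(x) = sgn(R^2 - ||phi(x) - c||^2): +1 inside the ball, -1 outside
    return np.sign(radius ** 2 - center_distance_sq(x, train_points, alphas, kernel))
```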

Chapter 3

Similarity Measures

Image retrieval is more of a ranking problem than a classification problem. It is not enough to label 100 images as relevant and 9900 as irrelevant in a database of 10000 images. It is more important that the images of the database are ranked according to their relevance and that the n most relevant images are returned. For the ranking a method for measuring the similarity of two images is needed. Two of many possible similarity measures are the Euclidean distance, which will be introduced in section 3.1, and histogram intersection, which will be discussed in section 3.2.

3.1 Euclidean Distance

The Euclidean distance is the distance measure induced by the $L_2$-norm. Let $x$ and $y$ be the feature vectors of length $n$ of two images. The Euclidean distance of the two images is defined as:

$$L_2(x, y) = \Big( \sum_{i=1}^{n} |x_i - y_i|^2 \Big)^{1/2}$$

The smaller the Euclidean distance of two images, the more similar they are considered to be.

3.2 Histogram Intersection

Histogram intersection is a method to measure the similarity of two images with respect to their colors. Let us denote the histograms of the images $A_{im}$ and $B_{im}$ with $A$ and $B$. Let both images consist of $N$ pixels and let both histograms consist of $m$ bins. The $i$-th bin ($i = 1, \dots, m$) of the histogram $A$ is denoted with $A_i$, and the $i$-th bin of the histogram $B$ is denoted with $B_i$. It holds that

$$\sum_{i=1}^{m} A_i = N \quad \text{and} \quad \sum_{i=1}^{m} B_i = N$$

Now, the histogram intersection is defined as

$$K_{int}(A, B) = \sum_{i=1}^{m} \min(A_i, B_i)$$

The higher the histogram intersection value of two images, the greater is the common part of the histograms, and the more similar the two images are. If the sum of the histogram bins is normalized to one, the highest possible similarity value of two images will also be one.

Histogram intersection is closely related to the $L_1$-norm. Let $x$ and $y$ be the feature vectors of length $n$ of two images. Then the $L_1$-distance is defined as

$$L_1(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

Histogram intersection is related to the $L_1$-distance in the following way:

$$K_{int}(x, y) = 1 - \frac{L_1(x, y)}{2}$$

3.3 Image Retrieval with similarity measures

Let us imagine that our image retrieval system is based on similarity measures. How are the most relevant images computed when only one image labeled as relevant is given? In this case the system will calculate the similarity values of all images of the database and will return the n most similar images. If more than one relevant example is given by the user, a reference point for the similarity comparison has to be chosen. Normally the mean of the feature vectors of the relevant example images is used. This method for finding a reference point treats each example the same and is not able to detect outliers. Another method, which we will be using in our CBIR system, is to train a linear one-class SVM with the relevant feature vectors and then use the center of the hypersphere as the reference point. The center can be calculated because the training data is not projected into a higher-dimensional feature space. After the reference point has been calculated, all images can be compared to it using the selected similarity measure and a relevance ranking can be computed.
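To make the ranking procedure concrete, here is a small NumPy sketch of both similarity measures and of ranking a database against a reference histogram (for example the mean of the relevant examples, or the hypersphere center of a linear one-class SVM). All function and variable names are illustrative only.

```python
import numpy as np

def euclidean_distance(x, y):
    # L2 distance between two feature vectors
    return np.sqrt(np.sum((x - y) ** 2))

def histogram_intersection(a, b):
    # sum_i min(a_i, b_i); equals 1 for identical histograms normalised to sum 1
    return np.sum(np.minimum(a, b))

def rank_by_similarity(database, reference, n=20):
    # indices of the n database histograms most similar to the reference point
    scores = np.array([histogram_intersection(h, reference) for h in database])
    return np.argsort(-scores)[:n]

# Example reference point: the mean of the relevant example histograms, e.g.
# reference = relevant_histograms.mean(axis=0)
```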

Chapter 4

Kernel Functions

The CBIR relevance feedback system has been tested with both SVM types and with various kernel functions. All except one are commonly known kernel functions. The common kernels will be listed in the next section and thereafter the histogram intersection kernel will be introduced.

4.1 Common Kernel Functions

Four of the five kernel functions we use in our relevance feedback CBIR system are:

linear kernel: $k(x_i, x_j) = x_i \cdot x_j$

polynomial kernel: $k(x_i, x_j) = (\gamma (x_i \cdot x_j) + coef0)^d, \quad \gamma > 0$

radial basis function (rbf) kernel: $k(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \quad \gamma > 0$

sigmoid kernel: $k(x_i, x_j) = \tanh(\gamma (x_i \cdot x_j) + coef0)$

The occurring kernel parameters are $\gamma$, the degree $d$ and a coefficient $coef0$.

4.2 The Histogram Intersection Kernel

In the previous section histogram intersection was introduced as a method for measuring the similarity of two images. Annalisa Barla [6] has shown that histogram intersection is a Mercer kernel, i.e. it yields hyperplanes with a guaranteed maximum margin in the mapped feature space. This can be shown by constructing a mapping from the feature space $\mathbb{R}^n$ to a higher-dimensional feature space $F$ in which histogram intersection is a dot product. Let $A_{im}$ and $B_{im}$ be images with $N$ pixels and let $A$ and $B$ be the corresponding histograms, each with $m$ bins. $A$ is mapped to an $(N \cdot m)$-dimensional binary vector $\bar{A}$ that consists, for each bin $i$, of a block of $N$ entries containing $A_i$ ones followed by $N - A_i$ zeros:

$$\bar{A} = ( \underbrace{1, \dots, 1}_{A_1}, 0, \dots, 0, \; \underbrace{1, \dots, 1}_{A_2}, 0, \dots, 0, \; \dots, \; \underbrace{1, \dots, 1}_{A_m}, 0, \dots, 0 )$$

$B$ is mapped to $\bar{B}$ similarly. The histogram intersection $K_{int}(A, B)$ is then equal to the standard inner product of the two vectors $\bar{A}$ and $\bar{B}$:

$$K_{int}(A, B) = \bar{A} \cdot \bar{B}$$

With this it has been shown that histogram intersection is a positive definite kernel function. Histogram intersection will be the fifth kernel used in our CBIR system.
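Since most SVM libraries do not ship a histogram intersection kernel, one common way to use it is to pass a precomputed Gram matrix. The sketch below assumes scikit-learn's LIBSVM wrapper and is only meant to illustrate the idea; the thesis itself added the kernel directly to LIBSVM.

```python
import numpy as np
from sklearn.svm import SVC

def intersection_gram(A, B):
    # Gram matrix G[i, j] = sum_k min(A[i, k], B[j, k]) for two sets of histograms
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

# Hypothetical usage with training histograms X_train and labels y_train in {-1, +1}:
# clf = SVC(kernel="precomputed", C=1.0)
# clf.fit(intersection_gram(X_train, X_train), y_train)
# scores = clf.decision_function(intersection_gram(X_test, X_train))
```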

Chapter 5

Relevance Feedback with SVMs

How will the relevance feedback part of the CBIR system be realized with the help of support vector machines? SVMs are binary classifiers which separate the relevant images from the irrelevant images with a boundary in a higher-dimensional feature space. The relevant images lie on one side and the irrelevant images lie on the other side. When asked to classify an unlabeled instance, the SVM normally returns only the computed class label. But the SVM can be changed slightly so that a numerical value is returned instead. What this numerical value looks like depends on the SVM type that is used; this will be discussed for both SVM types in sections 5.1 and 5.2. In any case, the numerical value returned by the SVM induces a relevance order on all images in the database. The n most relevant images can then easily be determined and presented to the user as the result. For the next feedback round the n most informative images have to be determined. This can also be done with the help of the numerical value returned by the SVM. After the labeling is finished, the newly labeled instances are added to the old training set and the SVM is trained with this new set. Instead of SVMs, the relevance feedback system can use any similarity measure to induce a relevance order on the image database. This is discussed further in section 5.3.
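The feedback loop just described can be summarised in a few lines of Python. Here `score` stands for whichever relevance value the chosen learner returns (signed distance to the hyperplane, distance to the hypersphere center, or similarity to the reference point); all names are illustrative and not taken from the actual PHP/LIBSVM implementation.

```python
def relevance_feedback(database, initial_query, ask_user, train, score,
                       select_informative, n_results=20, n_query=12, rounds=6):
    # generic relevance feedback loop: train a learner, rank the database,
    # show the best results and ask the user to label the most informative images
    labeled = list(ask_user(initial_query))           # user labels the initial image set
    results = []
    for _ in range(rounds):
        model = train(labeled)                        # e.g. fit a one-class or two-class SVM
        scores = [score(model, img) for img in database]
        order = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
        results = order[:n_results]                   # current n most relevant images
        query = select_informative(model, database, scores, n_query)
        labeled += list(ask_user(query))              # user labels the new query images
    return results
```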

5.1 Using a Two-Class SVM

Figure 5.1: Separating hyperplane of a linear two-class SVM
Figure 5.2: Relevance ranking of all points

Two-class SVMs separate the relevant images from the irrelevant images with a hyperplane in a higher-dimensional feature space. The most informative images for the feedback round are those which have the greatest influence on the position of the separating hyperplane. The only instances which influence the position of the hyperplane are the support vectors. The nearer an unlabeled image is to the hyperplane, the higher the probability that it would become a support vector of the new SVM if labeled and added to the training set. Therefore the n images with the shortest distance to the separating hyperplane are chosen for the feedback round. Because negative instances have negative distances, the absolute value has to be used.

The most relevant images are those on the positive side of the hyperplane that have the highest probability of lying on the correct side. This is shown in figure 5.2: the whiter a point is, the more relevant it is. The corresponding separating hyperplane was created by a two-class SVM with a linear kernel and is shown in figure 5.1. The n most relevant images for the result can therefore be determined by sorting the images according to their distance to the hyperplane and taking those with the greatest distance. Instances on the negative side have negative distances and therefore smaller values than any positive one. The SVM classifier can easily be changed to return the signed distance instead of the class label.
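For the two-class case the two rankings described above reduce to two sorts of the signed decision values. A hedged sketch, assuming `decision_values` holds the signed distance of every database image to the hyperplane:

```python
import numpy as np

def two_class_selection(decision_values, n_results=20, n_query=12):
    # decision_values[i]: signed distance of database image i to the separating hyperplane
    d = np.asarray(decision_values)
    results = np.argsort(-d)[:n_results]        # most relevant: largest signed distance
    query = np.argsort(np.abs(d))[:n_query]     # most informative: closest to the hyperplane
    return results, query
```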

5.2 Using a One-Class SVM

One-class SVMs capture the distribution of the relevant images with a hypersphere in a high-dimensional feature space. The most informative images for the feedback round are those which have the greatest influence on the center and radius of the separating hypersphere. The instances which potentially have the greatest influence are those which lie nearest to the center of the hypersphere. Therefore the n images with the shortest distance to the center of the separating hypersphere are chosen for the feedback round. The images lying inside the hypersphere are more likely to be relevant; therefore the n most relevant images for the result are those closest to the center. Figure 5.3 shows the separating hypersphere of a one-class SVM using the linear kernel. The ranking of the points induced by this hypersphere is illustrated in figure 5.4: the lighter a point is colored, the more relevant it is. The SVM classifier can easily be changed to return the distance instead of the class label.

Figure 5.3: Separating hypersphere of a linear one-class SVM
Figure 5.4: Relevance ranking of all points

5.3 Using a Similarity Measure

As mentioned before, the reference point of the comparison is computed by a one-class SVM. Therefore the most informative images are those that are most informative to the one-class SVM, namely those which lie nearest to the center of the hypersphere. Instead of the distance to the center, the similarity to the center is used to rank the images of the database. In the feedback round and as the result, the user is presented the n images most similar to the center. In our relevance feedback CBIR system we used histogram intersection as the similarity measure. One reason for this choice is that in previous studies, e.g. in [9], $L_1$-related distances like histogram intersection have been found to be better than $L_2$ distances when dealing with histogram-based feature vectors.

Chapter 6

Relevance ranking induced by the histogram intersection kernel

In the previous chapter it was already shown in figures what the relevance ranking of two-dimensional points induced by one-class and two-class SVMs with the linear kernel looks like. Now the same is done for one-class and two-class SVMs using the histogram intersection kernel. Figure 6.1 illustrates the relevance ranking induced by a one-class SVM which was trained with only one positive example point. This example point is chosen by the SVM to be the center of the hypersphere. The white area in figure 6.2 represents the points which have a distance of 0.5 or less to the center when transformed into the higher-dimensional feature space. Figures 6.3 and 6.4 show the same for a one-class SVM trained with two positive examples. Figure 6.5 shows the decision boundary of a one-class SVM in the upper left part and the decision boundary of a two-class SVM in the lower left part. Both SVMs used the histogram intersection kernel and were trained with a cluster of positive and surrounding negative examples. The induced relevance rankings of both SVMs are illustrated in the right half of figure 6.5.

Figure 6.1: Relevance ranking induced by a one-class SVM trained with only one positive example
Figure 6.2: Area of the points having a distance of 0.5 or less to the center of the hypersphere

Figure 6.3: Relevance ranking induced by a one-class SVM trained with only two positive examples
Figure 6.4: Area of the points having a distance of 0.5 or less to the center of the hypersphere
Figure 6.5: Decision boundaries (left) and relevance ranking (right) induced by a one-class SVM (top) and a two-class SVM (bottom)

Chapter 7

Implementation

For training and classification of an SVM, each image of the database has to be represented by a numerical vector of image features. Here an invariant feature histogram with 512 bins was chosen. A comprehensive description of this invariant feature histogram can be found in the dissertation of S. Siggelkow [7].

For the implementation of the SVMs the library LIBSVM [1] was used. LIBSVM supports two-class SVMs, but not the one-class SVMs defined here. Therefore an additional SVM type "One-Class-Ball" had to be added. The decomposition method implemented in LIBSVM solves quadratic problems of the form:

$$\min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha + p^T \alpha$$

$$\text{subject to } y^T \alpha = \Delta, \quad 0 \le \alpha_t \le C, \quad t = 1, \dots, l$$

where $y_t = \pm 1$, $t = 1, \dots, l$, and $Q$ is a matrix with $Q_{ij} = k(x_i, x_j)$. As seen before, the dual form of the one-class SVM optimization problem is:

$$\min_{\alpha} \; \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j) - \sum_i \alpha_i k(x_i, x_i)$$

$$\text{subject to } 0 \le \alpha_i \le \frac{1}{\nu l}, \quad \sum_i \alpha_i = 1$$

To solve this problem with the decomposition method of LIBSVM, this dual form has to be transformed. The first step is to scale the whole problem such that $0 \le \alpha_i \le 1$ holds instead of $0 \le \alpha_i \le \frac{1}{\nu l}$:

$$\min_{\alpha} \; \frac{1}{(\nu l)^2} \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j) - \frac{1}{\nu l} \sum_i \alpha_i k(x_i, x_i)$$

$$\text{subject to } 0 \le \alpha_i \le 1, \quad \sum_i \alpha_i = \nu l$$

After that the objective can be scaled by $\frac{(\nu l)^2}{2}$, which does not change the minimizer:

$$\min_{\alpha} \; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j) - \frac{\nu l}{2} \sum_i \alpha_i k(x_i, x_i)$$

This can be rewritten in the form

$$\min_{\alpha} \; \frac{1}{2} \alpha^T Q \alpha + p^T \alpha \qquad \text{subject to } y^T \alpha = \Delta, \quad 0 \le \alpha_t \le C, \quad t = 1, \dots, l$$

with $p = (p_1, \dots, p_l)$ where $p_i = -\frac{\nu l}{2} k(x_i, x_i)$, $\Delta = \nu l$, $C = 1$, and the matrix $Q$ with $Q_{ij} = k(x_i, x_j)$.
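As a sanity check of this transformation, the small sketch below builds the quantities Q, p, Delta and C that a generic QP solver of LIBSVM's form would receive, following the scaled constraint and objective above; the names are taken from the formulas, not from the actual LIBSVM patch.

```python
import numpy as np

def one_class_ball_qp(X, nu, kernel):
    # build Q, p, Delta, C of the problem
    # min 1/2 a^T Q a + p^T a  s.t.  sum(a) = Delta, 0 <= a_i <= C
    l = len(X)
    Q = np.array([[kernel(xi, xj) for xj in X] for xi in X])  # Q_ij = k(x_i, x_j)
    p = -(nu * l / 2.0) * np.diag(Q)                          # p_i = -(nu*l/2) * k(x_i, x_i)
    delta = nu * l                                            # equality constraint sum(a) = nu*l
    C = 1.0                                                   # box constraint 0 <= a_i <= 1
    return Q, p, delta, C
```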

Now the optimization problem is in the form needed to be solved by the decomposition method of LIBSVM.

Another SVM type, "Histogram Intersection", was added to LIBSVM to support the learning method described in chapter 3. Furthermore, LIBSVM only supports the commonly known kernel functions, so the histogram intersection kernel had to be implemented, too. To realize the relevance feedback part of the CBIR system, an additional function was added to LIBSVM that returns

- the distance from the hyperplane in the case of the two-class SVM,
- the distance from the center of the hypersphere in the case of the one-class SVM,
- the result of the histogram intersection in the case of histogram intersection.

This value is used to rank the images for the feedback round and to determine the most relevant images. The distance returned for the two-class SVM is signed: for instances of the positive class positive distances and for negative instances negative distances are returned. The signed distance is needed for the computation of the most relevant images. For the selection of the most informative images the absolute value of the signed distance is used.

The relevance feedback system for content-based image search can be used through a PHP web interface. Some screenshots of the web interface can be viewed in figures 1.1 to 1.3. The user can choose between two databases of different sizes. With the help of a small menu, the SVM and all its necessary parameters can be selected. Alternatively, histogram intersection can be chosen as the learning method. After that the user is presented 12 random images from the selected database and is asked to mark all relevant images. The page then reloads and presents the current 20 best results in the upper part and the 12 query images of the second feedback round in the lower part. An example of an image retrieval query is shown in figures 7.1 and 7.2. The two images from figure 7.1 were the only ones of the 12 initial query images that were selected as relevant and were used to train a one-class SVM with the histogram intersection kernel.

Figure 7.1: Example of a set of relevant training images

The 20 most relevant images resulting from training the SVM can be found in figure 7.2. The user can take part in as many feedback rounds as he wants to. After each feedback round the user has the opportunity to compare his selected SVM with a new SVM with different parameters. The new SVM will be trained with the same images as the old one. To make a comparison of the results possible, the result images of the new SVM are presented in a new window. Another nice feature is that after each feedback round a graph showing the number of relevant images as a function of the number of returned images can be computed. For this purpose the 100 best result images are presented to the user for labeling. For comparison this statistic can be computed, with the help of the user, for different learning methods and then be shown as differently colored lines in a single graph.

Figure 7.2: The 20 most relevant images gained through training the CBIR system with a one-class SVM using only the two relevant examples from figure 7.1

Chapter 8

Results

The CBIR relevance feedback system has been tested with labeled data of the MPEG-7 Content Set, consisting of 2500 images, and the Benchathlon image database [8] with 4500 images. (We acknowledge Tristan Savatier, Alejandro Jaimes, and the Department of Water Resources, California, for providing the images under the Licensing Agreement for the MPEG-7 Content Set, MPEG 98/N2466.) For the MPEG-7 Content Set 15 images and for the Benchathlon collection 11 images were chosen randomly. After that, non-expert users generated lists of relevant images for each selected image. For each relevant image set a number of different SVMs were trained with the same training data, consisting of positive instances out of the relevant set and randomly picked negative images from the complementary set. After each SVM training all images of the database were ranked according to their relevance. An ideal learner would retrieve all relevant images of the set before retrieving any irrelevant ones.

To compare the different learning methods a precision-recall graph is used. The precision is defined as the ratio of the number of retrieved relevant images to the total number of retrieved images. The recall is the proportion of the retrieved relevant images out of the total number of relevant images in the collection. The precision-recall graph plots the precision as a function of the recall. In the ideal case the precision will be one for all recall values. In the next two sections two-class and one-class SVMs with different kernel functions will be compared. The precision-recall graphs shown are averages over all 11 or 15 relevance sets.
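Precision and recall can be computed directly from the ranked list returned by the system, which is how such a graph is typically produced. A small sketch with assumed names:

```python
import numpy as np

def precision_recall(ranking, relevant):
    # precision and recall after each retrieved image, given the ranked list of
    # image ids and the set of truly relevant ids
    relevant = set(relevant)
    hits = 0
    precision, recall = [], []
    for k, img in enumerate(ranking, start=1):
        hits += img in relevant
        precision.append(hits / k)              # retrieved relevant / retrieved
        recall.append(hits / len(relevant))     # retrieved relevant / all relevant
    return np.array(precision), np.array(recall)
```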

8.1 Tests with the MPEG-7 Content Set

The best kernel function for two-class SVMs

The goal of the first experiment was to find out which kernel function for two-class SVMs is best for image retrieval. The SVMs were trained only with the initial training data, which consisted of 3 relevant and 5 irrelevant images. No additional feedback rounds were allowed. For each kernel individual tests were made to find the best parameters. After that all kernels with their best parameters were compared. The best parameters of the kernel functions can be found in table 8.1 and the result of the comparison is shown in figure 8.1. As can be seen, the kernel with the best performance on the MPEG-7 Content Set is the $L_1$-distance-based histogram intersection kernel. The second best is the rbf kernel, which is based on the $L_2$ distance. This result was expected, because in previous studies, e.g. in [9], $L_1$ distances have been found to be better than $L_2$ distances when dealing with histogram-based feature vectors. Furthermore, all other kernels seem to be very inefficient compared to the intersection and rbf kernels.

Figure 8.1: Comparison of kernels for two-class SVMs

Table 8.1: Parameters of the best two-class SVMs (cost, γ, coef0 and degree for the linear, sigmoid, polynomial, rbf and intersection kernels)

The best kernel function for one-class SVMs

The same experiment was now carried out for one-class SVMs. The SVMs were again trained only with the initial training data, which this time consisted of only 3 relevant images, since negative instances in the training set have no influence on one-class SVMs. Again no additional feedback rounds were allowed. The best parameters for the kernel functions, which again were determined through individual tests, are shown in table 8.2. The result of the comparison of the best kernel functions is presented in figure 8.2. Even when using one-class SVMs as the learning method for the CBIR relevance feedback system, histogram intersection is clearly the best kernel function. This time, however, the only kernel that performs badly is the sigmoid kernel. All other kernel functions perform equally well, but not as well as the intersection kernel.

Figure 8.2: Comparison of kernels for one-class SVMs

Table 8.2: Parameters of the best one-class SVMs (ν, γ, coef0 and degree for the linear, sigmoid, polynomial, rbf and intersection kernels)

Figure 8.3: Comparison of one-class and two-class SVMs
Figure 8.4: Comparison of both SVM types after 6 feedback rounds

Comparison of two-class and one-class SVMs

In the next step the best two-class SVM was compared to the best one-class SVM. For both SVM types these were the SVMs using the histogram intersection kernel; the exact parameters can be looked up in tables 8.1 and 8.2. Additionally, histogram intersection was added as an alternative learning method to the comparison of the two SVMs. Again only 3 relevant and 5 irrelevant images were used for training and no additional feedback rounds were allowed. As can be seen in figure 8.3, the one-class SVM performs best, followed closely by histogram intersection.

Comparison of two-class and one-class SVMs after 6 query rounds

How do the three learning methods perform when allowed to make up to 6 feedback rounds? In the first feedback round the three learners were trained with the same 3 relevant and 5 irrelevant images. Each following feedback round consisted of 12 query images. The result of the comparison after the 6 feedback rounds is shown in figure 8.4. Now the two-class SVM is the best learner, followed by the one-class SVM. A possible reason why the two-class SVM is the best is that it is the only learner of the three that uses the whole information from the feedback rounds; the one-class SVM and histogram intersection completely ignore the information from the feedback images labeled as irrelevant.

Improvement of a two-class SVM with relevance feedback

Another interesting question is how an SVM improves after each relevance feedback round. Figure 8.5 shows the improvement of a two-class SVM with the histogram intersection kernel. In the first round the SVM was trained with one relevant and 5 irrelevant images. After each feedback round the precision-recall graph was updated, until 6 feedback rounds were performed. It can be seen that the SVM improves, as expected, with each passed feedback round.

Figure 8.5: Improvement of a two-class SVM with relevance feedback
Figure 8.6: Improvement of a two-class SVM without relevance feedback

Improvement of a two-class SVM without relevance feedback

How does the SVM improve if not the most informative, but random images are used in the query round? The results can be seen in figure 8.6. As expected, the improvement is not nearly as good.

8.2 Tests with the Benchathlon image collection

Figure 8.7: Comparison of SVMs

The best kernel function for one-class and two-class SVMs

The first thing done with the Benchathlon image collection is to determine the best one-class and two-class SVMs for image retrieval. For this the SVMs were trained only with the initial training set, consisting of 3 relevant and 5 irrelevant images. No additional feedback rounds were allowed.

For each SVM type and each kernel, individual tests were made to find the best parameters. After that all kernels with their best parameters were compared. The performances of the two best two-class SVMs, the two best one-class SVMs and histogram intersection can be found in figure 8.7; the corresponding parameters are shown in table 8.3. When comparing the curves of the MPEG-7 Content Set with the curves of the Benchathlon image collection, one can see that the results of the CBIR system using the Benchathlon database are not as good as the results using the MPEG-7 Content Set. A reason for this is that the Benchathlon database, with 4500 images, is larger than the MPEG-7 Content Set, which consists of 2500 images. When using the Benchathlon collection, the proportion of relevant to irrelevant images is much smaller and the probability that the CBIR system returns an irrelevant image before returning all relevant images is much higher.

Table 8.3: Parameters of the best one-class and two-class SVMs on the Benchathlon collection (cost or ν and γ for the intersection and rbf kernels)

Comparison of two-class and one-class SVMs

This time the best two SVMs are the two-class SVM and the one-class SVM, both with the histogram intersection kernel. One may even say that the two-class SVM is slightly better. The performance level of histogram intersection is now nowhere near the levels of the two best SVMs.

Improvement and comparison of one-class and two-class SVMs after 6 query rounds

In figure 8.8 and figure 8.9 the improvement over 6 relevance feedback rounds of the one-class SVM and the two-class SVM, both using the histogram intersection kernel, is shown. Both SVMs were trained in the first feedback round with 3 relevant and 5 irrelevant images.

Figure 8.8: Improvement of a one-class SVM
Figure 8.9: Improvement of a two-class SVM

Figure 8.10: Comparison of both SVM types after 6 feedback rounds

It can easily be seen that the two-class SVM improves more during the feedback rounds. One reason for this is that the two-class SVM makes use of the information given by the instances labeled as irrelevant. In figure 8.10 the performance of both SVM types after 6 feedback rounds is compared. As expected, the two-class SVM performs much better than the one-class SVM.

8.3 Conclusions

When using histogram-based image features, the best one-class SVM and the best two-class SVM are those using histogram intersection as the kernel function. But which SVM type should be used for image retrieval in which cases? If the user is asked to provide the training examples by uploading a small number of relevant images, then a one-class SVM is the best choice. In this case a two-class SVM will fail, because it needs at least one negative training example. Using a one-class SVM is also a good choice when the user is asked only once to label a small set of images. As seen in the previous sections, the improvement in performance from feedback round to feedback round of a two-class SVM is a lot better than that of a one-class SVM. After several feedback rounds the two-class SVM is clearly the best learning method. Therefore we would expect that, when using a CBIR system with multiple query rounds, a two-class SVM will perform better than a one-class SVM.

Bibliography

[1] Chih-Chung Chang and Chih-Jen Lin: LIBSVM: a library for support vector machines, 2001. Software available online.

[2] B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola and R.C. Williamson: Estimating the support of a high-dimensional distribution. Technical Report No. 87, 1999.

[3] Y. Rui, T. Huang, M. Ortega, S. Mehrotra: Relevance feedback: A power tool in interactive content-based image retrieval. IEEE Trans. on Circuits and Systems for Video Technology 8(5), Sep. 1998.

[4] Y. Chen et al.: One-class SVM for learning in image retrieval. IEEE Intl. Conf. on Image Processing (ICIP 2001), Thessaloniki, Greece, October 7-10, 2001.

[5] S. Tong and E. Chang: Support vector machine active learning for image retrieval. In ACM International Conference on Multimedia, Ottawa, Canada, September 2001.

[6] A. Barla, E. Franceschi, F. Odone and A. Verri: Image kernels. In Proceedings of the International Workshop on Pattern Recognition with Support Vector Machines, satellite event of ICPR 2002, LNCS 2388, p. 83.

[7] S. Siggelkow: Feature Histograms for Content-Based Image Retrieval. PhD thesis, Albert-Ludwigs-Universität Freiburg, December 2002.

[8] Benchathlon home page.

[9] O. Chapelle, P. Haffner and V. Vapnik: SVMs for histogram-based image classification. IEEE Transactions on Neural Networks, accepted, special issue on Support Vectors.


More information

ALBERT-LUDWIGS-UNIVERSITÄT FREIBURG INSTITUT FÜR INFORMATIK

ALBERT-LUDWIGS-UNIVERSITÄT FREIBURG INSTITUT FÜR INFORMATIK ALBERT-LUDWIGS-UNIVERSITÄT FREIBURG INSTITUT FÜR INFORMATIK Lehrstuhl für Mustererkennung und Bildverarbeitung Fast Support Vector Machine Classification of very large Datasets Technical Report 2/07 Karina

More information

DM6 Support Vector Machines

DM6 Support Vector Machines DM6 Support Vector Machines Outline Large margin linear classifier Linear separable Nonlinear separable Creating nonlinear classifiers: kernel trick Discussion on SVM Conclusion SVM: LARGE MARGIN LINEAR

More information

Relevance Feedback for Content-Based Image Retrieval Using Support Vector Machines and Feature Selection

Relevance Feedback for Content-Based Image Retrieval Using Support Vector Machines and Feature Selection Relevance Feedback for Content-Based Image Retrieval Using Support Vector Machines and Feature Selection Apostolos Marakakis 1, Nikolaos Galatsanos 2, Aristidis Likas 3, and Andreas Stafylopatis 1 1 School

More information

Support Vector Machines.

Support Vector Machines. Support Vector Machines srihari@buffalo.edu SVM Discussion Overview. Importance of SVMs. Overview of Mathematical Techniques Employed 3. Margin Geometry 4. SVM Training Methodology 5. Overlapping Distributions

More information

Lecture 10: SVM Lecture Overview Support Vector Machines The binary classification problem

Lecture 10: SVM Lecture Overview Support Vector Machines The binary classification problem Computational Learning Theory Fall Semester, 2012/13 Lecture 10: SVM Lecturer: Yishay Mansour Scribe: Gitit Kehat, Yogev Vaknin and Ezra Levin 1 10.1 Lecture Overview In this lecture we present in detail

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines CS 536: Machine Learning Littman (Wu, TA) Administration Slides borrowed from Martin Law (from the web). 1 Outline History of support vector machines (SVM) Two classes,

More information

An User Preference Information Based Kernel for SVM Active Learning in Content-based Image Retrieval

An User Preference Information Based Kernel for SVM Active Learning in Content-based Image Retrieval An User Preference Information Based Kernel for SVM Active Learning in Content-based Image Retrieval Hua Xie and Antonio Ortega Integrated Media Systems Center and Signal and Image Processing Institute

More information

DECISION-TREE-BASED MULTICLASS SUPPORT VECTOR MACHINES. Fumitake Takahashi, Shigeo Abe

DECISION-TREE-BASED MULTICLASS SUPPORT VECTOR MACHINES. Fumitake Takahashi, Shigeo Abe DECISION-TREE-BASED MULTICLASS SUPPORT VECTOR MACHINES Fumitake Takahashi, Shigeo Abe Graduate School of Science and Technology, Kobe University, Kobe, Japan (E-mail: abe@eedept.kobe-u.ac.jp) ABSTRACT

More information

Kernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018

Kernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Kernels + K-Means Matt Gormley Lecture 29 April 25, 2018 1 Reminders Homework 8:

More information

LECTURE 5: DUAL PROBLEMS AND KERNELS. * Most of the slides in this lecture are from

LECTURE 5: DUAL PROBLEMS AND KERNELS. * Most of the slides in this lecture are from LECTURE 5: DUAL PROBLEMS AND KERNELS * Most of the slides in this lecture are from http://www.robots.ox.ac.uk/~az/lectures/ml Optimization Loss function Loss functions SVM review PRIMAL-DUAL PROBLEM Max-min

More information

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition

More information

Lecture 9: Support Vector Machines

Lecture 9: Support Vector Machines Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and

More information

Efficient Case Based Feature Construction

Efficient Case Based Feature Construction Efficient Case Based Feature Construction Ingo Mierswa and Michael Wurst Artificial Intelligence Unit,Department of Computer Science, University of Dortmund, Germany {mierswa, wurst}@ls8.cs.uni-dortmund.de

More information

Support Vector Machines

Support Vector Machines Support Vector Machines VL Algorithmisches Lernen, Teil 3a Norman Hendrich & Jianwei Zhang University of Hamburg, Dept. of Informatics Vogt-Kölln-Str. 30, D-22527 Hamburg hendrich@informatik.uni-hamburg.de

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Xiaojin Zhu jerryzhu@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [ Based on slides from Andrew Moore http://www.cs.cmu.edu/~awm/tutorials] slide 1

More information

Image retrieval based on bag of images

Image retrieval based on bag of images University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 Image retrieval based on bag of images Jun Zhang University of Wollongong

More information

Support Vector Machines and their Applications

Support Vector Machines and their Applications Purushottam Kar Department of Computer Science and Engineering, Indian Institute of Technology Kanpur. Summer School on Expert Systems And Their Applications, Indian Institute of Information Technology

More information

Table of Contents. Recognition of Facial Gestures... 1 Attila Fazekas

Table of Contents. Recognition of Facial Gestures... 1 Attila Fazekas Table of Contents Recognition of Facial Gestures...................................... 1 Attila Fazekas II Recognition of Facial Gestures Attila Fazekas University of Debrecen, Institute of Informatics

More information

Leave-One-Out Support Vector Machines

Leave-One-Out Support Vector Machines Leave-One-Out Support Vector Machines Jason Weston Department of Computer Science Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 OEX, UK. Abstract We present a new learning algorithm

More information

Fast Support Vector Machine Classification of Very Large Datasets

Fast Support Vector Machine Classification of Very Large Datasets Fast Support Vector Machine Classification of Very Large Datasets Janis Fehr 1, Karina Zapién Arreola 2 and Hans Burkhardt 1 1 University of Freiburg, Chair of Pattern Recognition and Image Processing

More information

Kernel Methods & Support Vector Machines

Kernel Methods & Support Vector Machines & Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector

More information

On Combining One-Class Classifiers for Image Database Retrieval

On Combining One-Class Classifiers for Image Database Retrieval On Combining One-Class Classifiers for Image Database Retrieval Carmen Lai 1,DavidM.J.Tax 2,RobertP.W.Duin 3,Elżbieta P ekalska 3,and Pavel Paclík 3 1 DIEE, University of Cagliari, Sardinia, Italy carmen@ph.tn.tudelft.nl

More information

GENDER CLASSIFICATION USING SUPPORT VECTOR MACHINES

GENDER CLASSIFICATION USING SUPPORT VECTOR MACHINES GENDER CLASSIFICATION USING SUPPORT VECTOR MACHINES Ashwin Swaminathan ashwins@umd.edu ENEE633: Statistical and Neural Pattern Recognition Instructor : Prof. Rama Chellappa Project 2, Part (a) 1. INTRODUCTION

More information

Lecture 7: Support Vector Machine

Lecture 7: Support Vector Machine Lecture 7: Support Vector Machine Hien Van Nguyen University of Houston 9/28/2017 Separating hyperplane Red and green dots can be separated by a separating hyperplane Two classes are separable, i.e., each

More information

Lecture 10: Support Vector Machines and their Applications

Lecture 10: Support Vector Machines and their Applications Lecture 10: Support Vector Machines and their Applications Cognitive Systems - Machine Learning Part II: Special Aspects of Concept Learning SVM, kernel trick, linear separability, text mining, active

More information

Lecture Linear Support Vector Machines

Lecture Linear Support Vector Machines Lecture 8 In this lecture we return to the task of classification. As seen earlier, examples include spam filters, letter recognition, or text classification. In this lecture we introduce a popular method

More information

Some Advanced Topics in Linear Programming

Some Advanced Topics in Linear Programming Some Advanced Topics in Linear Programming Matthew J. Saltzman July 2, 995 Connections with Algebra and Geometry In this section, we will explore how some of the ideas in linear programming, duality theory,

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Isabelle Guyon Notes written by: Johann Leithon. Introduction The process of Machine Learning consist of having a big training data base, which is the input to some learning

More information

Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning. Supervised vs. Unsupervised Learning Overview T7 - SVM and s Christian Vögeli cvoegeli@inf.ethz.ch Supervised/ s Support Vector Machines Kernels Based on slides by P. Orbanz & J. Keuchel Task: Apply some machine learning method to data from

More information

Learning texture similarity with perceptual pairwise distance

Learning texture similarity with perceptual pairwise distance University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2005 Learning texture similarity with perceptual

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Chapter 9 Chapter 9 1 / 50 1 91 Maximal margin classifier 2 92 Support vector classifiers 3 93 Support vector machines 4 94 SVMs with more than two classes 5 95 Relationshiop to

More information

Data mining with Support Vector Machine

Data mining with Support Vector Machine Data mining with Support Vector Machine Ms. Arti Patle IES, IPS Academy Indore (M.P.) artipatle@gmail.com Mr. Deepak Singh Chouhan IES, IPS Academy Indore (M.P.) deepak.schouhan@yahoo.com Abstract: Machine

More information

KBSVM: KMeans-based SVM for Business Intelligence

KBSVM: KMeans-based SVM for Business Intelligence Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2004 Proceedings Americas Conference on Information Systems (AMCIS) December 2004 KBSVM: KMeans-based SVM for Business Intelligence

More information

Chap.12 Kernel methods [Book, Chap.7]

Chap.12 Kernel methods [Book, Chap.7] Chap.12 Kernel methods [Book, Chap.7] Neural network methods became popular in the mid to late 1980s, but by the mid to late 1990s, kernel methods have also become popular in machine learning. The first

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin

More information

Lab 2: Support vector machines

Lab 2: Support vector machines Artificial neural networks, advanced course, 2D1433 Lab 2: Support vector machines Martin Rehn For the course given in 2006 All files referenced below may be found in the following directory: /info/annfk06/labs/lab2

More information

Rule extraction from support vector machines

Rule extraction from support vector machines Rule extraction from support vector machines Haydemar Núñez 1,3 Cecilio Angulo 1,2 Andreu Català 1,2 1 Dept. of Systems Engineering, Polytechnical University of Catalonia Avda. Victor Balaguer s/n E-08800

More information

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1225 Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms S. Sathiya Keerthi Abstract This paper

More information

One-class Problems and Outlier Detection. 陶卿 中国科学院自动化研究所

One-class Problems and Outlier Detection. 陶卿 中国科学院自动化研究所 One-class Problems and Outlier Detection 陶卿 Qing.tao@mail.ia.ac.cn 中国科学院自动化研究所 Application-driven Various kinds of detection problems: unexpected conditions in engineering; abnormalities in medical data,

More information

Training Data Selection for Support Vector Machines

Training Data Selection for Support Vector Machines Training Data Selection for Support Vector Machines Jigang Wang, Predrag Neskovic, and Leon N Cooper Institute for Brain and Neural Systems, Physics Department, Brown University, Providence RI 02912, USA

More information

Summarizing Inter-Query Learning in Content-Based Image Retrieval via Incremental Semantic Clustering

Summarizing Inter-Query Learning in Content-Based Image Retrieval via Incremental Semantic Clustering Summarizing Inter-Query Learning in Content-Based Image Retrieval via Incremental Semantic Clustering Iker Gondra, Douglas R. Heisterkamp Department of Computer Science Oklahoma State University Stillwater,

More information

RETIN AL: An Active Learning Strategy for Image Category Retrieval

RETIN AL: An Active Learning Strategy for Image Category Retrieval RETIN AL: An Active Learning Strategy for Image Category Retrieval Philippe-Henri Gosselin, Matthieu Cord To cite this version: Philippe-Henri Gosselin, Matthieu Cord. RETIN AL: An Active Learning Strategy

More information

9. Support Vector Machines. The linearly separable case: hard-margin SVMs. The linearly separable case: hard-margin SVMs. Learning objectives

9. Support Vector Machines. The linearly separable case: hard-margin SVMs. The linearly separable case: hard-margin SVMs. Learning objectives Foundations of Machine Learning École Centrale Paris Fall 25 9. Support Vector Machines Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech Learning objectives chloe agathe.azencott@mines

More information

Online Mathematical Symbol Recognition using SVMs with Features from Functional Approximation

Online Mathematical Symbol Recognition using SVMs with Features from Functional Approximation Online Mathematical Symbol Recognition using SVMs with Features from Functional Approximation Birendra Keshari and Stephen M. Watt Ontario Research Centre for Computer Algebra Department of Computer Science

More information

Data Mining. Lesson 9 Support Vector Machines. MSc in Computer Science University of New York Tirana Assoc. Prof. Dr.

Data Mining. Lesson 9 Support Vector Machines. MSc in Computer Science University of New York Tirana Assoc. Prof. Dr. Data Mining Lesson 9 Support Vector Machines MSc in Computer Science University of New York Tirana Assoc. Prof. Dr. Marenglen Biba Data Mining: Content Introduction to data mining and machine learning

More information

Perceptron Learning Algorithm

Perceptron Learning Algorithm Perceptron Learning Algorithm An iterative learning algorithm that can find linear threshold function to partition linearly separable set of points. Assume zero threshold value. 1) w(0) = arbitrary, j=1,

More information

Accelerometer Gesture Recognition

Accelerometer Gesture Recognition Accelerometer Gesture Recognition Michael Xie xie@cs.stanford.edu David Pan napdivad@stanford.edu December 12, 2014 Abstract Our goal is to make gesture-based input for smartphones and smartwatches accurate

More information

Support Vector Machines for Face Recognition

Support Vector Machines for Face Recognition Chapter 8 Support Vector Machines for Face Recognition 8.1 Introduction In chapter 7 we have investigated the credibility of different parameters introduced in the present work, viz., SSPD and ALR Feature

More information

Classification: Feature Vectors

Classification: Feature Vectors Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12

More information

Kernel Methods. Chapter 9 of A Course in Machine Learning by Hal Daumé III. Conversion to beamer by Fabrizio Riguzzi

Kernel Methods. Chapter 9 of A Course in Machine Learning by Hal Daumé III.   Conversion to beamer by Fabrizio Riguzzi Kernel Methods Chapter 9 of A Course in Machine Learning by Hal Daumé III http://ciml.info Conversion to beamer by Fabrizio Riguzzi Kernel Methods 1 / 66 Kernel Methods Linear models are great because

More information

An Introduction to Content Based Image Retrieval

An Introduction to Content Based Image Retrieval CHAPTER -1 An Introduction to Content Based Image Retrieval 1.1 Introduction With the advancement in internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and

More information

Kernel-based online machine learning and support vector reduction

Kernel-based online machine learning and support vector reduction Kernel-based online machine learning and support vector reduction Sumeet Agarwal 1, V. Vijaya Saradhi 2 andharishkarnick 2 1- IBM India Research Lab, New Delhi, India. 2- Department of Computer Science

More information

Application of Support Vector Machine Algorithm in Spam Filtering

Application of Support Vector Machine Algorithm in  Spam Filtering Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification

More information

Linear methods for supervised learning

Linear methods for supervised learning Linear methods for supervised learning LDA Logistic regression Naïve Bayes PLA Maximum margin hyperplanes Soft-margin hyperplanes Least squares resgression Ridge regression Nonlinear feature maps Sometimes

More information