Semi-Supervised SVMs for Classification with Unknown Class Proportions and a Small Labeled Dataset


S. Sathiya Keerthi, Yahoo! Labs, Santa Clara, California, U.S.A. (selvarak@yahoo-inc.com)
Bigyan Bhar, Indian Institute of Science, Bangalore, India (bigyan@csa.iisc.ernet.in)
Shirish Shevade, Indian Institute of Science, Bangalore, India (shirish@csa.iisc.ernet.in)
S. Sundararajan, Yahoo! Labs, Bangalore, India (ssrajan@yahoo-inc.com)

ABSTRACT

In the design of practical web page classification systems one often encounters a situation in which the labeled training set is created by choosing some examples from each class; but the class proportions in this set are not the same as those in the test distribution to which the classifier will actually be applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods, to deal with this situation. We empirically show that, when the labeled training data is small, the TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; when this estimate is used, both TSVM and EC/ER give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.

Categories and Subject Descriptors: I.5.2 [Pattern Recognition]: Design Methodology - Classifier design and evaluation

General Terms: Algorithms, Experimentation

Keywords: Transductive and semi-supervised learning, classification, support vector machines

1. INTRODUCTION

The problem of classifying web pages into a given set of classes arises frequently in web applications. Linear classifiers such as support vector machines (SVMs) and maximum entropy (MaxEnt) models employing a rich feature representation (e.g., bag-of-words) are used successfully for this purpose. When labeling is expensive one is forced to work with a small set of labeled examples for training. In such cases a linear classifier trained using only labeled data does not give good performance. Performance can be significantly boosted by employing semi-supervised methods that make effective use of unlabeled data, which is usually available in plenty. The transductive support vector machine (TSVM) [10] and MaxEnt models trained with expectation regularization (ER) [12] are good examples of such methods. These semi-supervised methods assume knowledge of auxiliary information about the underlying data, such as class proportions. The performance of semi-supervised methods such as ER and TSVM is sensitive to the class proportion values used to find their solution; see section 4.2 for a detailed empirical analysis.
In many practical situations, class proportions are unknown. Often it also happens that the class proportions in the labeled training set are different from those in the actual distribution to which the classifier will be applied. A few examples of such scenarios are as follows.

- There are tens of thousands of city aggregator sites covering categories such as dining, attractions, night life, etc. The content and vocabulary across these sites are very similar, but in each site the proportion of examples belonging to the various classes can be quite different. So, building a separate classifier for each site is appropriate. Let us say that we are given a few global examples of dining and non-dining pages. The aim is to develop a classifier for each site to separate dining pages from the others, making use of unlabeled web pages in that site.

- Frequently, in classifier design, one would like to reuse labeled data from a previous run to develop a classifier for a new distribution (e.g., a new stream of news pages); the old labeled set is still good as the union of sets of samples from the individual classes, but the class proportions are different.

- A user shows a few relevant and irrelevant example

web pages and asks for relevant pages to be found from a fresh collection of web pages.

The main aim of this paper is to address the problem of mismatched class proportions mentioned above. Let x denote a generic example web page and c denote a class. From a probabilistic viewpoint our problem consists of dealing with a situation in which: (i) p(x|c) is the same in the labeled data and the actual data of interest, for all c; but (ii) {p(c)} is very different in the two sets. Coupled with the labeled data being small, the problem becomes difficult. Note that, even if the labeled data is not small, extending methods such as TSVM and ER to deal with mismatched class proportions requires careful thought, because the labeled data does not carry information on the actual class proportions and unlabeled data has to be suitably brought in to estimate them.

There have been some previous efforts to address this problem for TSVMs [7, 14]. But these methods have some basic issues: they make ad hoc changes to the training process in an active self-learning mode and are not cleanly tied to a well-defined problem formulation; they deviate significantly from the TSVM formulation and hence perform quite a bit worse than TSVMs when the labeled data and the actual distribution are indeed well matched in terms of class proportions; and they have not been demonstrated in difficult situations involving large distortions in class proportions.

In this paper we empirically analyze various issues related to the problem of mismatched class proportions, propose ways of estimating actual class proportions and demonstrate their usefulness. We only take up binary classification problems in this paper, but the ideas are quite general and have the potential for extension to multi-class settings. Since binary classification involves only two classes (positive and negative), the class proportion can be represented using a single quantity, f, the fraction of positive examples.

1.1 Contributions of the paper

In a nutshell, the following are our main contributions, listed in order of treatment in the paper.

1. The ER method is introduced in the context of SVMs. In addition we introduce two related methods based on expectation constraints (EC) and simple threshold adjustment (SVMTh).

2. We empirically analyze all the methods (TSVM, ER, EC and SVMTh) in terms of their learning curves and their sensitivity to incorrect specification of f.

3. If the labeled data has a distorted class proportion (even a severe one), but the actual f happens to be known, we empirically show that the methods are pretty much unaffected.

4. To handle the crucial case in which the actual f is unknown, we propose and empirically analyze two methods for estimating the actual f. The first estimation method is based on finding the f having the least labeled loss along a trajectory of solutions given by a method as f is varied. The second estimation method is based on fitting a mixture of two Gaussians to the output of the SVM trained using only labeled data. The first estimation method works well with TSVM and is suited to the situation where the labeled data is small, while the second estimation method is powerful and works well with all semi-supervised methods when the labeled data is large. Overall, TSVM combined with the two estimation methods (suitably switched depending on the amount of labeled data available) is the top performing method.

2. DESCRIPTION OF METHODS

In this section we describe all the methods that will be analyzed in this paper. These methods are meant for binary classification.
Labeled data consists of l examples {(x_i, y_i)}_{i=1}^{l}, where the input patterns x_i ∈ R^d are feature vectors representing web pages and the labels y_i ∈ {+1, −1}. Semi-supervised/transductive methods make use of unlabeled examples in addition; unlabeled data consists of u examples {x_i}_{i=l+1}^{l+u}. All the methods develop a linear classification function w^T x, with {x : w^T x > 0} denoting the positive class region. Web page classification problems involve a large feature space (d is large); the input patterns are sparse, i.e., the fraction of non-zero elements in one x_i is small. Unlabeled data is always a random sample picked from the actual distribution to which the classifier will be applied; labeled data, on the other hand, may have a class proportion which is different from that in the actual distribution.

2.1 Supervised SVMs

Supervised SVMs make use of labeled data and optimize the regularized large margin loss function:

    min_w  T_sup = (λ/2) ||w||^2 + (1/l) Σ_{i=1}^{l} L(y_i, o_i)        (1)

where o_i = w^T x_i. An example of the large margin loss function is the squared hinge loss, L(y, o) = max(0, 1 − yo)^2. Alternatively one can use the hinge loss: L(y, o) = max(0, 1 − yo). All the experiments in this paper are done with the squared hinge loss. Irrespective of which loss is used, there exist very efficient numerical methods [11, 13] for solving (1); the running time of these methods is linear in the number of examples. On web page and text classification tasks the performance of the SVM classifier is quite steady over a large range of values of the regularization coefficient λ; a single fixed value of λ is used in all the experiments of the paper.
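As an illustration, the following is a minimal sketch of solving (1) with an off-the-shelf solver; it assumes scikit-learn (not something the paper uses) and the correspondence C = 1/(λl) between (1) and the un-averaged squared-hinge objective that LinearSVC minimizes. The helper name train_lsvm is our own.

    # Sketch of the supervised SVM (1) via scikit-learn's LinearSVC, whose
    # objective (1/2)||w||^2 + C * sum_i max(0, 1 - y_i o_i)^2 matches (1)
    # when C = 1 / (lambda * l).
    import numpy as np
    from sklearn.svm import LinearSVC

    def train_lsvm(X, y, lam=1.0):
        C = 1.0 / (lam * X.shape[0])
        clf = LinearSVC(loss="squared_hinge", C=C, fit_intercept=False)
        clf.fit(X, y)
        return clf.coef_.ravel()   # w; scores are o = X @ w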

2.2 Transductive SVMs

Transductive/Semi-Supervised SVMs (TSVMs) make effective use of unlabeled data to enhance the performance of SVMs. These methods perform very well on web page and text classification problems. Even with just a few labeled examples they can combine the labeled data with a large unlabeled set to attain a performance equal to that of a supervised SVM designed using a large labeled set. Through unlabeled examples, many features which are completely absent in the labeled set end up getting very good weights. TSVMs are based on the cluster (or low density separation) assumption, which states that the decision boundary should not cross high density regions, but instead lie in low density regions. Joachims [10] gave the first effective TSVM formulation. (Good software for supervised and transductive SVMs can be found at vikas.sindhwani.org/svmlin.html.) A key ingredient of this formulation is that it assumes that f, the fraction of positive examples in the actual distribution, is known, and that the corresponding constraint is enforced in the formulation. (We will refer to this constraint as the fraction constraint. Also, when applied in a discrete setting, e.g., the fraction constraint in (2), it is assumed that appropriate rounding off is allowed.) Joachims [10] solved the following problem in which, apart from w, the set of labels of the unlabeled examples, {y_i}, are also variables:

    min_{w, {y_i}}  T_sup + (1/u) Σ_{i=l+1}^{l+u} L(y_i, o_i)
    s.t.  (1/u) Σ_{i=l+1}^{l+u} [y_i == 1] = f        (2)

where [z] is 1 if z is true and 0 otherwise. Unlike (1), (2) is not a convex minimization problem. In general, it is hard to solve. It has been pointed out [5] that the solution of (2) can get stuck in poor local minima. Fortunately, this is not the case in linear classification problems involving a large feature space, such as web page and text classification. TSVMs are routinely used to solve applied problems [4, 3]. Alternating optimization iterations (fix w and optimize {y_i}, then fix {y_i} and optimize w) are usually used to solve (2). There also exist variations of the TSVM algorithm. For example, the discrete variables {y_i} can be eliminated from (2) to get the following equivalent problem:

    min_w  T_sup + (1/u) Σ_{i=l+1}^{l+u} min{L(1, o_i), L(−1, o_i)}
    s.t.  (1/u) Σ_{i=l+1}^{l+u} [o_i > 0] = f        (3)

Sigmoid smoothing or other methods can be applied to write [o_i > 0] in a differentiable form. Then the solution can be approached via gradient based optimization of w. See [6, 13, 5] for methods of this kind. Most often this approach yields a performance that is slightly better than (2).
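To make the alternating scheme concrete, here is a simplified sketch of one way to alternate between the two steps of (2); it reuses the hypothetical train_lsvm helper from section 2.1 and omits details that matter in practice, such as annealing the weight on the unlabeled loss and separately normalizing the labeled and unlabeled loss terms. With w fixed, the labels that minimize the unlabeled loss under the fraction constraint are obtained by making the top f*u scores positive, since L(1, o) − L(−1, o) decreases monotonically in o.

    # Simplified alternating optimization for (2).
    import numpy as np

    def tsvm_train(Xl, yl, Xu, f, lam=1.0, n_iters=10):
        w = train_lsvm(Xl, yl, lam)
        k = int(round(f * Xu.shape[0]))           # fraction constraint
        for _ in range(n_iters):
            yu = -np.ones(Xu.shape[0])
            yu[np.argsort(-(Xu @ w))[:k]] = 1.0   # top-k scores -> positive
            w = train_lsvm(np.vstack([Xl, Xu]), np.hstack([yl, yu]), lam)
        return w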
2.3 Expectation Methods

Mann and McCallum [12] proposed the Expectation Regularization (ER) method in the context of MaxEnt models. The idea is to use expectation terms related to some domain knowledge to influence the training process. This can easily be done with SVM models too, as we do here. In our case the domain knowledge of interest is the fact that the fraction of positively classified examples in unlabeled data equals f. This fraction constraint can be used to influence the solution by including an additional regularization term in the objective function:

    min_w  T_sup + (ρ/2) ( (1/u) Σ_{i=l+1}^{l+u} [o_i > 0] − f )^2        (4)

As mentioned earlier with respect to (3), sigmoid smoothing can be used to make the expectation regularization term differentiable, so that gradient based numerical optimization techniques can be employed. In our implementation of ER we use ρ = 5 as the default value. Later we will also study the effect of varying ρ.

Note that in (4) the expectation constraint on the fraction of positive examples is only approximately enforced, since it is included only as a regularizer term. If the domain knowledge says that the fraction constraint holds with certainty, then it may be better to enforce the constraint rather than add a regularizer. This leads to a new method which we call the Expectation Constraints (EC) method. The optimization problem to be solved is:

    min_w  T_sup   s.t.   (1/u) Σ_{i=l+1}^{l+u} [o_i > 0] = f        (5)

Again, sigmoid smoothing can be used to write the fraction constraint function in a differentiable form. A suitable numerical method that can deal with equality constraints can be used to solve for w. In our implementation we use the Augmented Lagrangian method [1].

Let us introduce another expectation based baseline method that has been ignored in the literature. The method consists of taking the supervised SVM solution w (i.e., the solution of (1)) and adding a threshold θ to the scoring function so that the fraction of unlabeled examples that satisfy w^T x + θ > 0 equals f. The new classifier boundary is defined by w^T x + θ = 0. We will refer to this method as SVMTh.
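SVMTh reduces to a one-line post-processing step; a minimal sketch, with the same hypothetical helpers as above:

    # SVMTh sketch: choose theta so that a fraction f of the unlabeled
    # scores satisfy w^T x + theta > 0, i.e., theta is minus the
    # (1 - f)-quantile of the unlabeled scores.
    import numpy as np

    def svmth(w, Xu, f):
        theta = -np.quantile(Xu @ w, 1.0 - f)
        return theta   # predict positive when x @ w + theta > 0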

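Returning to (4), the following sketch shows one possible sigmoid-smoothed implementation of the ER objective, optimized with L-BFGS from SciPy; the smoothing width tau is our own illustrative parameter, not something specified by the method, and the gradient is derived directly from the smoothed objective.

    # Sketch of ER (4) with sigmoid smoothing of [o_i > 0].
    import numpy as np
    from scipy.optimize import minimize

    def er_train(Xl, yl, Xu, f, lam=1.0, rho=5.0, tau=1.0):
        l, d = Xl.shape
        u = Xu.shape[0]

        def obj(w):
            m = np.maximum(0.0, 1.0 - yl * (Xl @ w))   # squared-hinge margins
            s = 1.0 / (1.0 + np.exp(-(Xu @ w) / tau))  # smoothed [o > 0]
            gap = s.mean() - f
            loss = 0.5 * lam * w @ w + np.mean(m ** 2) + 0.5 * rho * gap ** 2
            grad = lam * w - (2.0 / l) * Xl.T @ (m * yl)
            grad += rho * gap * Xu.T @ (s * (1.0 - s)) / (u * tau)
            return loss, grad

        res = minimize(obj, np.zeros(d), jac=True, method="L-BFGS-B")
        return res.x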
2.4 Relations between the methods

It is easy to see that the ER and EC methods are closely related. In particular, when the parameter ρ in (4) is made very large, ER and EC are expected to show very similar behavior. SVMTh is quite different from EC although both methods enforce the fraction constraint; while EC tries to balance the minimization of T_sup with the fraction constraint, SVMTh simply adjusts the threshold to satisfy the fraction constraint without worrying about its effect on T_sup. If we compare (3) and (5) we see that TSVM has an extra term (the loss on unlabeled data). This term turns out to be important, as it helps TSVM place the classifier boundary more precisely by making it pass through low density regions of the unlabeled data; as we will see later, the performance improvement due to it is large, especially when the size of the labeled data is small. For MaxEnt models, Grandvalet and Bengio [9] suggested including an unlabeled loss term, called entropy regularization, that is similar in spirit to the unlabeled loss term in (3). However, their formulation does not include the fraction constraint. Mann and McCallum [12] conduct experiments that clearly point out that without this constraint the method does not do well. They also show that, if the constraint is included (even as a regularizer term), then the performance is greatly enhanced.

3. DATASETS AND NOTATIONS

All the empirical analyses in this paper are done using the following five text classification datasets: gcat, ccat, aut-avn, real-sim and earn. The first four datasets are the same as the ones used in [13]; earn is a binary dataset created from the Reuters-21578 dataset (testcollections/reuters21578/) by taking the examples in the earn class as positive and the remaining examples as negative. Key properties of the datasets are given in Table 1. Each dataset is divided into three parts: labeled data, unlabeled data and test data. Unlabeled data and test data have the same class proportion; let f_actual denote the fraction of positive examples in the test/unlabeled data. Labeled data can have a different class proportion depending on the experiment; note that the main aim of the paper is to find ways of dealing with a mismatch in class proportion between the labeled data and the actual distribution. Let f_lab denote the fraction of positive examples in the labeled data.

Table 1: Properties of the datasets: n (number of examples), d (data dimensionality) and f_actual (positive class ratio) for gcat, ccat, aut-avn, real-sim and earn.

Always, f_actual is set to the fraction of positive examples in the original, full dataset. Given f_lab and the number of labeled examples, n_lab, we create a random division of the full dataset into test, labeled and unlabeled data as follows. First, 5% of the examples are randomly chosen in a class-stratified fashion and kept as testing data. Let R denote the set of remaining examples. We form labeled data by choosing ⌈n_lab f_lab⌉ positive examples from R and n_lab − ⌈n_lab f_lab⌉ negative examples from R. Of the remaining examples, unlabeled data is randomly chosen to be the largest set obeying f_actual. Since the dataset sizes are large and n_lab is small, the size of the unlabeled data is large; it is several times larger than n_lab.

Performance of the methods will be measured in terms of the F score, F = 2PR/(P + R), the harmonic mean of the precision P and recall R of the positive class.

To ease understanding, we use consistent notations in all the figures. The following colors will be used to denote the various methods: Black dotted: LSVM, the SVM trained with labeled examples only; Red dotted: FullSVM, the SVM trained using all unlabeled examples taking their labels as known; Blue: TSVM, the Transductive SVM; Black: SVMTh, LSVM with threshold adjustment based on f; Green: ER, Expectation Regularization; and Red: EC, Expectation Constraints. In most figures f or f_lab is the horizontal axis. In such figures f_actual is shown as a vertical dotted line. In most figures the plots are averages over four random splits of a dataset into labeled, unlabeled and test data. Exceptions are Figures 3, 8, 9, 10 and 11, where just one random split is used. In the figures, LabSize and LabelFrac denote, respectively, the size of the labeled data, n_lab, and the fraction of positive examples in the labeled data, f_lab.

4. BASIC EXPERIMENTS

Before we go on to deal with differences in class proportions between labeled data and unlabeled/test data, it is useful to give some basic experimental results that help in understanding the methods.

4.1 Learning Curves

Figure 1 shows the learning curves (variation of performance with n_lab) for various methods on four datasets. All methods use the value f = f_actual.

Figure 1: Learning curves (F score versus LabSize) for gcat, ccat, aut-avn and real-sim. All methods use f = f_actual.

TSVM gives a much stronger performance than the other methods at small values of n_lab. Even at higher values of n_lab it is better than or competitive with the others. Whenever TSVM does significantly better than SVMTh it is a clear indication of the important role played by the unlabeled loss term in (2). The magnitude of the performance improvement given by TSVM over the others depends on the dataset. On gcat and aut-avn TSVM is very much superior; on ccat it is quite a bit better than the others; and on real-sim its performance matches that of SVMTh. After TSVM, SVMTh is the next best performer, followed by EC, ER and LSVM, in that order.

4.2 Sensitivity Analysis

Since our aim is to deal with situations in which f_actual is unknown, it is useful to understand how the performance of each method is affected when f (the value of the positive class fraction used in (1)-(5)) is changed away from f_actual. Figure 2 describes this sensitivity for n_lab = 9 and n_lab = 44. We chose these n_lab values as they represent the rising and stable parts of the learning curves for most (method, dataset) combinations. Let us first make some observations for n_lab = 9.
All methods suffer when f is too much lower/higher than f_actual. If f is sufficiently far from f_actual then the performance can degrade to be even worse than that of LSVM. The fall in performance on the lower side is sharper, since recall is badly affected when f is decreased from f_actual and this has an adverse effect on the F score. TSVM and SVMTh attain their peak performance at f values that are close to f_actual. On the other hand, ER and EC attain their best performance at f values that are quite shifted away from f_actual. Consistently, this shifting away from f_actual is such that the fraction of the minority class is larger than its actual value in the unlabeled/test distribution; in the top row of Figure 2, note the left shift of the peak for aut-avn and the right shifts for all the other datasets.

Figure 2: Sensitivity of the methods with respect to f, on gcat, ccat, aut-avn and real-sim. The top row is for n_lab = 9 and the bottom row is for n_lab = 44.

This is due to two properties associated with LSVM, of which EC and ER are modifications. First, when n_lab is small, the direction of the LSVM classifier makes a large angle with that of the ideal SVM classifier built using a large labeled training set. So the LSVM classifier boundary cuts through dense regions of the test set. Second, this cutting is more severe on the minority class examples due to their lesser representation. (This can be explained as follows. Take the case where the positive class is the minority class. Since the positive class has a smaller number of loss terms (see (1)), the classifier boundary of LSVM is placed closer to the labeled examples of the positive class in order to reduce the total loss on the labeled examples of the negative class. Therefore the fraction of examples classified by LSVM as positive is less than f_actual.) Thus, LSVM (which optimizes T_sup) ends up choosing a classifier boundary placement in which many test examples of the minority class are pushed to the wrong side. See the bottom left plot of Figure 8 (which comes later in the paper) to confirm this for the gcat dataset. Therefore, to pull the classifier towards satisfying the actual fraction and hence do well on the F score, the EC and ER methods need to use an f value (see (4) and (5)) that is higher (lower) than f_actual when the minority class is the positive (negative) class. The f value that is needed by ER to get the best F score on the test set is farther from f_actual than the value needed by EC, because ER only loosely enforces the constraint. SVMTh behaves differently because it doesn't try to balance T_sup optimization against the fraction constraint; it simply enforces the fraction constraint in post-processing without worrying about what happens to T_sup in that step.

When n_lab is large the behavior is different. The bottom row of Figure 2 gives the sensitivity for LabSize=44. LSVM gives a very good performance since the labeled data set is sufficiently large. The performance of ER is close to that of LSVM (see the green continuous and black dotted lines in the bottom row of Figure 2) because the ρ value (5) used in (4) is not strong enough to exert the fraction constraint. TSVM, SVMTh and EC suffer when f is moved well away from f_actual because they enforce the fraction constraint.

The observations made above imply that, for EC and ER, it is a good idea to employ an f value that is different from f_actual when n_lab is small. This observation is missing in previous works on ER [12]. If the class proportions in the labeled, unlabeled and test data are all matching and n_lab is not too small, then it is a good idea to use cross validation to choose f to optimize performance. If n_lab is not large, the Leave-One-Out method can be used for cross validation. However, we do not go ahead and demonstrate this in this paper, since the main focus of the paper is to suggest ways of handling a mismatch between the class proportions in the labeled data and the actual distribution of interest.

Figure 3 gives the sensitivity plots for the earn dataset. The experimental setup is close to that used by Zhang and Oles [15], who used the setup to point out (wrongly) that TSVM does not work well.
Zhang and Oles [15] possibly optimized the TSVM objective function in (2) without the fraction constraint. Figure 3 explains what can go wrong if that is done. From the right side plot of Figure 3 it is clear that the least value of the objective function occurs at an end value of f, where the performance is very poor. This clearly shows the important role played by the fraction constraint in TSVM. Recall a similar comment that we made earlier, at the end of subsection 2.4, with respect to MaxEnt models and entropy regularization.

5. DISTORTION IN CLASS PROPORTION

The results of section 4 concern the case in which the labeled data and the test/unlabeled data have the same class proportion. We now consider situations in which the class proportions in the two sets are different. Sometimes it happens that, even though there is a difference in those class proportions, f_actual may actually be known. Such knowledge helps the methods to achieve a good performance. In subsection 5.1 we take up this case. In subsection 5.2 we deal with the more difficult case in which f_actual is unknown.

Figure 3: Earn dataset: sensitivities of the methods with respect to f with just 2 labeled examples. The left plot shows the performance of the various methods. The right plot shows TSVM performance (blue) and the TSVM objective function in equation (2) (magenta); both are normalized to a max value of 1 to show them on the same plot. When f_actual is known, TSVM gives an F score of 0.93.

Figure 4: Demonstration that the cost adjusted labeled loss does not change performance significantly (gcat, n_lab = 9).

5.1 Case 1: f_actual is known

Since we know f_actual as well as the numbers of positive and negative labeled examples in the labeled set, it is appropriate to modify the SVM objective function T_sup so that the labeled loss term can be viewed as if it were the mean loss of labeled data picked (with replacement) from the actual distribution of interest. To do this we change T_sup to T_sup^Adj by introducing γ, a relative weighting parameter for the positive labeled examples. T_sup^Adj is given by

    T_sup^Adj = (λ/2) ||w||^2 + (1/N) ( γ Σ_{i ≤ l, y_i = 1} L(y_i, o_i) + Σ_{i ≤ l, y_i = −1} L(y_i, o_i) )        (6)

where γ is chosen so that γ f_lab / (γ f_lab + (1 − f_lab)) = f_actual, with f_lab = n_p/(n_p + n_n), n_p and n_n being the numbers of positive and negative examples in the labeled data, and N = n_p γ + n_n is the normalizer chosen to make sure that the labeled loss term is the mean loss. Note that when f_lab < f_actual, γ is bigger than 1, and so the weight on the positive labeled examples is increased. When f_lab > f_actual, γ is smaller than 1.
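For concreteness, the weighting in (6) can be computed as in the small sketch below; the resulting γ could equivalently be passed to a solver's per-class weights (e.g., class_weight in scikit-learn, which the paper does not mention).

    # Compute gamma and the normalizer N of (6) from the labeled counts
    # n_p, n_n and the known f_actual, by solving
    #   gamma * f_lab / (gamma * f_lab + (1 - f_lab)) = f_actual.
    def cost_adjust(n_p, n_n, f_actual):
        f_lab = n_p / (n_p + n_n)
        gamma = f_actual * (1.0 - f_lab) / ((1.0 - f_actual) * f_lab)
        return gamma, n_p * gamma + n_n   # (gamma, N)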
All the methods can be appropriately modified to use T_sup^Adj instead of T_sup. However, we found that the performance of the methods is little affected when T_sup^Adj is used instead of T_sup. Figure 4 compares the performance for one dataset, gcat. Similar behavior is seen on the other datasets.

Figure 5: Variation of the performance of the methods with f_lab (LabelFrac) when f_actual is known and the methods use f = f_actual (gcat, ccat, aut-avn and real-sim).

Figure 5 compares all the methods on four datasets when f = f_actual is used by the methods and f_lab is varied over a range of values from 0 to 1. Interestingly, TSVM, SVMTh and EC are only mildly affected by changes in f_lab. This is very much due to f_actual being known and used well by the methods via the fraction constraint with f = f_actual. On the other hand, LSVM is badly affected when f_lab is moved away from f_actual. ER suffers when f_lab < f_actual. The penalty weight ρ used in (4) plays a key role in ER. Figure 6 shows the performance variation for a range of ρ values. When ρ is small the performance of ER is close to that of LSVM. For large ρ its performance is close to that of EC. Both these observations are along expected lines. For different values of f_lab the optimal ρ is different.

5.2 Case 2: f_actual is unknown

We propose and explore some methods for estimating f_actual in the presence of large variations in f_lab. Note that, even when the labeled data is large, it cannot be used alone to get an estimate of f_actual, since the class proportions in that set are distorted. We explore three methods for estimating f_actual. Let us refer to an estimate of f_actual as f_est.

Estimation Method 1. Simply take f_est to be the fraction of positive examples in the labeled data. This is just a baseline method. The next two methods make use of unlabeled data together with labeled data.

Estimation Method 2. Given the labeled data we can solve (1) and obtain the LSVM solution w. From the good performance of SVMTh over a large range of f_lab values in Figure 5, we know that the classifier direction corresponding to w is good.

Figure 6: Variation of ER performance with f_lab (LabelFrac) for various ρ values, on gcat, ccat, aut-avn and real-sim at n_lab = 9. Dashdot: small ρ (gives performance close to LSVM); Continuous: ρ = 5 (same as in Figure 5); Dotted and Dashed: larger ρ values. ER with large ρ yields a performance close to that of EC.

Figure 7: Estimation Method 2. Mean (with error bars) of f_est versus n_lab for gcat. f_actual = 0.3 for gcat.

It is therefore appropriate to model the classifier scoring function o = w^T x applied to the unlabeled data, {o_i = w^T x_i}, as a mixture of two Gaussians:

    p(o_i) = β_1 g(o_i; µ_1, σ_1) + β_2 g(o_i; µ_2, σ_2)        (7)

where g(·; µ_k, σ_k) is the univariate Gaussian density function with mean µ_k and standard deviation σ_k, and β_1, β_2 are (non-negative) mixing proportion values satisfying β_1 + β_2 = 1. By fitting this model to the data {o_i} to maximize the likelihood, we can get the parameter values β_1, β_2, µ_1, µ_2, σ_1 and σ_2. The Gaussian with the larger µ_k represents the positive class; so the corresponding β_k can be taken as f_est, an estimate of the positive class fraction.

Figure 8: Understanding Estimation Method 2. SVM-N denotes the LSVM output for N labeled examples and FullSVM is the output of the SVM corresponding to a very large labeled set.

Figure 8 helps us understand the usefulness of this method. When the number of labeled examples is small, w, the solution of (1), makes a large angle with the solution associated with the ideal SVM classifier built using a large labeled training set having the actual class proportion; so the {o_i = w^T x_i} distribution is not a clean mixture of two Gaussians, and the estimate f_est is also poor. On the other hand, when the number of labeled examples is large (closer to the higher side of the learning curve), this estimation method performs much better. Figure 7 plots the mean (with error bars) of f_est (computed using random splits of gcat) as n_lab is varied. Such a plot can help one decide when this estimation method is useful.
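A minimal sketch of Estimation Method 2, assuming scikit-learn's GaussianMixture for the one-dimensional fit of (7):

    # Fit the two-component mixture (7) to LSVM scores on unlabeled data;
    # the mixing weight of the component with the larger mean is f_est.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def estimate_f_gmm(w, Xu):
        scores = (Xu @ w).reshape(-1, 1)
        gmm = GaussianMixture(n_components=2).fit(scores)
        return gmm.weights_[np.argmax(gmm.means_.ravel())]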
Estimation Method 3. In this method we explore whether some property associated with the solution of a method can be used to find f_est. Let us take TSVM. Figure 9 gives, for gcat and ccat, the variation of the TSVM objective function (see (2)), the Labeled Loss (the second term of T_sup in (1)), as well as the F score, with respect to the f used in (2). The behavior on the other datasets is similar. Clearly the TSVM objective function is not useful for tuning f. Gärtner et al. [8] suggest minimizing the TSVM objective function to tune class proportions, but Figure 9 clearly points out that it is not the right thing to do. The plots also show the goodness of minimizing the Labeled Loss as a way of obtaining f_est. (To avoid confusion, we note that (2) is always the optimization problem that is solved to get the TSVM solution; the minimization of the Labeled Loss that we are doing here is for tuning at a higher level, treating f as a hyperparameter.) A rough explanation for this goodness is as follows. Consider (2), the optimization problem associated with TSVM. End values of f (near 0 and 1) put heavy pressure on placing all examples on one side of the classifier boundary. With such pressure, the Labeled Loss becomes large for such f values. Near f = f_actual, minimizing the Labeled Loss, minimizing the Unlabeled Loss and satisfying the fraction constraint are all consistent, so the Labeled Loss achieves small values there.

We also tried the same estimation method (i.e., minimize the labeled loss along the trajectory of solutions defined by varying f) on the expectation methods. But it did not work well. Figure 10 gives, for EC on gcat and ccat, the variation of the labeled loss as well as the F score as a function of f. The minimizer of the labeled loss does not coincide well with the maximizer of the F score. Similar behavior is observed for SVMTh and ER.
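A sketch of Estimation Method 3 as applied to TSVM, using the hypothetical tsvm_train helper from section 2.2; the grid over f and its resolution are our own illustrative choices.

    # Sweep f over a grid, train a TSVM per value, and return the f whose
    # solution has the least labeled loss (second term of T_sup in (1)).
    import numpy as np

    def estimate_f_labloss(Xl, yl, Xu, grid=None):
        if grid is None:
            grid = np.linspace(0.05, 0.95, 19)
        def labeled_loss(w):
            m = np.maximum(0.0, 1.0 - yl * (Xl @ w))
            return np.mean(m ** 2)
        losses = [labeled_loss(tsvm_train(Xl, yl, Xu, f)) for f in grid]
        return grid[int(np.argmin(losses))]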

Figure 9: Variation of performance (blue), the TSVM objective function in (2) (magenta) and the Labeled Loss (the second term of T_sup in (1)) (cyan) with respect to f, for gcat and ccat. The minimizer of the Labeled Loss is marked by a black circle. The rows correspond to different fraction distortions; the per-panel titles give the corresponding f_lab values (gcat: 0.19 to 0.59; ccat: 0.28 to 0.68). The middle row corresponds to f_lab = f_actual. The bottom two rows correspond to f_lab = 0.8 f_actual and f_lab = 0.6 f_actual. The top two rows correspond to (1 − f_lab) = 0.8 (1 − f_actual) and (1 − f_lab) = 0.6 (1 − f_actual).

Figure 10: EC: Variation of performance (red) and the Labeled Loss (the second term of T_sup in (1)) (cyan) with respect to f, for gcat and ccat. The minimizer of the Labeled Loss is marked by a black circle. The rows correspond to different fraction distortions, in the same pattern as in Figure 9.

For n_lab = 9, Figure 11 shows the performance of Estimation Method 3 with EC and SVMTh.

Figure 12 compares the three estimation methods as applied to TSVM. For n_lab = 9, Estimation Method 2 doesn't do even as well as Estimation Method 1. Estimation Method 3 is very good over a large range of f_lab values containing f_actual. It suffers only at extreme f_lab values. When n_lab is large, e.g., 44, Estimation Method 2 performs very well. It is appropriate to switch between the two estimation methods depending on how large n_lab is. Going by what we saw in Figure 7, we can use the steadiness of the f_est given by Estimation Method 2 over a range of n_lab values to decide when to switch to it. Of course, n_lab has to be reasonably big to notice a clear steady pattern. Until that size is reached, it is best to use Estimation Method 3. When n_lab is large, LSVM itself does quite well; since it is not dependent on f, it becomes the preferred method for that case. Therefore, the value of a fraction estimation method should be determined by how well it performs when the labeled data is small. From that viewpoint, Estimation Method 3 (as applied to TSVM) is the most valuable. Of all (Method, Estimation Method) combinations, (TSVM, Estimation Method 3) is the one that takes the most computing time; even for that combination, the computation time for one run on any of our five datasets is less than two minutes on a 3 GHz machine.

6. RELATED WORK

Since the first crucial paper of Joachims [10], several works (for a sample see [5, 7, 14, 6, 3, 15]) have been published extending and exploring various aspects of the first TSVM model. Of these, the works of Chen, Wang and Dong [7] and Wang and Huang [14] are the most relevant to our work, since they address the problem of mismatch in class proportions. The main drawback of these methods is that they make ad hoc labelings of the unlabeled examples during the training process in an active self-learning mode and are not cleanly tied to a well-defined problem formulation. Due to this, they deviate significantly from the TSVM formulation and hence perform quite a bit worse than TSVMs in the normal situation in which the labeled data and the actual distribution are indeed well matched in terms of class proportions. Also, the methods have been demonstrated only in situations where LSVM itself gives decent performance, which is not the case when there are large distortions in class proportions.

Mann and McCallum [12] empirically analyze ER in binary, multi-class and structured output settings in the context of MaxEnt models. ER's working and performance in SVMs and MaxEnt models are similar. Mann and McCallum [12] do a brief study of the sensitivity of ER's performance with respect to class proportions. But they only focus on the case of distortion towards uniform class proportions; in the binary case this corresponds to changing f from f_actual to 0.5. Unfortunately their study is done only on one multi-class dataset. As we saw in section 4.2, ER shows interesting behavior in the binary case; in fact ER improves in performance when f is moved towards 0.5. SVMTh is an important baseline method that is completely missed in [12]. There is a building literature on dealing with differences in train and test distributions, referred to as covariate shift (see [2] for instance), but such papers introduce and analyze new formulations and do not help to modify TSVM or the expectation methods while keeping their spirit intact.
7. CONCLUSION

The main contributions of this paper are: (i) an empirical analysis of TSVM and the expectation methods and their sensitivity with respect to label proportions; and (ii) the proposal and evaluation of new methods for dealing with mismatches in label proportions between the labeled and test sets. We have also done preliminary experiments to verify the ideas on least squares and MaxEnt models of binary classification and observed behavior similar to the SVM loss. With some care, all the ideas can also be extended to the multi-class setting. The ideas and results of this paper are mainly for web page and text classification. More work is needed to see if they hold on other types of problems.

8. REFERENCES

[1] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.
[2] S. Bickel, M. Brückner, and T. Scheffer. Discriminative learning for differing training and test distributions. In ICML, pages 81-88, 2007.
[3] L. Bruzzone, M. Chi, and M. Marconcini. A novel transductive SVM for semisupervised classification of remote-sensing images. IEEE Transactions on Geoscience and Remote Sensing, 2006.
[4] O. Chapelle, B. Schölkopf, and A. Zien. Semi-Supervised Learning. MIT Press, 2006.
[5] O. Chapelle, V. Sindhwani, and S. S. Keerthi. Optimization techniques for semi-supervised support vector machines. JMLR, 9:203-233, 2008.
[6] O. Chapelle and A. Zien. Semi-supervised classification by low density separation. In AISTATS, pages 57-64, 2005.
[7] Y. Chen, G. Wang, and S. Dong. Learning with progressive transductive support vector machine. Pattern Recognition Letters, 24:1845-1855, 2003.
[8] T. Gärtner, Q. V. Le, S. Burton, A. J. Smola, and S. V. N. Vishwanathan. Large-scale multiclass transduction. In NIPS, 2005.
[9] Y. Grandvalet and Y. Bengio. Semi-supervised learning by entropy minimization. In NIPS, 2004.
[10] T. Joachims. Transductive inference for text classification using support vector machines. In ICML, 1999.
[11] T. Joachims. Training linear SVMs in linear time. In KDD, 2006.
[12] G. S. Mann and A. McCallum. Generalized expectation criteria for semi-supervised learning with weakly labeled data. JMLR, 11:955-984, 2010.
[13] V. Sindhwani and S. S. Keerthi. Large scale semi-supervised linear SVMs. In SIGIR, pages 477-484, 2006.
[14] Y. Wang and S.-T. Huang. Training TSVM with the proper number of positive samples. Pattern Recognition Letters, 26, 2005.
[15] T. Zhang and F. J. Oles. A probability analysis on the value of unlabeled data for classification problems. In ICML, pages 1191-1198, 2000.

Figure 11: Performance of Estimation Method 3 as applied to EC (top row) and SVMTh (bottom row), for n_lab = 9 on gcat, ccat, aut-avn and real-sim. Estimation Method 3: continuous. Dashed: use f = f_actual (upper baseline).

Figure 12: Comparison of the f_actual estimation methods as applied to TSVM, for n_lab = 9 (top row) and n_lab = 44 (bottom row) on gcat, ccat, aut-avn and real-sim. Estimation Method 1: dashdot. Estimation Method 2: dotted with x. Estimation Method 3: continuous. Dashed: use f = f_actual (upper baseline).


9.3 Transform Graphs of Linear Functions Use this blank page to compile the most important things you want to remember for cycle 9.

9.3 Transform Graphs of Linear Functions Use this blank page to compile the most important things you want to remember for cycle 9. 9. Transorm Graphs o Linear Functions Use this blank page to compile the most important things you want to remember or cycle 9.: Sec Math In-Sync by Jordan School District, Utah is licensed under a 6 Function

More information

Research on Image Splicing Based on Weighted POISSON Fusion

Research on Image Splicing Based on Weighted POISSON Fusion Research on Image Splicing Based on Weighted POISSO Fusion Dan Li, Ling Yuan*, Song Hu, Zeqi Wang School o Computer Science & Technology HuaZhong University o Science & Technology Wuhan, 430074, China

More information

MATRIX ALGORITHM OF SOLVING GRAPH CUTTING PROBLEM

MATRIX ALGORITHM OF SOLVING GRAPH CUTTING PROBLEM UDC 681.3.06 MATRIX ALGORITHM OF SOLVING GRAPH CUTTING PROBLEM V.K. Pogrebnoy TPU Institute «Cybernetic centre» E-mail: vk@ad.cctpu.edu.ru Matrix algorithm o solving graph cutting problem has been suggested.

More information

Semi-supervised learning and active learning

Semi-supervised learning and active learning Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners

More information

Combining PGMs and Discriminative Models for Upper Body Pose Detection

Combining PGMs and Discriminative Models for Upper Body Pose Detection Combining PGMs and Discriminative Models for Upper Body Pose Detection Gedas Bertasius May 30, 2014 1 Introduction In this project, I utilized probabilistic graphical models together with discriminative

More information

KANGAL REPORT

KANGAL REPORT Individual Penalty Based Constraint handling Using a Hybrid Bi-Objective and Penalty Function Approach Rituparna Datta Kalyanmoy Deb Mechanical Engineering IIT Kanpur, India KANGAL REPORT 2013005 Abstract

More information

Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters

Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters TIPL 4703 Presented by Ken Chan Prepared by Ken Chan 1 Table o Contents What is SNR Deinition o SNR Components

More information

Chapter 2 Basic Structure of High-Dimensional Spaces

Chapter 2 Basic Structure of High-Dimensional Spaces Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,

More information

HOUGH TRANSFORM CS 6350 C V

HOUGH TRANSFORM CS 6350 C V HOUGH TRANSFORM CS 6350 C V HOUGH TRANSFORM The problem: Given a set of points in 2-D, find if a sub-set of these points, fall on a LINE. Hough Transform One powerful global method for detecting edges

More information

3 Nonlinear Regression

3 Nonlinear Regression CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

More information

ES 240: Scientific and Engineering Computation. a function f(x) that can be written as a finite series of power functions like

ES 240: Scientific and Engineering Computation. a function f(x) that can be written as a finite series of power functions like Polynomial Deinition a unction () that can be written as a inite series o power unctions like n is a polynomial o order n n ( ) = A polynomial is represented by coeicient vector rom highest power. p=[3-5

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Gene Expression Based Classification using Iterative Transductive Support Vector Machine

Gene Expression Based Classification using Iterative Transductive Support Vector Machine Gene Expression Based Classification using Iterative Transductive Support Vector Machine Hossein Tajari and Hamid Beigy Abstract Support Vector Machine (SVM) is a powerful and flexible learning machine.

More information

Logistic Regression: Probabilistic Interpretation

Logistic Regression: Probabilistic Interpretation Logistic Regression: Probabilistic Interpretation Approximate 0/1 Loss Logistic Regression Adaboost (z) SVM Solution: Approximate 0/1 loss with convex loss ( surrogate loss) 0-1 z = y w x SVM (hinge),

More information

Digital Image Processing. Image Enhancement in the Spatial Domain (Chapter 4)

Digital Image Processing. Image Enhancement in the Spatial Domain (Chapter 4) Digital Image Processing Image Enhancement in the Spatial Domain (Chapter 4) Objective The principal objective o enhancement is to process an images so that the result is more suitable than the original

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Classification Method for Colored Natural Textures Using Gabor Filtering

Classification Method for Colored Natural Textures Using Gabor Filtering Classiication Method or Colored Natural Textures Using Gabor Filtering Leena Lepistö 1, Iivari Kunttu 1, Jorma Autio 2, and Ari Visa 1, 1 Tampere University o Technology Institute o Signal Processing P.

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

Semantic-Driven Selection of Printer Color Rendering Intents

Semantic-Driven Selection of Printer Color Rendering Intents Semantic-Driven Selection o Printer Color Rendering Intents Kristyn Falkenstern 1,2, Albrecht Lindner 3, Sabine Süsstrunk 3, and Nicolas Bonnier 1. 1 Océ Print Logic Technologies S.A. (France). 2 Institut

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Support Vector Machines Aurélie Herbelot 2018 Centre for Mind/Brain Sciences University of Trento 1 Support Vector Machines: introduction 2 Support Vector Machines (SVMs) SVMs

More information

A Small-world Network Where All Nodes Have the Same Connectivity, with Application to the Dynamics of Boolean Interacting Automata

A Small-world Network Where All Nodes Have the Same Connectivity, with Application to the Dynamics of Boolean Interacting Automata A Small-world Network Where All Nodes Have the Same Connectivity, with Application to the Dynamics o Boolean Interacting Automata Roberto Serra Department o Statistics, Ca Foscari University, Fondamenta

More information

Semi-supervised Learning

Semi-supervised Learning Semi-supervised Learning Piyush Rai CS5350/6350: Machine Learning November 8, 2011 Semi-supervised Learning Supervised Learning models require labeled data Learning a reliable model usually requires plenty

More information

A SAR IMAGE REGISTRATION METHOD BASED ON SIFT ALGORITHM

A SAR IMAGE REGISTRATION METHOD BASED ON SIFT ALGORITHM A SAR IMAGE REGISTRATION METHOD BASED ON SIFT ALGORITHM W. Lu a,b, X. Yue b,c, Y. Zhao b,c, C. Han b,c, * a College o Resources and Environment, University o Chinese Academy o Sciences, Beijing, 100149,

More information

Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering. Introduction

Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering. Introduction Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering Introduction A SolidWorks simulation tutorial is just intended to illustrate where to

More information

A Cylindrical Surface Model to Rectify the Bound Document Image

A Cylindrical Surface Model to Rectify the Bound Document Image A Cylindrical Surace Model to Rectiy the Bound Document Image Huaigu Cao, Xiaoqing Ding, Changsong Liu Department o Electronic Engineering, Tsinghua University State Key Laboratory o Intelligent Technology

More information

PATH PLANNING OF UNMANNED AERIAL VEHICLE USING DUBINS GEOMETRY WITH AN OBSTACLE

PATH PLANNING OF UNMANNED AERIAL VEHICLE USING DUBINS GEOMETRY WITH AN OBSTACLE Proceeding o International Conerence On Research, Implementation And Education O Mathematics And Sciences 2015, Yogyakarta State University, 17-19 May 2015 PATH PLANNING OF UNMANNED AERIAL VEHICLE USING

More information

Application of Six Sigma Robust Optimization in Sheet Metal Forming

Application of Six Sigma Robust Optimization in Sheet Metal Forming Application o Six Sigma Robust Optimization in Sheet Metal Forming Y. Q. Li, Z. S. Cui, X. Y. Ruan, D. J. Zhang National Die and Mold CAD Engineering Research Center, Shanghai Jiao Tong University, Shanghai,

More information

9. Reviewing Printed Circuit Board Schematics with the Quartus II Software

9. Reviewing Printed Circuit Board Schematics with the Quartus II Software November 2012 QII52019-12.1.0 9. Reviewing Printed Circuit Board Schematics with the Quartus II Sotware QII52019-12.1.0 This chapter provides guidelines or reviewing printed circuit board (PCB) schematics

More information

THE preceding chapters were all devoted to the analysis of images and signals which

THE preceding chapters were all devoted to the analysis of images and signals which Chapter 5 Segmentation of Color, Texture, and Orientation Images THE preceding chapters were all devoted to the analysis of images and signals which take values in IR. It is often necessary, however, to

More information

Logistic Regression. Abstract

Logistic Regression. Abstract Logistic Regression Tsung-Yi Lin, Chen-Yu Lee Department of Electrical and Computer Engineering University of California, San Diego {tsl008, chl60}@ucsd.edu January 4, 013 Abstract Logistic regression

More information

Automatic Video Segmentation for Czech TV Broadcast Transcription

Automatic Video Segmentation for Czech TV Broadcast Transcription Automatic Video Segmentation or Czech TV Broadcast Transcription Jose Chaloupka Laboratory o Computer Speech Processing, Institute o Inormation Technology and Electronics Technical University o Liberec

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Chapter 3: Supervised Learning

Chapter 3: Supervised Learning Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example

More information

A Cartesian Grid Method with Transient Anisotropic Adaptation

A Cartesian Grid Method with Transient Anisotropic Adaptation Journal o Computational Physics 179, 469 494 (2002) doi:10.1006/jcph.2002.7067 A Cartesian Grid Method with Transient Anisotropic Adaptation F. E. Ham, F. S. Lien, and A. B. Strong Department o Mechanical

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

Lab 9 - GEOMETRICAL OPTICS

Lab 9 - GEOMETRICAL OPTICS 161 Name Date Partners Lab 9 - GEOMETRICAL OPTICS OBJECTIVES Optics, developed in us through study, teaches us to see - Paul Cezanne Image rom www.weidemyr.com To examine Snell s Law To observe total internal

More information

Gesture Recognition using a Probabilistic Framework for Pose Matching

Gesture Recognition using a Probabilistic Framework for Pose Matching Gesture Recognition using a Probabilistic Framework or Pose Matching Ahmed Elgammal Vinay Shet Yaser Yacoob Larry S. Davis Computer Vision Laboratory University o Maryland College Park MD 20742 USA elgammalvinayyaserlsd

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010 INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,

More information

Ensemble methods in machine learning. Example. Neural networks. Neural networks

Ensemble methods in machine learning. Example. Neural networks. Neural networks Ensemble methods in machine learning Bootstrap aggregating (bagging) train an ensemble of models based on randomly resampled versions of the training set, then take a majority vote Example What if you

More information

Fast or furious? - User analysis of SF Express Inc

Fast or furious? - User analysis of SF Express Inc CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood

More information

L ENSES. Lenses Spherical refracting surfaces. n 1 n 2

L ENSES. Lenses Spherical refracting surfaces. n 1 n 2 Lenses 2 L ENSES 2. Sherical reracting suraces In order to start discussing lenses uantitatively, it is useul to consider a simle sherical surace, as shown in Fig. 2.. Our lens is a semi-ininte rod with

More information

Proportional-Share Scheduling for Distributed Storage Systems

Proportional-Share Scheduling for Distributed Storage Systems Proportional-Share Scheduling or Distributed Storage Systems Abstract Yin Wang University o Michigan yinw@eecs.umich.edu Fully distributed storage systems have gained popularity in the past ew years because

More information

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection Hyunghoon Cho and David Wu December 10, 2010 1 Introduction Given its performance in recent years' PASCAL Visual

More information