EE 8591 Homework 4 (10 pts) Fall 2018 SOLUTIONS Topic: SVM classification and regression GRADING: Problems 1,2,4 3pts each, Problem 3 1 point.



Problem 1 (problem 7.6 from textbook)

Validation error for each pair (polynomial degree d, C):

C \ degree  C=10e-4  C=10e-3  C=10e-2  C=0.1    C=1      C=10     C=100    C=10e3   C=10e4   C=10e5   C=10e6   C=10e7
d=1         52.0%    52.0%    52.0%    18.0%    15.2%    14.0%    14.4%    14.0%    14.4%    14.8%    14.4%    14.0%
d=2         52.0%    52.0%    52.0%    14.8%    14.0%    14.8%    15.2%    16.4%    17.2%    17.2%    16.8%    16.8%
d=3         52.0%    52.0%    42.4%    14.4%    13.6%    14.4%    14.8%    14.8%    13.2%    12.0%    13.2%    19.2%
d=4         52.0%    52.0%    20.0%    13.6%    13.6%    14.8%    15.2%    13.2%    12.4%    11.6%    12.0%    16.8%
d=5         52.0%    52.0%    20.4%    14.8%    14.0%    14.0%    13.6%    12.8%    11.6%    12.4%    15.2%    23.6%
d=6         52.0%    52.0%    17.6%    13.6%    14.4%    14.8%    13.2%    12.0%    11.6%    12.4%    19.2%    33.6%
d=7         52.0%    34.8%    16.0%    13.6%    14.0%    14.0%    12.8%    12.4%    12.0%    12.4%    18.4%    30.0%
d=8         52.0%    32.8%    16.0%    14.4%    14.0%    14.0%    12.8%    12.0%    13.2%    14.0%    22.8%    24.4%
d=9         52.0%    28.4%    16.0%    14.4%    14.8%    13.2%    12.4%    12.8%    12.8%    15.6%    20.0%    34%
d=10        [...]%   26.4%    15.6%    14.0%    14.4%    13.2%    12.4%    14.0%    13.2%    17.6%    25.2%    43.6%

LIBSVM software was used for this problem. Optimal values of the tuning parameters (poly_degree, C) were selected via 10-fold cross-validation (shown in red in the original table). Note that several pairs of tuning parameters yield the same minimum validation error of 11.6%: (d=4, C=10e5), (d=5, C=10e4) and (d=6, C=10e4).

The test error of 9.80% was estimated using an independent test set (1,000 samples), and it turns out to be the same as with the RBF kernel used in Example 7.1 (in the textbook). This indicates robustness of SVM modeling with respect to the selection of kernel type. The optimal SVM model (along with training data and support vectors) is shown in Fig. 1 below.

For this optimal SVM model, the number of support vectors is 37 (this number is provided by the LIBSVM software). Hence, the upper bound on the test error is 37/250 = 14.8%. This upper-bound estimate is not very tight when compared with the actual test error of 9.8% estimated using a large test set.
It is interesting to note that for this data set the optimal cross-validation error (11.6%) is larger than the test error (9.8%). Usually the opposite is true, i.e. the validation error is smaller than the test error.
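The selection step above can be checked with a short script. The following is only a sketch: it copies the validation errors for d=1 to d=9 from the table (the d=10 row is left out because its first entry is missing), lists every (d, C) pair that attains the minimum, and evaluates the support-vector fraction used as the upper bound. The function names best_pairs and sv_fraction_bound are illustrative and are not part of LIBSVM.

```python
# Validation errors (%) copied from the Problem 1 table; list positions follow C_LABELS.
# The d=10 row is omitted because its first entry is missing in the table.
C_LABELS = ["10e-4", "10e-3", "10e-2", "0.1", "1", "10", "100",
            "10e3", "10e4", "10e5", "10e6", "10e7"]
CV_ERROR = {
    1: [52.0, 52.0, 52.0, 18.0, 15.2, 14.0, 14.4, 14.0, 14.4, 14.8, 14.4, 14.0],
    2: [52.0, 52.0, 52.0, 14.8, 14.0, 14.8, 15.2, 16.4, 17.2, 17.2, 16.8, 16.8],
    3: [52.0, 52.0, 42.4, 14.4, 13.6, 14.4, 14.8, 14.8, 13.2, 12.0, 13.2, 19.2],
    4: [52.0, 52.0, 20.0, 13.6, 13.6, 14.8, 15.2, 13.2, 12.4, 11.6, 12.0, 16.8],
    5: [52.0, 52.0, 20.4, 14.8, 14.0, 14.0, 13.6, 12.8, 11.6, 12.4, 15.2, 23.6],
    6: [52.0, 52.0, 17.6, 13.6, 14.4, 14.8, 13.2, 12.0, 11.6, 12.4, 19.2, 33.6],
    7: [52.0, 34.8, 16.0, 13.6, 14.0, 14.0, 12.8, 12.4, 12.0, 12.4, 18.4, 30.0],
    8: [52.0, 32.8, 16.0, 14.4, 14.0, 14.0, 12.8, 12.0, 13.2, 14.0, 22.8, 24.4],
    9: [52.0, 28.4, 16.0, 14.4, 14.8, 13.2, 12.4, 12.8, 12.8, 15.6, 20.0, 34.0],
}


def best_pairs(cv_error, c_labels):
    """Return the minimum validation error and every (degree, C) pair attaining it."""
    best = min(min(row) for row in cv_error.values())
    pairs = [(d, c_labels[j])
             for d, row in sorted(cv_error.items())
             for j, err in enumerate(row) if err == best]
    return best, pairs


def sv_fraction_bound(n_sv, n_train):
    """Fraction of support vectors, used as a crude upper bound on the test error."""
    return n_sv / n_train


best, pairs = best_pairs(CV_ERROR, C_LABELS)
print("minimum validation error (%):", best)
print("pairs attaining it:", pairs)
print("SV-fraction bound:", sv_fraction_bound(37, 250))
```

Ties such as the three pairs found here have to be broken by some additional rule (for example, preferring the lower degree); the script does not attempt this.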

Fig. 1. Estimated SVM model with optimal tuning parameters (estimated via cross-validation).

Next, we show the histogram of projections for the training data (Fig. 2) and for the test data (Fig. 3). These histograms were generated using the optimal SVM model (shown above).

Fig. 2. Histogram of projections for training data.
Fig. 3. Histogram of projections for test data.
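For readers who want to reproduce such histograms, here is a minimal sketch of the histogram-of-projections computation for a linear model. The weight vector, bias and the six sample points are hypothetical and chosen only for illustration; with LIBSVM one would use the decision values returned by the trained model instead. Decision values are assumed to be scaled so that the margin borders lie at -1 and +1.

```python
import math
from collections import Counter


def decision_values(X, w, b):
    """Project each sample onto the normal direction: f(x) = w . x + b."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]


def histogram(values, bin_width=0.5):
    """Count values per bin; each key is the left edge of a bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical linear model and data (not taken from the homework data set).
w, b = [1.0, -1.0], 0.0
X = [[2.0, 0.0], [1.5, 0.2], [0.3, 0.1], [0.0, 2.0], [0.1, 1.4], [0.2, 0.4]]
y = [+1, +1, +1, -1, -1, -1]

f = decision_values(X, w, b)
for label in (+1, -1):
    print(label, histogram([fi for fi, yi in zip(f, y) if yi == label]))

# Samples with |f(x)| < 1 fall inside the margin.
inside = sum(1 for fi in f if abs(fi) < 1)
print("samples inside the margin:", inside)
```

Plotting the two per-class histograms on a common axis gives a figure of the same kind as Fig. 2 and Fig. 3.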

Problem 2
(a) Problem 7.7 from the textbook. Make sure you repeat the experiments using 3-5 different realizations of training + validation data, and record the training, validation and test errors of your optimal SVM models.
(b) Recall the two analytical bounds for SVM shown as (7.12) and (7.13) in the textbook. Which of these bounds is more suitable (works better) for this data set? To answer this question, you may find it useful to analyze the estimated SVM models using the histogram-of-projections technique.

(a) SOLUTION (2 pts)
Note: this solution used the STPRTool package for SVM and k-nn classification. Alternatively, you can use LIBSVM for SVM classification. The functions smo, svmclass, knnrule and knnclass in STPRTool are used for this problem.

We apply linear SVM to the training data with candidate values of parameter C = [2^-5, 2^-4, ..., 2^5]. Each C value produces a different linear SVM model. The optimal model (or parameter C) is the one that achieves the lowest error rate on the validation data set. Likewise, the optimal parameter k for the k-nn classifier is the one with the lowest error rate on the validation data set.

Table 1 compares the performance of the linear SVM and k-nn classifiers. The linear SVM classifier achieves lower validation and test errors than the k-nn classifier for our random realization of the data. Note that for this data set, k-nn provides significantly worse prediction accuracy than linear SVM. This can be expected because this data set is sparse and linearly separable, so margin-based complexity control is superior to k-nn classification.

Table 1: Performance of linear SVM and k-nn classifiers.
Method       Optimal parameter   Training error   Validation error   Test error
linear SVM   C=2^2               4%               8%                 16.3%
k-nn         k=13                36%              22%                36.6%

Additional analysis of the coefficients in the linear SVM model shows large variability of the estimated coefficients, due to the small sample size and large dimensionality (20 inputs).
We can illustrate this variability by showing: the average of the first 10 coefficients (w_1, ..., w_10) is [...]; the average of the last 10 coefficients (w_11, ..., w_20) is [...]. Thus, the two averages are significantly different, in agreement with the true model for this data set. This conclusion can be made more evident if we repeat the experiment for different realizations of the training/validation data and average the w vectors over many experiments (say, 5-10 experiments).

(b) SOLUTION (1 pt)
We need to analyze an SVM model (with optimally tuned C value) and relate this model to the analytic bounds.

Bound (7.12) requires the number of SVs of the optimal SVM model. The fraction of SVs in the training data then gives a crude upper bound on the test error. For this problem, the input space is 20-dimensional, so the linear SVM model has (at least) 21 SVs. Given 50 training samples, the fraction of SVs is at least 21/50 = 42%, so it cannot be used as a good (tight) generalization bound.

Bound (7.13) on the VC-dimension can be effectively calculated using the histogram of projections (for an optimal SVM model) under the assumption that the training data is separable by this large-margin SVM model. If this condition holds, one can estimate from the histogram the radius of the sphere that contains all training samples: it corresponds to the range of x-values in the histogram. This range is given relative to the size of the margin (delta parameter), so it can be used as an estimate of the VC-dimension for this SVM model.

Shown below are two representative histograms for optimally tuned linear SVM for this data set (using two independent realizations of the training + validation data). These results indicate that for this data set:
(a) the training data cannot be separated by a large-margin SVM model;
(b) the radius of the sphere varies in the 3-5 range (depending on the random training sample).
Since the data is not separable, bound (7.13) cannot be used for this data set.

Fig. 4. Histograms of projections for linear SVM model estimated for two independent realizations of training data.
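The reasoning in part (b) can be sketched in code as follows. This is an illustration under stated assumptions, not the textbook's procedure: decision values are assumed to be scaled so that the margin borders lie at -1 and +1, half the range of the decision values is taken as a crude estimate of the sphere radius in margin units, and the bound is assumed to have the standard large-margin form h <= min(R^2/Delta^2, d) + 1 (whether this matches (7.13) exactly should be checked against the textbook). The function name analyze_projections and the example values are hypothetical.

```python
def analyze_projections(f, y, dim, tol=1e-9):
    """Check margin separability and estimate a VC-dimension bound.

    f   : decision values, scaled so the margin borders are at -1 and +1
    y   : class labels (+1 or -1)
    dim : input dimensionality d
    Returns (separable, radius_in_margin_units, vc_bound or None).
    """
    # Separable by the large-margin model: no sample inside the margin or misclassified.
    separable = all(yi * fi >= 1.0 - tol for fi, yi in zip(f, y))
    # Half the range of the projections: a crude estimate of R / Delta (assumption).
    radius = (max(f) - min(f)) / 2.0
    # Assumed bound form: h <= min(R^2 / Delta^2, d) + 1, meaningful only if separable.
    vc_bound = min(radius ** 2, dim) + 1 if separable else None
    return separable, radius, vc_bound


# Hypothetical decision values for a separable and a non-separable case.
print(analyze_projections([3.0, 1.0, -1.0, -3.0], [1, 1, -1, -1], dim=20))
print(analyze_projections([3.0, 0.4, -1.0, -3.0], [1, 1, -1, -1], dim=20))
```

In the second call one sample falls inside the margin, so no bound is returned, which mirrors the conclusion drawn for this data set.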

Problem 3 (problem 7.9 from textbook)
We can project the training data onto the normal vector of the SVM decision boundary. These projection values are given by the real-valued decision values, i.e. the outputs of the trained SVM model. This univariate histogram of projections (for training samples) can then be used to identify outliers in the training data. For example, see the figure (below) for separable training data. The decision value of an outlier deviates from the general trend of the histogram, i.e. a red point would show up in the blue cluster. Such outliers can be easily detected and removed from the training data.

Problem 4
(a) Apply SVM regression software (using RBF kernel) to estimate a regression model for the data set used in HW3, according to the following experimental procedure. Six-dimensional regression data is generated according to: 2 y 10sin x x 20( x 0.5) 0x 5x x where x is uniform in [-1.5,1.5]. Generate three data sets: training (100 samples), validation (100 samples) and test (800 samples). The following procedure must be used for tuning SVM regression complexity parameters:
(a) Set the value of C using analytic prescription (7.39) or (7.40) for all experiments.
(b) Select optimal values of (epsilon, gamma) that minimize the MSE validation error (on an independent validation set). You can present model selection results in a format similar to Table 7.3 in the textbook.
Report the NRMS and MSE test error of your regression model.
(b) Compare your SVM results (test error) in part (a) to the test error obtained using Projection Pursuit Regression (PPR), where the optimal number of terms (complexity parameter) is estimated using the same validation data as in part (a). Discuss your comparisons and comment on SVM performance (considering the fact that PPR is most appropriate for this data set).

SOLUTION
Model selection for SVM is performed by measuring the MSE validation error for each set of parameter values, as shown in Table 2.
The values γ = 2^-4 and ε = 6, which yield the smallest MSE validation error, are selected as the optimal parameters for this data set.
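The procedure fixes C analytically before searching over (ε, γ). As a rough illustration, the sketch below computes C from the training responses using the prescription C = max(|mean(y) + 3 std(y)|, |mean(y) - 3 std(y)|) proposed by Cherkassky and Ma. That this is the formula numbered (7.39) or (7.40) in the textbook is an assumption, as is the use of the population standard deviation; the function name prescribed_C and the sample values are hypothetical.

```python
import statistics


def prescribed_C(y_train):
    """Analytic choice of C from the range of the training responses.

    Assumed form: C = max(|mean + 3*std|, |mean - 3*std|).
    """
    mean = statistics.mean(y_train)
    std = statistics.pstdev(y_train)  # population std; the sample std is another option
    return max(abs(mean + 3 * std), abs(mean - 3 * std))


# Hypothetical training responses, for illustration only.
y_train = [0.0, 2.0, 4.0, 6.0, 8.0]
C = prescribed_C(y_train)
print("C =", C)
```

With C fixed this way, only ε and γ remain to be tuned on the validation set.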

Table 2: SVM model selection.
Columns: ε = 0, ε = 2, ε = 4, ε = 6, ε = 8. Rows: eleven values of γ. [The γ values and the table entries are missing.]

Using these optimal parameters for the SVM regression yields the following training/validation/test error (NRMS). Note that we need to normalize the MSE errors produced by the LIBSVM software package in order to compare the results with PPR regression (since the XTAL software outputs the normalized RMS error, or NRMS).

Table 3 shows the final modeling results, including the Projection Pursuit regression results obtained in HW 3 for the same data set. This comparison indicates that PPR provides slightly better prediction than SVM regression. This can be explained by noting that the unknown true target function has an additive form that is very suitable for the PPR method.

Table 3: Performance comparison between SVM regression and Projection Pursuit regression. All results show normalized RMS error.
Method           Parameter          Training error   Validation error   Test error
SVM regression   γ = 2^-4, ε = 6    [...]            [...]              [...]
PPR              [...]              [...]            [...]              [...]
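The normalization mentioned above can be sketched as follows, assuming NRMS is defined as the root of the MSE divided by the standard deviation of the true response values (the usual convention, but worth checking against the definition used by XTAL). The helper names and the toy numbers are hypothetical.

```python
import math
import statistics


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def nrms_from_mse(mse_value, y_true):
    """Convert an MSE (e.g. as reported by LIBSVM) to normalized RMS error.

    Assumed definition: NRMS = sqrt(MSE) / std(y_true).
    """
    return math.sqrt(mse_value) / statistics.pstdev(y_true)


# Toy check: always predicting the mean of y gives NRMS = 1 under this definition.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [3.0] * 5
print(nrms_from_mse(mse(y_true, y_pred), y_true))
```

An NRMS below 1 therefore means the model does better than predicting the mean response, which makes values comparable across methods and data sets.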
