
DEPARTMENT OF STATISTICS
University of Wisconsin
1210 West Dayton St.
Madison, WI 53706

TECHNICAL REPORT NO. 982
September 30, 1997

Approximate Smoothing Spline Methods for Large Data Sets in the Binary Case

by Dong Xiang and Grace Wahba

Prepared for the 1997 Proceedings of the American Statistical Association Joint Statistical Meetings, Biometrics Section. This short paper was written under the length constraints of those Proceedings. Further details may be found in Xiang (1996) and in work in preparation. Research sponsored in part by NEI under Grant R01 EY09946 and in part by NSF under Grant DMS.

Approximate Smoothing Spline Methods for Large Data Sets in the Binary Case

Dong Xiang, SAS Institute Inc.
Grace Wahba, University of Wisconsin at Madison

September 30, 1997

Dong Xiang is Senior Research Statistician at SAS Institute Inc.; Grace Wahba is Bascom Professor in the Statistics Department, University of Wisconsin at Madison. This paper is supported by the National Science Foundation under Grant DMS and the National Eye Institute under Grant R01 EY09946. A typographical error in the formulas for RanGACV was fixed on March 16.

Key Words: Generalized Approximate Cross Validation, smoothing spline, penalized likelihood regression, generalized cross validation, Kullback-Leibler distance.

Abstract

We consider the use of smoothing splines in generalized additive models with binary responses in the large data set situation. Xiang and Wahba (1996) proposed the Generalized Approximate Cross Validation (GACV) function as a method to choose (multiple) smoothing parameters in the binary data case and demonstrated through simulation that the GACV method compares well to existing iterative methods, as judged by the Kullback-Leibler distance of the estimate to the true function being fitted. However, the calculation of the GACV function involves solving an n by n linear system, where n is the sample size. As the sample size increases, the calculation becomes numerically unstable and eventually infeasible. To reduce these computational problems we propose a randomized version of the GACV function, which is numerically stable. Furthermore, we use a clustering algorithm to choose a set of basis functions with which to approximate the exact additive smoothing spline estimate, which has a basis function for every data point. Combining these two approaches, we are able to extend smoothing spline methods in the binary response case to much larger data sets without sacrificing much accuracy.

1 Introduction

Suppose that $y_i$, $i = 1, \ldots, n$, are independent observations from an exponential family with density of the form
$$ \exp\{[y_i f(x_i) - b(f(x_i))]/a(\phi) + c(y_i, \phi)\}, $$

where $a$, $b$ and $c$ are given, with $b$ a strictly convex function of its argument on any bounded set, the $x_i$ are vectors of covariates, $\phi$ is a nuisance parameter, and $f(x_i)$ is the so-called canonical parameter. The goal is to estimate $f(\cdot)$. For the purpose of exposition, we will assume that $x_i$ is on the real line, but our arguments extend to more general domains for $x$. In the particular case of binary data, $a(\phi) = 1$, $b(f) = \log(1 + e^f)$, $c(y, \phi) = 0$, and $y_i$ is 1 or 0 with probability $p(x_i) = e^{f(x_i)}/(1 + e^{f(x_i)})$. The binary case is of particular interest because of its applicability in risk factor estimation.

In the usual parametric GLIM models, $f(\cdot)$ is assumed to be of a parametric form, and then maximum likelihood methods may be used to estimate and assess the fitted models. A variety of approaches have been proposed to allow more flexibility than is inherent in simple parametric models. We will not review the general literature, other than to note that regression splines have been used for this purpose by, for example, Friedman (1991), Stone (1994) and others. O'Sullivan (1983), O'Sullivan, Yandell and Raynor (1986), Gu (1990), Wahba (1990) and others (Green, Jennison and Seheult 1985, Green and Yandell 1985, Yandell and Green 1986, Green and Silverman 1994 and Hastie and Tibshirani 1990) allow $f(\cdot)$ to take on a more flexible form by assuming that $f(\cdot)$ is an element of some reproducing kernel Hilbert space $\mathcal{H}$ of smooth functions, and estimating $f(\cdot)$ by minimizing a penalized log likelihood. Assuming that $a(\phi) = 1$ (or that $\phi$ is absorbed into $\lambda$ below), define $l(y_i, f(x_i)) = y_i f(x_i) - b(f(x_i))$. The smoothing spline (or penalized log likelihood) estimate $f_\lambda(\cdot)$ of $f(\cdot)$ is the minimizer in $\mathcal{H}$ of
$$ -\sum_{i=1}^n l(y_i, f(x_i)) + \frac{n\lambda}{2} J(f), \qquad (1) $$
where the smoothing parameter $\lambda \ge 0$ balances the tradeoff between minimizing the negative log likelihood and the `smoothness' $J(f)$. Here $J$ is a quadratic penalty functional defined on $\mathcal{H}$. If $J^{1/2}(f)$ is a norm in $\mathcal{H}$, or a semi-norm in $\mathcal{H}$ with a low dimensional null space (the `parametric part') satisfying some conditions, then it is well known that $f_\lambda$, the minimizer of (1), is in a known $n$-dimensional subspace $\mathcal{H}_n$ of $\mathcal{H}$, with basis functions that are known functions of the reproducing kernel for $\mathcal{H}$ and a basis for the null space of $J$. See Wahba (1990), O'Sullivan (1983), Kimeldorf and Wahba (1971), and what follows. In general, we will assume that (1) will be minimized numerically in some $N \le n$ dimensional space $\mathcal{H}_B$; that is,
$$ f(\cdot) = \sum_{j=1}^N \beta_j B_j(\cdot), $$
where the $B_j$ are either a set of suitable basis functions which may span $\mathcal{H}_n$ or may constitute a convenient, sufficiently rich (linearly independent) approximation to a spanning set. See Wahba (1990), Chapter 7, and references cited there. Given $\lambda$, the computational problem is then to find $\beta = (\beta_1, \ldots, \beta_N)^T$ to minimize
$$ I_\lambda(\beta) = -\sum_{i=1}^n l(y_i, f_i(\beta)) + \frac{n\lambda}{2} \beta^T \Sigma \beta, \qquad (2) $$
where $f_i(\beta) = \sum_{j=1}^N \beta_j B_j(x_i)$ and $\Sigma$ is a non-negative definite matrix defined by
$$ \beta^T \Sigma \beta = J\Big(\sum_{j=1}^N \beta_j B_j\Big). $$
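For concreteness, the following is a minimal numerical sketch of the objective (2) in the binary case. It is not code from the paper; the names (B for the basis matrix, Sigma for the penalty matrix) are ours.

```python
import numpy as np

def penalized_neg_loglik(beta, B, Sigma, y, lam):
    """Evaluate I_lambda(beta) of (2) for Bernoulli data.

    B     : n x N matrix with B[i, j] = B_j(x_i)
    Sigma : N x N non-negative definite matrix with beta' Sigma beta = J(sum_j beta_j B_j)
    y     : length-n vector of 0/1 responses
    lam   : smoothing parameter lambda >= 0
    """
    n = len(y)
    f = B @ beta                                   # f_i(beta) = sum_j beta_j B_j(x_i)
    # negative log likelihood: -sum_i [y_i f_i - log(1 + e^{f_i})]
    nll = -np.sum(y * f - np.logaddexp(0.0, f))
    penalty = 0.5 * n * lam * beta @ Sigma @ beta  # (n * lambda / 2) beta' Sigma beta
    return nll + penalty
```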

Letting $l_i(f_i) = l(y_i, f_i)$ and using the fact that all the $l_i(\cdot)$ are strictly concave with respect to $f_i$, we may compute $\beta$ via Newton iterations. Define $w_i = -d^2 l_i/d f_i^2$ and $u_i = -d l_i/d f_i$. Each iteration for $\beta$ is then equivalent to finding $\beta$ to minimize
$$ \frac{1}{n} \sum_{i=1}^n \tilde w_i \big(\tilde y_i - f_i(\beta)\big)^2 + \lambda \beta^T \Sigma \beta, \qquad (3) $$
where $\tilde y_i = \tilde f_i - \tilde u_i/\tilde w_i$ and $\tilde f_i$, $\tilde u_i$, $\tilde w_i$ are the working values of $f_i$, $u_i$ and $w_i$ based on the last iteration. The $\tilde y_i$ will be called the working data here. This problem will have a unique minimizer provided $\beta^T \Sigma \beta = 0$ and $f_i(\beta) = 0$, $i = 1, \ldots, n$, together imply $\beta = 0$. See O'Sullivan et al. (1986) and Gu (1990). Note that, by the properties of the exponential family, if $f(x_i)$ is the true canonical parameter evaluated at $x_i$, then $E y_i = b'(f(x_i))$ and $\mathrm{var}\, y_i = w_i$. If $\mathcal{H}_B = \mathcal{H}_n$, or $\mathcal{H}_B$ is sufficiently large, then a sufficiently small $\lambda$ allows the $f_i$ to effectively interpolate the data, while a sufficiently large $\lambda$ forces the estimate to the null space of $J(f)$ in $\mathcal{H}_B$.

A number of methods have been proposed for choosing the smoothing parameters in the generalized linear model setting. O'Sullivan et al. (1986) adapted the GCV function to the non-Gaussian case by considering the quadratic approximation to the negative log likelihood available at the final stage of their Newton iteration. Yandell (1986) suggested evaluation and minimization of the GCV scores as the iteration proceeded. Gu (1992) compared the two different algorithms and found that the method which chooses $\lambda$ after each iteration of (3) and updates the working data based on the new fit before the next iteration (Algorithm 2) outperforms the method which fixes $\lambda$ during the iteration and chooses $\lambda$ based on the final working data (Algorithm 1). However, a theoretical justification for Algorithm 2 has yet to emerge, and Algorithm 2 does not guarantee convergence. In the same paper, Gu suggested a criterion (the "U" estimate) for binary data which is similar to the unbiased risk (UBR) estimate in Craven and Wahba (1979). He argued that this criterion is a proxy for the symmetrized Kullback-Leibler distance between $f_\lambda(\cdot)$ and the true $f(\cdot)$, and demonstrated via simulation that this criterion, computed via Algorithm 2, gave more favorable results than other methods. Based on this result, Wang (1994) extended the method to the problem of estimating multiple smoothing parameters.

In Xiang and Wahba (1996), we proposed the Generalized Approximate Cross Validation (GACV) function. With some abuse of notation, we write $J(f) = f^T \Sigma f$, where in this context we are letting $f = (f(x_1), \ldots, f(x_n))^T$. For binary data, the GACV is given by
$$ GACV(\lambda) = \frac{1}{n} \sum_{i=1}^n \big(-y_i f_\lambda(x_i) + \log(1 + e^{f_\lambda(x_i)})\big) + \frac{\mathrm{tr}(A)}{n} \cdot \frac{\sum_{i=1}^n y_i \big(y_i - p_\lambda(x_i)\big)}{n - \mathrm{tr}(W^{1/2} A W^{1/2})}, \qquad (4) $$
where $W = \mathrm{diag}\big(p_\lambda(x_1)(1 - p_\lambda(x_1)), \ldots, p_\lambda(x_n)(1 - p_\lambda(x_n))\big)$ and $A = [W(f_\lambda) + n\lambda\Sigma]^{-1}$.
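The following sketch, under the same hypothetical setup as the code above, implements one Newton update via the working data in (3) and then evaluates the GACV score (4) directly. Here Sigma_f denotes the n x n matrix of the "abuse of notation" J(f) = f'Σf; forming and inverting the n x n matrix A is exactly the cost that motivates the randomized version developed below.

```python
import numpy as np

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def newton_step(beta, B, Sigma, y, lam):
    """One Newton update for (2), written as the weighted problem (3)."""
    n = len(y)
    f = B @ beta
    p = np.clip(sigmoid(f), 1e-10, 1 - 1e-10)   # guard against w_i = 0
    w = p * (1.0 - p)                           # w~_i = -d^2 l_i / d f_i^2
    u = p - y                                   # u~_i = -d l_i / d f_i
    y_work = f - u / w                          # working data y~_i = f~_i - u~_i / w~_i
    # minimize (1/n) sum_i w~_i (y~_i - f_i(beta))^2 + lam * beta' Sigma beta
    lhs = B.T @ (w[:, None] * B) / n + lam * Sigma
    rhs = B.T @ (w * y_work) / n
    return np.linalg.solve(lhs, rhs)

def gacv(f_hat, Sigma_f, y, lam):
    """Exact GACV(lambda) of (4); Sigma_f is n x n with J(f) = f' Sigma_f f.

    Forming A = (W + n*lam*Sigma_f)^{-1} costs O(n^3), which becomes
    unstable and infeasible for large n.
    """
    n = len(y)
    p = sigmoid(f_hat)
    W = np.diag(p * (1.0 - p))
    A = np.linalg.inv(W + n * lam * Sigma_f)
    obo = np.sum(-y * f_hat + np.logaddexp(0.0, f_hat)) / n   # first term of (4)
    trA = np.trace(A)
    trWA = np.trace(W @ A)                      # equals tr(W^{1/2} A W^{1/2})
    return obo + (trA / n) * np.sum(y * (y - p)) / (n - trWA)
```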

The GACV is an approximation to the comparative Kullback-Leibler distance obtained by starting with a leaving-out-one proxy, approximating it by repeated use of a Taylor series expansion, and, finally, replacing individual diagonal entries in certain matrices by the average diagonal entry. In the simulations we conducted, the resulting estimates of $\lambda$ appeared superior to the popular and successful U estimates computed via Algorithm 2. However, to evaluate the GACV function we need to invert an $n \times n$ matrix. As the sample size increases, the calculation of GACV becomes unstable, if it remains feasible at all. In order for this method to be competitive with Gu's U estimates, we need to find a stable numerical method. In this paper we address this problem by combining a clustering method for choosing a subset of basis functions with a randomized version of GACV for selecting the smoothing parameters; instead of solving an $n \times n$ linear system, we only need to solve a $k \times k$ system, and for large data sets $k$ can be much smaller than the actual sample size.

2 Randomized GACV

As we mentioned in the last section, the difficulty in extending the GACV method to large data sets is that it involves solving an $n \times n$ linear system. In this section we address this problem by proposing an approximate solution to the penalized log likelihood based on basis functions selected by a clustering algorithm, together with a randomized GACV function for choosing the smoothing parameters.

2.1 An Approximate Solution

Let $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$, where $\mathcal{H}_0$ is the null space of $J$ in $\mathcal{H}$, let $P_1$ be the orthogonal projector onto $\mathcal{H}_1$ in $\mathcal{H}$, and let $J(f) = \|P_1 f\|^2$. This situation covers the cases we are interested in; see Wahba (1990) for details. Let $t_i$ be the representer of evaluation at $x_i$, $i = 1, \ldots, n$, let $\xi_i = P_1 t_i$, and let $\psi_v$, $v = 1, \ldots, M$, span $\mathcal{H}_0$. Then the minimizer of (1) in $\mathcal{H}$ is known to have a representation of the form
$$ f(x) = \sum_{v=1}^M d_v \psi_v(x) + \sum_{i=1}^n c_i \xi_i(x). $$
Now let us define, for some $k \le n$,
$$ \mathcal{H}_k = \mathrm{span}\{\xi_{i_j}, \ j = 1, \ldots, k; \ \psi_v, \ v = 1, \ldots, M\}, $$
where without loss of generality we assume $(\xi_{i_j}, j = 1, \ldots, k) = (\xi_1, \ldots, \xi_k)$. Instead of finding the minimizer of (1) in $\mathcal{H}$, let us consider the problem
$$ \min_{f \in \mathcal{H}_k} \; -\frac{1}{n} \sum_{i=1}^n l(y_i, f(x_i)) + \lambda J(f). \qquad (5) $$
The solution of (5) will have the form $f(x) = \sum_{v=1}^M d_v \psi_v(x) + \sum_{j=1}^k c_j \xi_j(x)$. We call it the solution to the penalized approximate smoothing spline problem.
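A minimal sketch of how problem (5) reduces to an instance of (2) is given below. Purely for illustration it uses a Gaussian kernel as a stand-in for the reproducing kernel of $\mathcal{H}_1$ and takes the null space to be the constants; the actual kernel and null space depend on the smoothing spline model in use, and the function names are ours.

```python
import numpy as np

def gaussian_kernel(x, z, scale=1.0):
    """Stand-in reproducing kernel for H_1 (illustrative choice only)."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * scale ** 2))

def reduced_basis(X, subset, kernel=gaussian_kernel):
    """Build the design and penalty matrices for problem (5).

    X      : n x d array of design points
    subset : indices i_1, ..., i_k of the retained representers
    Returns B (n x (1+k)) and Sigma ((1+k) x (1+k)) so that (5) becomes an
    instance of (2); the columns of B are [constant null-space term,
    xi_{i_1}, ..., xi_{i_k}].
    """
    n, k = X.shape[0], len(subset)
    K_nS = np.array([[kernel(X[i], X[j]) for j in subset] for i in range(n)])
    K_SS = K_nS[subset, :]                      # k x k Gram matrix of the xi's
    B = np.hstack([np.ones((n, 1)), K_nS])      # null space (constants) + representers
    Sigma = np.zeros((1 + k, 1 + k))
    Sigma[1:, 1:] = K_SS                        # J(f) = c' K_SS c; no penalty on null space
    return B, Sigma
```

With B and Sigma in hand, the penalized likelihood and Newton sketches given earlier apply unchanged, now with only $1 + k$ coefficients instead of $n$.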

When $k = n$, the solution is equal to the minimizer of the penalized log likelihood (1). When $k < n$, the difference between the two solutions depends on the projection of the underlying true function onto the space $\mathrm{span}\{\xi_1, \ldots, \xi_n\} \ominus \mathrm{span}\{\xi_1, \ldots, \xi_k\}$. In Xiang (1996), we investigated the difference between the two solutions and found that when design points are very close together (as is common when the sample size is large), the representers corresponding to those design points are highly correlated; the correlation of two representers $t_i$, $t_j$ is defined as the correlation of $(t_i(x_1), \ldots, t_i(x_n))$ and $(t_j(x_1), \ldots, t_j(x_n))$. Including all of those representers in the solution not only makes the problem difficult to solve, because of the large number of parameters, but may also make the calculation unstable. In a simplified situation, Xiang (1996) proved that, in order to achieve the same convergence rate as the penalized smoothing spline, the $k$ for the penalized approximate smoothing spline (PASS) only needs to increase at the rate
$$ k = O\Big(n^{\frac{2m}{(2m-1)(2m+1)}} \big/ \lambda^{\frac{1}{2m-1}}\Big). $$

The above argument shows that we do not need a large $k$ in order for the approximate solution (in $\mathcal{H}_k$) to be very close to the `exact' solution (in $\mathcal{H}$). However, once $k$ is chosen, it is important to choose the $\xi_{i_j}$ well; we would like them to have maximum separation. A crude method would be to use a grid to partition the design space into a number of small cells and then, choosing one design point inside each cell, use only the $\xi_i$ corresponding to those points to generate the solution space $\mathcal{H}_k$. However, this method fails when the design points are not regularly spaced over the design space: most of the cells will be empty, especially when the number of covariates is large, which is nearly always the case with real data. Looking at the problem from another point of view, what we want is to group the design points into $k$ groups with the requirement that the groups be spaced as far apart as possible in an appropriate sense. Thus we can borrow the classical cluster analysis approach to solve our problem.

Cluster analysis is normally used to separate a set of data into its constituent groups or clusters. Even though there is no natural separation among design points in our case, we can still force the algorithm to find $k$ groups that have maximum separation. There are many algorithms for clustering data; from our limited experience, just about any algorithm will provide a reasonable answer, although some are faster than others. We choose the algorithm used in the procedure FASTCLUS in the SAS software package (SAS/STAT User's Guide, SAS Institute Inc., 1988) because it is designed for the disjoint clustering of very large data sets in minimum time. It operates in the following three steps (a minimal sketch is given below):

1. Select the first $k$ design points as the cluster seeds.

2. Clusters are formed by assigning each design point to the nearest seed (Euclidean distance). After all design points are assigned, the cluster seeds are replaced by the cluster means. This step can be repeated until the changes in the cluster seeds are less than or equal to a default fraction of the minimum distance between previous seeds.

3. Final clusters are formed by assigning each observation to the nearest seed.

Within each cluster, we randomly choose one design point as the representative to be included in our subset for the purpose of generating the $\mathcal{H}_k$ subspace.
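The following is a minimal sketch of this seed-based clustering (ours, not the SAS FASTCLUS code); the design points are assumed to be the rows of a numeric array X.

```python
import numpy as np

def select_representatives(X, k, n_passes=2, rng=None):
    """Pick k well-separated design points by a FASTCLUS-like seed procedure.

    Step 1: use the first k design points as seeds.
    Step 2: assign every point to the nearest seed (Euclidean distance) and
            replace each seed by its cluster mean; optionally repeat.
    Step 3: form final clusters from the last seeds and randomly choose one
            design point per non-empty cluster as its representative.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    seeds = X[:k].copy()
    for _ in range(n_passes):
        d = np.linalg.norm(X[:, None, :] - seeds[None, :, :], axis=2)  # n x k distances
        labels = d.argmin(axis=1)
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                seeds[j] = X[members].mean(axis=0)
    # final assignment and random representative within each cluster
    d = np.linalg.norm(X[:, None, :] - seeds[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            reps.append(rng.choice(members))
    return np.array(reps)   # indices i_1, ..., i_k of the retained design points
```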

In the simulations as well as the real data examples, we found that the approximate solution is very close to the smoothing spline solution (Xiang 1996).

2.2 The Randomized GACV

As we noted in (4), the GACV contains the quantities $\mathrm{tr}(A)$ and $\mathrm{tr}(W^{1/2} A W^{1/2})$. In order to evaluate them, we have to calculate the $n \times n$ matrix $A$ explicitly, which would slow down the calculation of the penalized approximate smoothing spline. In this section we derive a randomized version of the GACV function, similar in spirit to the randomized version of GCV (Girard 1987). This version of GACV does not require the calculation of any matrix inverse. Consider
$$ I_\lambda(f, Y) = -\sum_{j=1}^n l_j(y_j, f(x_j)) + \frac{n\lambda}{2} f^T \Sigma f, \qquad (6) $$
where $Y = (y_1, \ldots, y_n)^T$, $l_j(y_j, f(x_j))$ is the log likelihood and $f = (f(x_1), \ldots, f(x_n))^T$. Thus $I_\lambda(f, Y)$ is the penalized log likelihood for $f$ with respect to the data $Y$. If we add a small disturbance $\epsilon$ to $Y$ to get a new pseudo data set $Y^+ = Y + \epsilon$, we expect $\hat f_\lambda^{Y^+}$ and $\hat f_\lambda^{Y}$, the minimizers of (6) with respect to $Y^+$ and $Y$, to be very close. Since $\hat f_\lambda^{Y^+}$ and $\hat f_\lambda^{Y}$ are minimizers of (6), we have
$$ \frac{\partial I_\lambda}{\partial f}(\hat f_\lambda^{Y^+}, Y^+) = \frac{\partial I_\lambda}{\partial f}(\hat f_\lambda^{Y}, Y) = 0. $$
Similar to the approach we used to derive the GACV function, we expand $\frac{\partial I_\lambda}{\partial f}(\hat f_\lambda^{Y^+}, Y^+)$ in a Taylor series about $(\hat f_\lambda^{Y}, Y)$ and obtain
$$ 0 \approx \frac{\partial I_\lambda}{\partial f}(\hat f_\lambda^{Y}, Y) + \frac{\partial^2 I_\lambda}{\partial f \partial f^T}(\bar f, \bar Y)\big(\hat f_\lambda^{Y^+} - \hat f_\lambda^{Y}\big) + \frac{\partial^2 I_\lambda}{\partial f \partial Y^T}(\bar f, \bar Y)\big(Y^+ - Y\big), \qquad (7) $$
where $(\bar f, \bar Y)$ is a point between $(\hat f_\lambda^{Y}, Y)$ and $(\hat f_\lambda^{Y^+}, Y^+)$. Based on (6), we have
$$ \frac{\partial^2 I_\lambda}{\partial f \partial f^T} = W(f) + n\lambda\Sigma \qquad \text{and} \qquad \frac{\partial^2 I_\lambda}{\partial f \partial Y^T} = -I_{n \times n}. $$

Therefore, equation (7) can be simplified to
$$ \hat f_\lambda^{Y^+} - \hat f_\lambda^{Y} = \big(W(\bar f) + n\lambda\Sigma\big)^{-1}(Y^+ - Y). \qquad (8) $$
When $\epsilon$ is very small, $(\hat f_\lambda^{Y^+}, Y^+)$ is very close to $(\hat f_\lambda^{Y}, Y)$, so $W(\bar f)$ can be approximated by $W(\hat f_\lambda^{Y})$ and (8) becomes
$$ \hat f_\lambda^{Y^+} - \hat f_\lambda^{Y} \approx \big(W(\hat f_\lambda^{Y}) + n\lambda\Sigma\big)^{-1}\epsilon = A\epsilon. \qquad (9) $$
Using this result, we derive a Monte Carlo calculation of $\mathrm{tr}(A)$. Generate a small random error $\epsilon \sim N(0, \sigma^2 I_n)$. For the square matrix $A$ we have
$$ E(\epsilon^T A \epsilon) = \sigma^2 \mathrm{tr}(A) \qquad \text{and} \qquad \mathrm{Var}(\epsilon^T A \epsilon) = \sigma^4 \big(\mathrm{tr}(A^T A) + \mathrm{tr}(A^2)\big). $$
Hence if $\sigma$ is small, $\frac{1}{n}\epsilon^T A \epsilon/\sigma^2$ provides a good estimate of $\frac{1}{n}\mathrm{tr}(A)$. By the approximation in (9), we can use $\hat f_\lambda^{Y^+} - \hat f_\lambda^{Y}$ to approximate $A\epsilon$; therefore we can calculate $\epsilon^T A \epsilon$ as
$$ \epsilon^T A \epsilon = \epsilon^T (A\epsilon) \approx \epsilon^T \big(\hat f_\lambda^{Y^+} - \hat f_\lambda^{Y}\big). $$
Thus
$$ \mathrm{tr}(A) \approx \epsilon^T \big(\hat f_\lambda^{Y^+} - \hat f_\lambda^{Y}\big)/\sigma^2 \qquad \text{and} \qquad \mathrm{tr}(W^{1/2} A W^{1/2}) = \mathrm{tr}(W A) \approx \epsilon^T W \big(\hat f_\lambda^{Y^+} - \hat f_\lambda^{Y}\big)/\sigma^2. $$
The accuracy of this approximation depends on the size of the disturbance ($\sigma^2$) which we add to the data. From the experiments in Xiang (1996), we found that the shape and the minimizer of RanGACV are not very sensitive to $\sigma^2$, as long as $\sigma$ is within a fairly wide range. Now, replacing $\mathrm{tr}(A)$ and $\mathrm{tr}(W^{1/2} A W^{1/2})$ by their Monte Carlo estimates, we get a randomized version of the GACV function,
$$ RanGACV(\lambda) = \frac{1}{n} \sum_{i=1}^n \big(-y_i f_\lambda(x_i) + b(f_\lambda(x_i))\big) + \frac{\epsilon^T \big(\hat f_\lambda^{Y^+} - \hat f_\lambda^{Y}\big)}{n} \cdot \frac{\sum_{i=1}^n y_i \big(y_i - p_\lambda(x_i)\big)}{\epsilon^T \epsilon - \epsilon^T W \big(\hat f_\lambda^{Y^+} - \hat f_\lambda^{Y}\big)}. $$
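Assuming a fitting routine fit(Y, lam) that returns the fitted n-vector $\hat f_\lambda^{Y}$ (for example, the Newton iteration on the reduced basis sketched earlier), the randomized score for a single perturbation can be computed without ever forming A. This is a sketch under those assumptions, not the authors' code.

```python
import numpy as np

def rangacv(fit, Y, lam, sigma=1e-3, rng=None):
    """Randomized GACV for one perturbation vector epsilon.

    fit : callable fit(Y, lam) -> fitted n-vector f_lambda^Y
    Y   : length-n vector of 0/1 responses
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(Y)
    eps = rng.normal(scale=sigma, size=n)     # epsilon ~ N(0, sigma^2 I_n)
    f_hat = fit(Y, lam)
    f_hat_plus = fit(Y + eps, lam)            # fit to the perturbed pseudo data Y+
    delta = f_hat_plus - f_hat                # approximates A @ eps by (9)
    p = 1.0 / (1.0 + np.exp(-f_hat))
    w = p * (1.0 - p)
    obo = np.sum(-Y * f_hat + np.logaddexp(0.0, f_hat)) / n
    num = (eps @ delta / n) * np.sum(Y * (Y - p))
    den = eps @ eps - eps @ (w * delta)       # randomized n - tr(W^{1/2} A W^{1/2})
    return obo + num / den
```

Averaging the second term over several independent draws of epsilon gives the multi-replicate version discussed next.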

Heuristically, as $n \to \infty$,
$$ \big(\epsilon^T A \epsilon/(n\sigma^2) - \mathrm{tr}(A)/n\big) \to_p 0, \qquad \epsilon^T \epsilon/(n\sigma^2) \to_p 1, $$
and also
$$ \big(\epsilon^T W^{1/2} A W^{1/2} \epsilon/(n\sigma^2) - \mathrm{tr}(W^{1/2} A W^{1/2})/n\big) \to_p 0, $$
so we expect $(GACV - RanGACV) \to_p 0$ when the sample size is large.

To calculate RanGACV, we first generate a set of disturbed pseudo data $Y^+$. For a fixed $\lambda$, we minimize (6) with respect to both $Y$ and $Y^+$ to get the fits $\hat f_\lambda^{Y}$ and $\hat f_\lambda^{Y^+}$; based on these fits we can evaluate RanGACV for that $\lambda$. In order to reduce the variation caused by $\epsilon$, we use the same $\epsilon$ for all the $\lambda$. It is very easy to generate $\epsilon$ and calculate $\hat f_\lambda^{Y^+}$ using the penalized approximate smoothing spline approach; for each $\epsilon$ and sequence of $\lambda$ there is a RanGACV curve, and given a set of $\epsilon$'s we can use the average of several RanGACV curves as our final RanGACV. Several averaging methods are discussed in Xiang (1996); we will use
$$ RanGACV(\lambda) = \frac{1}{n} \sum_{i=1}^n \big[-y_i f_\lambda(x_i) + b(f_\lambda(x_i))\big] + \frac{1}{nl} \sum_{r=1}^{l} \frac{\epsilon_{(r)}^T \big(\hat f_\lambda^{Y_{(r)}^+} - \hat f_\lambda^{Y}\big)}{\epsilon_{(r)}^T \epsilon_{(r)} - \epsilon_{(r)}^T W \big(\hat f_\lambda^{Y_{(r)}^+} - \hat f_\lambda^{Y}\big)} \sum_{i=1}^n y_i \big(y_i - p_\lambda(x_i)\big), $$
where $Y_{(r)}^+ = Y + \epsilon_{(r)}$. As we investigated in Xiang (1996), when $l$, the number of replicates of $\epsilon$, is larger than three, RanGACV performs very well in most cases; see Xiang (1996) for further discussion. Based on the extensive simulations in Xiang (1996), we found that RanGACV performs better than the U method in both the univariate and multivariate smoothing parameter cases, in the sense that the final estimates based on the smoothing parameters chosen by RanGACV have a smaller Kullback-Leibler distance from the true function.

References

Craven, P. & Wahba, G. (1979), `Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation', Numer. Math. 31, 377-403.

Friedman, J. (1991), `Multivariate adaptive regression splines', Ann. Statist. 19, 1-141.

Girard, D. (1987), A fast `Monte Carlo cross validation' procedure for large least squares problems with noisy data, Technical Report RR 687-M, IMAG, Grenoble, France.

Green, P. & Silverman, B. (1994), Nonparametric Regression and Generalized Linear Models, Chapman and Hall.

Green, P. & Yandell, B. (1985), Semi-parametric generalized linear models, in R. Gilchrist, ed., `Lecture Notes in Statistics, Vol. 32', Springer, pp. 44-55.

Green, P., Jennison, C. & Seheult, A. (1985), `Analysis of field experiments by least squares smoothing', J. Roy. Stat. Soc. 47, 299-315.

Gu, C. (1990), `Adaptive spline smoothing in non-Gaussian regression models', J. Amer. Statist. Assoc. 85, 801-807.

Gu, C. (1992), `Cross-validating non-Gaussian data', J. Comput. Graph. Stats. 1, 169-179.

Hastie, T. & Tibshirani, R. (1990), Generalized Additive Models, Chapman and Hall.

Kimeldorf, G. & Wahba, G. (1971), `Some results on Tchebycheffian spline functions', J. Math. Anal. Applic. 33, 82-95.

SAS Institute Inc. (1988), SAS/STAT User's Guide, Release 6.03 edition, Cary, NC.

O'Sullivan, F. (1983), The analysis of some penalized likelihood estimation schemes, PhD thesis, Department of Statistics, University of Wisconsin, Madison, WI. Technical Report 726.

O'Sullivan, F., Yandell, B. & Raynor, W. (1986), `Automatic smoothing of regression functions in generalized linear models', J. Amer. Statist. Assoc. 81, 96-103.

Stone, C. (1994), `The use of polynomial splines and their tensor products in multivariate function estimation, with discussion', Annals of Statistics 22, 118-184.

Wahba, G. (1990), Spline Models for Observational Data, SIAM. CBMS-NSF Regional Conference Series in Applied Mathematics, v. 59.

Wahba, G., Wang, Y., Gu, C., Klein, R. & Klein, B. (1995), `Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy', Annals of Statistics 23, 1865-1895.

Wang, Y. (1994), Smoothing spline analysis of variance of data from exponential families, PhD thesis, Department of Statistics, University of Wisconsin-Madison, Madison, WI. Technical Report 928.

Xiang, D. (1996), Model fitting and testing for non-Gaussian data with large data sets, PhD thesis, Department of Statistics, University of Wisconsin-Madison, Madison, WI. Technical Report 957.

Xiang, D. & Wahba, G. (1996), `A generalized approximate cross validation for smoothing splines with non-Gaussian data', Statistica Sinica 6, 675-692.

Yandell, B. (1986), Algorithms for nonlinear generalized cross-validation, in T. Boardman, ed., `Computer Science and Statistics: 18th Symposium on the Interface', American Statistical Association, Washington, DC.

Yandell, B. & Green, P. (1986), Semi-parametric generalized linear model diagnostics, in `ASA Proceedings of Statistical Computing Section', American Statistical Association, pp. 48-53.
