Nesterov's Smoothing Technique and Minimizing Differences of Convex Functions for Hierarchical Clustering


Portland State University, PDXScholar
Mathematics and Statistics Faculty Publications and Presentations, Fariborz Maseeh Department of Mathematics and Statistics

Nesterov's Smoothing Technique and Minimizing Differences of Convex Functions for Hierarchical Clustering

Mau Nam Nguyen, Portland State University; Wondi Geremew, Stockton University; Sam Reynolds, Portland State University; Tuyen Tran, Portland State University

Citation Details: Nam, N. M., Geremew, W., Reynolds, S., & Tran, T. (2017). The Nesterov Smoothing Technique and Minimizing Differences of Convex Functions for Hierarchical Clustering. arXiv preprint.
This pre-print is brought to you for free and open access. It has been accepted for inclusion in Mathematics and Statistics Faculty Publications and Presentations by an authorized administrator of PDXScholar.

NESTEROV'S SMOOTHING TECHNIQUE AND MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS FOR HIERARCHICAL CLUSTERING

N. M. Nam¹, W. Geremew², S. Reynolds³ and T. Tran⁴

Abstract. A bilevel hierarchical clustering model is commonly used in designing optimal multicast networks. In this paper, we consider two different formulations of the bilevel hierarchical clustering problem, a discrete optimization problem which can be shown to be NP-hard. Our approach is to reformulate the problem as a continuous optimization problem by relaxing the discreteness conditions. Then Nesterov's smoothing technique and a numerical algorithm for minimizing differences of convex functions called the DCA are applied to cope with the nonsmoothness and nonconvexity of the problem. Numerical examples are provided to illustrate our method.

Key words. DC programming, Nesterov's smoothing technique, hierarchical clustering, subgradient, Fenchel conjugate.

AMS subject classifications. 49J52, 49J53, 90C31

1 Introduction

Although convex optimization techniques and numerical algorithms have been the topics of extensive research for more than 50 years, solving large-scale optimization problems without the presence of convexity remains a challenge. This is a motivation to search for new optimization methods that are capable of handling broader classes of functions and sets where convexity is not assumed. One of the most successful approaches to go beyond convexity is to consider the class of functions representable as differences of two convex functions. Functions of this type are called DC functions, where DC stands for difference of convex. It was recognized early by P. Hartman [10] that the class of DC functions has many nice algebraic properties. For instance, this class of functions is closed under many operations usually considered in optimization, such as taking linear combinations, maxima, or products of a finite number of DC functions.

Given a linear space $X$, a DC program is an optimization problem in which the objective function $f\colon X\to\mathbb{R}$ can be represented as $f = g - h$, where $g, h\colon X\to\mathbb{R}$ are convex functions. This extension of convex programming enables us to take advantage of the available tools from convex analysis and optimization. At the same time, DC programming is sufficiently broad to use in solving many nonconvex optimization problems faced in recent applications. Another feature of DC programming is that it possesses a very nice duality theory; see [3] and the references therein. Although DC programming had been known to be important for many applications much earlier, the first algorithm for minimizing differences of convex functions, called the DCA, was introduced by Tao and An in [3, 9].

¹ Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, Portland, OR 97207, USA (mau.nam.nguyen@pdx.edu). Research of this author was partly supported by the National Science Foundation under grant #
² School of General Studies, Stockton University, Galloway, NJ 08205, USA (wondi.geremew@stockton.edu)
³ Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, Portland, OR 97207, USA (ser6@pdx.edu)
⁴ Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, Portland, OR 97207, USA (tuyen@pdx.edu)

The DCA is a simple but effective optimization scheme used extensively in DC programming and its applications.

Cluster analysis or clustering is one of the most important problems in many fields such as machine learning, pattern recognition, image analysis, data compression, and computer graphics. Given a finite number of data points in a metric space, a centroid-based clustering problem seeks a finite number of cluster centers with each data point assigned to the nearest cluster center in a way that a certain distance, a measure of dissimilarity among the data points, is minimized. Since many kinds of data encountered in practical applications have nested structures, they require multilevel hierarchical clustering, which involves grouping a data set into a hierarchy of clusters.

In this paper, we apply the mathematical optimization approach to the bilevel hierarchical clustering problem. In fact, using mathematical optimization in clustering is a very promising approach to overcome many disadvantages of the k-means algorithm commonly used in clustering; see [1, 2, 5, 15] and the references therein. In particular, the DCA was successfully applied in [2] to a bilevel hierarchical clustering problem in which the distance measurement is defined by the squared Euclidean distance. Although the DCA in [2] provides an effective way to solve the bilevel hierarchical clustering problem in high dimensions, it has not been used to solve the original model proposed in [5], which is defined by the Euclidean distance measurement and, according to the authors of [2], is not suitable for the resulting DCA. By applying Nesterov's smoothing technique and the DCA, we are able to solve the original model proposed in [5] in high dimensions.

The paper is organized as follows. In Section 2, we present basic definitions and tools of optimization that are used throughout the paper. In Section 3, we study two models of the bilevel hierarchical clustering problem, along with two new algorithms based on Nesterov's smoothing technique and the DCA. Numerical examples and conclusions are presented in Section 4 and Section 5, respectively.

2 Basic Definitions and Tools of Optimization

In this section, we present the two main tools of optimization used to solve the bilevel hierarchical clustering problem: the DCA introduced by Pham Dinh Tao and Nesterov's smoothing technique. We consider throughout the paper the DC program
\[
\text{minimize } f(x) := g(x) - h(x),\quad x\in\mathbb{R}^n, \tag{2.1}
\]
where $g\colon\mathbb{R}^n\to\mathbb{R}$ and $h\colon\mathbb{R}^n\to\mathbb{R}$ are convex functions. The function $f$ in (2.1) is called a DC function and $g - h$ is called a DC decomposition of $f$.

Given a convex function $g\colon\mathbb{R}^n\to\mathbb{R}$, the Fenchel conjugate of $g$ is defined by
\[
g^*(y) := \sup\{\langle y, x\rangle - g(x)\mid x\in\mathbb{R}^n\},\quad y\in\mathbb{R}^n.
\]
Note that $g^*\colon\mathbb{R}^n\to(-\infty,\infty]$ is also a convex function. In addition, $x\in\partial g^*(y)$ if and only if $y\in\partial g(x)$, where $\partial$ denotes the subdifferential operator in the sense of convex analysis; see, e.g., [19-21].
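As a simple illustration of this relation (an elementary example, not taken from the clustering model itself), consider $g(x) = \frac{1}{2}\|x\|^2$ on $\mathbb{R}^n$. A direct computation gives
\[
g^*(y) = \sup_{x\in\mathbb{R}^n}\big\{\langle y, x\rangle - \tfrac{1}{2}\|x\|^2\big\} = \tfrac{1}{2}\|y\|^2,
\]
so $\partial g(x) = \{x\}$, $\partial g^*(y) = \{y\}$, and the relation $x\in\partial g^*(y)\iff y\in\partial g(x)$ reduces to $x = y$. In the DCA presented below, this is the step that converts a subgradient of $h$ back into a primal iterate; when $g$ is a simple quadratic, as it will be after smoothing, $\nabla g^*$ is available in closed form.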

Let us present below the DCA introduced by Tao and An [3, 9] as applied to (2.1). Although the algorithm is used for nonconvex optimization problems, the convexity of the functions involved still plays a crucial role.

The DCA
INPUT: $x_0\in\mathbb{R}^n$, $N\in\mathbb{N}$.
for $k = 1,\ldots,N$ do
    Find $y_k\in\partial h(x_{k-1})$.
    Find $x_k\in\partial g^*(y_k)$.
end for
OUTPUT: $x_N$.

Let us discuss below a convergence result of DC programming. A function $h\colon\mathbb{R}^n\to\mathbb{R}$ is called $\gamma$-convex ($\gamma\geq 0$) if the function defined by $k(x) := h(x) - \frac{\gamma}{2}\|x\|^2$, $x\in\mathbb{R}^n$, is convex. If there exists $\gamma > 0$ such that $h$ is $\gamma$-convex, then $h$ is called strongly convex. We say that an element $\bar{x}\in\mathbb{R}^n$ is a critical point of the function $f$ defined by (2.1) if $\partial g(\bar{x})\cap\partial h(\bar{x})\neq\emptyset$. Obviously, in the case where both $g$ and $h$ are differentiable, $\bar{x}$ is a critical point of $f$ if and only if $\bar{x}$ satisfies the Fermat rule $\nabla f(\bar{x}) = 0$. The theorem below provides a convergence result for the DCA. It can be derived directly from [9, Theorem 3.7].

Theorem 2.1 Consider the function $f$ defined by (2.1) and the sequence $\{x_k\}$ generated by the DCA. The following properties are valid:
(i) If $g$ is $\gamma_1$-convex and $h$ is $\gamma_2$-convex, then
\[
f(x_k) - f(x_{k+1}) \geq \frac{\gamma_1 + \gamma_2}{2}\,\|x_{k+1} - x_k\|^2\ \text{ for all } k\in\mathbb{N}.
\]
(ii) The sequence $\{f(x_k)\}$ is monotone decreasing.
(iii) If $f$ is bounded from below, $g$ is $\gamma_1$-convex and $h$ is $\gamma_2$-convex with $\gamma_1 + \gamma_2 > 0$, and $\{x_k\}$ is bounded, then every subsequential limit of the sequence $\{x_k\}$ is a critical point of $f$.

Now we present a direct consequence of Nesterov's smoothing technique given in [16]. In the proposition below, $d(x;\Omega)$ denotes the Euclidean distance and $P(x;\Omega)$ denotes the Euclidean projection from a point $x$ to a nonempty closed convex set $\Omega$ in $\mathbb{R}^n$.

Proposition 2.2 Given any $a\in\mathbb{R}^n$ and $\mu > 0$, a Nesterov smoothing approximation of $\varphi(x) := \|x - a\|$ defined on $\mathbb{R}^n$ has the representation
\[
\varphi_\mu(x) := \frac{1}{2\mu}\|x - a\|^2 - \frac{\mu}{2}\Big[d\Big(\frac{x - a}{\mu};\mathbb{B}\Big)\Big]^2. \tag{2.2}
\]
Moreover, $\nabla\varphi_\mu(x) = P\big(\frac{x - a}{\mu};\mathbb{B}\big)$ and
\[
\varphi_\mu(x)\leq\varphi(x)\leq\varphi_\mu(x) + \frac{\mu}{2},
\]
where $\mathbb{B}$ is the closed unit ball of $\mathbb{R}^n$.
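To make the role of Proposition 2.2 concrete, the short sketch below is an illustration in Python/NumPy (not the MATLAB implementation used for the experiments in Section 4); it evaluates $\varphi_\mu$, its gradient, and the approximation bound at a random point. The names `phi_mu`, `grad_phi_mu`, and `proj_unit_ball` are ours.

```python
import numpy as np

def proj_unit_ball(y):
    """Euclidean projection of y onto the closed unit ball B."""
    nrm = np.linalg.norm(y)
    return y if nrm <= 1.0 else y / nrm

def phi_mu(x, a, mu):
    """Nesterov smoothing of phi(x) = ||x - a||, cf. (2.2)."""
    y = (x - a) / mu
    dist_to_B = np.linalg.norm(y - proj_unit_ball(y))    # d((x - a)/mu; B)
    return np.dot(x - a, x - a) / (2.0 * mu) - 0.5 * mu * dist_to_B**2

def grad_phi_mu(x, a, mu):
    """Gradient of the smoothed function: P((x - a)/mu; B)."""
    return proj_unit_ball((x - a) / mu)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, x, mu = rng.normal(size=3), rng.normal(size=3), 0.1
    phi = np.linalg.norm(x - a)
    approx = phi_mu(x, a, mu)
    # The bound phi_mu <= phi <= phi_mu + mu/2 from Proposition 2.2:
    assert approx <= phi + 1e-12 and phi <= approx + mu / 2 + 1e-12
    print(phi, approx, grad_phi_mu(x, a, mu))
```

For $\|x - a\|\geq\mu$ one can check by hand that $\varphi_\mu(x) = \|x - a\| - \mu/2$, so the upper bound in the proposition is tight in that regime.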

3 The Bilevel Hierarchical Clustering Problem

Given a set of $m$ points (nodes) $a^1, a^2,\ldots, a^m$ in $\mathbb{R}^n$, our goal is to decompose this set into $k$ clusters. In each cluster, we would like to find a point $x^l$ among the nodes and assign it as the center of this cluster, with all points in the cluster connected to this center. Then we will find a total center among the given points $a^1, a^2,\ldots, a^m$, and all centers are connected to this total center. The goal is to minimize the total transportation cost in this tree, computed as the sum of the distances from the total center to each center and from each center to the nodes in its cluster. This is a discrete optimization problem which can be shown to be NP-hard. We will solve this problem based on continuous optimization techniques.

3.1 The Bilevel Hierarchical Clustering: Model I

The difficulty in solving this hierarchical clustering problem lies in the fact that the centers and the total center have to be among the nodes. We first relax this condition with the use of artificial centers $x^1, x^2,\ldots, x^k$ that could be anywhere in $\mathbb{R}^n$. Then we define the total center as the centroid of $x^1, x^2,\ldots, x^k$ given by
\[
\bar{x} := \frac{1}{k}(x^1 + x^2 + \cdots + x^k).
\]
The total cost of the tree is given by
\[
\varphi(x^1,\ldots,x^k) := \sum_{i=1}^m \min_{l=1,\ldots,k}\|x^l - a^i\| + \sum_{l=1}^k\|x^l - \bar{x}\|.
\]
Note that here each $a^i$ is assigned to its closest center. However, what we expect are the real centers, which can be approximated by trying to minimize the difference between the artificial centers and the real centers. To achieve this goal, define the function
\[
\phi(x^1,\ldots,x^k) := \sum_{l=1}^k \min_{i=1,\ldots,m}\|x^l - a^i\|,\quad x^1,\ldots,x^k\in\mathbb{R}^n.
\]
Observe that $\phi(x^1,\ldots,x^k) = 0$ if and only if for every $l = 1,\ldots,k$ there exists $i\in\{1,\ldots,m\}$ such that $x^l = a^i$, which means that $x^l$ is a real node. Therefore, we consider the constrained minimization problem:
\[
\text{minimize } \varphi(x^1,\ldots,x^k)\ \text{ subject to }\ \phi(x^1,\ldots,x^k) = 0.
\]
This problem can be converted to an unconstrained minimization problem:
\[
\text{minimize } f_\lambda(x^1,\ldots,x^k) := \varphi(x^1,\ldots,x^k) + \lambda\,\phi(x^1,\ldots,x^k),\quad x^1,\ldots,x^k\in\mathbb{R}^n, \tag{3.1}
\]
where $\lambda > 0$ is a penalty parameter. Similar to the situation with the clustering problem, this new problem is nonsmooth and nonconvex, and it can be solved by smoothing techniques and the DCA.
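Before turning to the DC decomposition, the following short sketch (our own illustration in Python/NumPy, not the implementation used in Section 4) evaluates $\varphi$, $\phi$, and the penalized objective $f_\lambda$ of (3.1) directly from their definitions; here `X` stores the artificial centers $x^1,\ldots,x^k$ as rows, `A` stores the nodes $a^1,\ldots,a^m$ as rows, and the example data are arbitrary.

```python
import numpy as np

def model1_objective(X, A, lam):
    """Evaluate f_lambda = varphi + lam * phi of (3.1) for centers X (k x n) and nodes A (m x n)."""
    xbar = X.mean(axis=0)                               # centroid of the artificial centers
    # pairwise Euclidean distances ||x^l - a^i||, shape (k, m)
    D = np.linalg.norm(X[:, None, :] - A[None, :, :], axis=2)
    varphi = D.min(axis=0).sum() + np.linalg.norm(X - xbar, axis=1).sum()
    phi = D.min(axis=1).sum()                           # distance of each center to its nearest node
    return varphi + lam * phi

# Example: 18 random planar nodes, k = 2 centers initialized at two of the nodes.
rng = np.random.default_rng(1)
A = rng.normal(size=(18, 2))
X = A[rng.choice(18, size=2, replace=False)]
print(model1_objective(X, A, lam=0.1))
```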

Note that a particular case of this model in two dimensions was considered in [5], where the problem was solved using the derivative-free discrete gradient method established in [4]; this method, however, is not suitable for large-scale settings in high dimensions. The DCA was used in [2] to solve a similar model with the squared Euclidean distance function used as the distance measurement. In that paper, the authors also pointed out that (3.1) is difficult to handle directly, as it is not suitable for the resulting DCA. Nevertheless, we will show in what follows that the DCA is applicable to this model when combined with Nesterov's smoothing technique.

Note that the functions $\varphi$ and $\phi$ in (3.1) belong to the class of DC functions with the following DC decompositions:
\[
\varphi(x^1,\ldots,x^k) = \sum_{i=1}^m\sum_{l=1}^k\|x^l - a^i\| + \sum_{l=1}^k\|x^l - \bar{x}\| - \sum_{i=1}^m\max_{r=1,\ldots,k}\sum_{l=1,\,l\neq r}^k\|x^l - a^i\|,
\]
\[
\phi(x^1,\ldots,x^k) = \sum_{l=1}^k\sum_{i=1}^m\|x^l - a^i\| - \sum_{l=1}^k\max_{s=1,\ldots,m}\sum_{i=1,\,i\neq s}^m\|x^l - a^i\|.
\]
It follows that the objective function $f_\lambda$ in (3.1) has the DC decomposition:
\[
f_\lambda(x^1,\ldots,x^k) = (1+\lambda)\sum_{i=1}^m\sum_{l=1}^k\|x^l - a^i\| + \sum_{l=1}^k\|x^l - \bar{x}\| - \Big[\sum_{i=1}^m\max_{r=1,\ldots,k}\sum_{l=1,\,l\neq r}^k\|x^l - a^i\| + \lambda\sum_{l=1}^k\max_{s=1,\ldots,m}\sum_{i=1,\,i\neq s}^m\|x^l - a^i\|\Big].
\]
This DC decomposition is not suitable for applying the DCA because there is no closed form for a subgradient of the function $g$ involved. In the next step, we apply Nesterov's smoothing technique from Proposition 2.2 to approximate the objective function $f_\lambda$ by a new DC function favorable for applying the DCA. To accomplish this goal, we simply replace each term of the form $\|x - a\|$ in the first part of $f_\lambda(x^1,\ldots,x^k)$ (the positive part) by the smooth approximation (2.2), while keeping the second part (the negative part) the same. As a result, we obtain
\[
f_{\lambda\mu}(x^1,\ldots,x^k) := \frac{1+\lambda}{2\mu}\sum_{i=1}^m\sum_{l=1}^k\|x^l - a^i\|^2 - \frac{(1+\lambda)\mu}{2}\sum_{i=1}^m\sum_{l=1}^k\Big[d\Big(\frac{x^l - a^i}{\mu};\mathbb{B}\Big)\Big]^2 + \frac{1}{2\mu}\sum_{l=1}^k\|x^l - \bar{x}\|^2 - \frac{\mu}{2}\sum_{l=1}^k\Big[d\Big(\frac{x^l - \bar{x}}{\mu};\mathbb{B}\Big)\Big]^2 - \sum_{i=1}^m\max_{r=1,\ldots,k}\sum_{l=1,\,l\neq r}^k\|x^l - a^i\| - \lambda\sum_{l=1}^k\max_{s=1,\ldots,m}\sum_{i=1,\,i\neq s}^m\|x^l - a^i\|.
\]
The original bilevel hierarchical clustering problem can now be solved using the DC program:
\[
\text{minimize } f_{\lambda\mu}(x^1,\ldots,x^k) = g_{\lambda\mu}(x^1,\ldots,x^k) - h_{\lambda\mu}(x^1,\ldots,x^k),\quad x^1,\ldots,x^k\in\mathbb{R}^n.
\]

In this formulation, $g_{\lambda\mu}$ and $h_{\lambda\mu}$ are convex functions on $(\mathbb{R}^n)^k$ defined by
\[
g_{\lambda\mu}(x^1,\ldots,x^k) := g^1_{\lambda\mu}(x^1,\ldots,x^k) + g^2_{\mu}(x^1,\ldots,x^k),
\]
\[
h_{\lambda\mu}(x^1,\ldots,x^k) := h^1_{\lambda\mu}(x^1,\ldots,x^k) + h^2_{\mu}(x^1,\ldots,x^k) + h^3(x^1,\ldots,x^k) + h^4_{\lambda}(x^1,\ldots,x^k),
\]
with their respective components defined as
\[
g^1_{\lambda\mu}(x^1,\ldots,x^k) := \frac{1+\lambda}{2\mu}\sum_{i=1}^m\sum_{l=1}^k\|x^l - a^i\|^2,\qquad g^2_{\mu}(x^1,\ldots,x^k) := \frac{1}{2\mu}\sum_{l=1}^k\|x^l - \bar{x}\|^2,
\]
\[
h^1_{\lambda\mu}(x^1,\ldots,x^k) := \frac{(1+\lambda)\mu}{2}\sum_{i=1}^m\sum_{l=1}^k\Big[d\Big(\frac{x^l - a^i}{\mu};\mathbb{B}\Big)\Big]^2,\qquad h^2_{\mu}(x^1,\ldots,x^k) := \frac{\mu}{2}\sum_{l=1}^k\Big[d\Big(\frac{x^l - \bar{x}}{\mu};\mathbb{B}\Big)\Big]^2,
\]
\[
h^3(x^1,\ldots,x^k) := \sum_{i=1}^m\max_{r=1,\ldots,k}\sum_{l=1,\,l\neq r}^k\|x^l - a^i\|,\qquad h^4_{\lambda}(x^1,\ldots,x^k) := \lambda\sum_{l=1}^k\max_{s=1,\ldots,m}\sum_{i=1,\,i\neq s}^m\|x^l - a^i\|.
\]
To facilitate the gradient and subgradient calculations for the DCA, we introduce a data matrix $A$ and a variable matrix $X$. The data matrix $A$ is formed by putting each $a^i$, $i = 1,\ldots,m$, in the $i$th row, i.e.,
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}\end{bmatrix}.
\]
Similarly, if $x^1,\ldots,x^k$ are the $k$ cluster centers, then the variable matrix $X$ is formed by putting each $x^l$, $l = 1,\ldots,k$, in the $l$th row, i.e.,
\[
X = \begin{bmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1n}\\ x_{21} & x_{22} & x_{23} & \cdots & x_{2n}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ x_{k1} & x_{k2} & x_{k3} & \cdots & x_{kn}\end{bmatrix}.
\]
Then the variable matrix $X$ of the optimization problem belongs to $\mathbb{R}^{k\times n}$, the linear space of $k\times n$ real matrices equipped with the inner product $\langle X, Y\rangle := \operatorname{trace}(X^T Y)$. The Frobenius norm on $\mathbb{R}^{k\times n}$ is defined by
\[
\|X\|_F := \sqrt{\langle X, X\rangle} = \sqrt{\sum_{l=1}^k\langle x^l, x^l\rangle} = \sqrt{\sum_{l=1}^k\|x^l\|^2}.
\]
Finally, we represent the average of the $k$ cluster centers by $\bar{x}$, i.e., $\bar{x} := \frac{1}{k}\sum_{j=1}^k x^j$.

Gradient and Subgradient Calculations for the DCA

Let us start by computing the gradient of $g_{\lambda\mu}(X) = g^1_{\lambda\mu}(X) + g^2_{\mu}(X)$.

Using the Frobenius norm, the function $g^1_{\lambda\mu}$ can equivalently be written as
\[
g^1_{\lambda\mu}(X) = \frac{1+\lambda}{2\mu}\sum_{i=1}^m\sum_{l=1}^k\|x^l - a^i\|^2 = \frac{1+\lambda}{2\mu}\sum_{i=1}^m\sum_{l=1}^k\big[\|x^l\|^2 - 2\langle x^l, a^i\rangle + \|a^i\|^2\big] = \frac{1+\lambda}{2\mu}\big[m\|X\|_F^2 - 2\langle X, E_{km}A\rangle + k\|A\|_F^2\big],
\]
where $E_{km}$ is the $k\times m$ matrix whose entries are all ones. Hence, one can see that $g^1_{\lambda\mu}$ is differentiable and its gradient is given by
\[
\nabla g^1_{\lambda\mu}(X) = \frac{1+\lambda}{\mu}\big[mX - E_{km}A\big].
\]
Similarly, $g^2_{\mu}$ can equivalently be written as
\[
g^2_{\mu}(X) = \frac{1}{2\mu}\sum_{l=1}^k\|x^l - \bar{x}\|^2 = \frac{1}{2\mu}\sum_{l=1}^k\big[\|x^l\|^2 - 2\langle x^l,\bar{x}\rangle + \|\bar{x}\|^2\big] = \frac{1}{2\mu}\Big[\|X\|_F^2 - \frac{2}{k}\langle X, E_{kk}X\rangle + \frac{1}{k}\langle X, E_{kk}X\rangle\Big] = \frac{1}{2\mu}\Big[\|X\|_F^2 - \frac{1}{k}\langle X, E_{kk}X\rangle\Big],
\]
where $E_{kk}$ is the $k\times k$ matrix whose entries are all ones. Hence, $g^2_{\mu}$ is differentiable and its gradient is given by
\[
\nabla g^2_{\mu}(X) = \frac{1}{\mu}\Big[X - \frac{1}{k}E_{kk}X\Big].
\]
Since $g_{\lambda\mu}(X) = g^1_{\lambda\mu}(X) + g^2_{\mu}(X)$, its gradient can be computed by
\[
\nabla g_{\lambda\mu}(X) = \nabla g^1_{\lambda\mu}(X) + \nabla g^2_{\mu}(X) = \frac{1+\lambda}{\mu}\big[mX - E_{km}A\big] + \frac{1}{\mu}\Big[X - \frac{1}{k}E_{kk}X\Big] = \frac{1}{\mu}\Big[\big[(1+\lambda)m + 1\big]X - \frac{1}{k}E_{kk}X - (1+\lambda)E_{km}A\Big].
\]
Therefore,
\[
\nabla g_{\lambda\mu}(X) = \frac{1}{\mu}\Big[\big(((1+\lambda)m + 1)I_k - J\big)X - (1+\lambda)S\Big],
\]
where $J = \frac{1}{k}E_{kk}$ and $S = E_{km}A$.
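As a quick sanity check on this closed form, the following sketch (our own Python/NumPy illustration; the arrays mirror $J$, $S$, and $E_{km}$ above, and the test data are arbitrary) compares the formula for $\nabla g_{\lambda\mu}$ with a finite-difference approximation of $g_{\lambda\mu} = g^1_{\lambda\mu} + g^2_{\mu}$.

```python
import numpy as np

def g_value(X, A, lam, mu):
    """g_{lam,mu}(X) = g1 + g2 for Model I."""
    xbar = X.mean(axis=0)
    g1 = (1 + lam) / (2 * mu) * np.sum(np.linalg.norm(X[:, None, :] - A[None, :, :], axis=2) ** 2)
    g2 = 1 / (2 * mu) * np.sum(np.linalg.norm(X - xbar, axis=1) ** 2)
    return g1 + g2

def g_grad(X, A, lam, mu):
    """Closed form: (1/mu)[(((1+lam)m + 1)I - J)X - (1+lam)S]."""
    k, m = X.shape[0], A.shape[0]
    J_X = np.tile(X.mean(axis=0), (k, 1))            # J X, with J = (1/k) E_kk
    S = np.tile(A.sum(axis=0), (k, 1))               # S = E_km A
    return (((1 + lam) * m + 1) * X - J_X - (1 + lam) * S) / mu

rng = np.random.default_rng(2)
A, X = rng.normal(size=(18, 2)), rng.normal(size=(3, 2))
lam, mu, eps = 0.5, 0.1, 1e-6
G = g_grad(X, A, lam, mu)
# central finite differences, entry by entry (g is quadratic, so they agree up to roundoff)
for l in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X); E[l, j] = eps
        fd = (g_value(X + E, A, lam, mu) - g_value(X - E, A, lam, mu)) / (2 * eps)
        assert abs(fd - G[l, j]) < 1e-4
```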

Our goal now is to compute $\nabla g^*_{\lambda\mu}(Y)$, which can be accomplished by the relation
\[
X = \nabla g^*_{\lambda\mu}(Y)\ \text{ if and only if }\ Y = \nabla g_{\lambda\mu}(X).
\]
The latter can equivalently be written as
\[
\big[((1+\lambda)m + 1)I_k - J\big]X = (1+\lambda)S + \mu Y.
\]
Then, with some algebraic manipulation, we can show that
\[
\nabla g^*_{\lambda\mu}(Y) = X = \Big[\frac{1}{1 + (1+\lambda)m}\,I_k + \frac{1}{[1 + (1+\lambda)m](1+\lambda)m}\,J\Big]\big[(1+\lambda)S + \mu Y\big].
\]
Next, we will demonstrate in more detail the techniques we used in finding a subgradient of the convex function $h_{\lambda\mu}$. Recall that $h_{\lambda\mu}$ is defined by
\[
h_{\lambda\mu}(X) = h^1_{\lambda\mu}(X) + h^2_{\mu}(X) + h^3(X) + h^4_{\lambda}(X).
\]
We will start with the function $h^1_{\lambda\mu}$ given by
\[
h^1_{\lambda\mu}(X) = \frac{(1+\lambda)\mu}{2}\sum_{i=1}^m\sum_{l=1}^k\Big[d\Big(\frac{x^l - a^i}{\mu};\mathbb{B}\Big)\Big]^2.
\]
From this representation, one can see that $h^1_{\lambda\mu}$ is differentiable, and hence its subgradient coincides with its gradient, which can be computed from the partial derivatives with respect to $x^1,\ldots,x^k$, i.e.,
\[
\frac{\partial h^1_{\lambda\mu}}{\partial x^l}(X) = (1+\lambda)\sum_{i=1}^m\Big[\frac{x^l - a^i}{\mu} - P\Big(\frac{x^l - a^i}{\mu};\mathbb{B}\Big)\Big].
\]
Thus, $\nabla h^1_{\lambda\mu}(X)$ is the $k\times n$ matrix $U$ whose $l$th row is $\frac{\partial h^1_{\lambda\mu}}{\partial x^l}(X)$, $l = 1,2,\ldots,k$. Similarly, one can see that the function $h^2_{\mu}$ given by
\[
h^2_{\mu}(X) = \frac{\mu}{2}\sum_{l=1}^k\Big[d\Big(\frac{x^l - \bar{x}}{\mu};\mathbb{B}\Big)\Big]^2
\]
is differentiable, with its partial derivatives computed by
\[
\frac{\partial h^2_{\mu}}{\partial x^l}(X) = \Big[\frac{x^l - \bar{x}}{\mu} - P\Big(\frac{x^l - \bar{x}}{\mu};\mathbb{B}\Big)\Big] - \frac{1}{k}\sum_{j=1}^k\Big[\frac{x^j - \bar{x}}{\mu} - P\Big(\frac{x^j - \bar{x}}{\mu};\mathbb{B}\Big)\Big].
\]
Hence, $\nabla h^2_{\mu}(X)$ is the $k\times n$ matrix $V$ whose $l$th row is $\frac{\partial h^2_{\mu}}{\partial x^l}(X)$, $l = 1,2,\ldots,k$. Unlike $h^1_{\lambda\mu}$ and $h^2_{\mu}$, the convex functions $h^3$ and $h^4_{\lambda}$ are not differentiable, but both can be written as a finite sum of maxima of finite numbers of convex functions. Let us compute a subgradient of $h^3$ as an example. We have
\[
h^3(X) = \sum_{i=1}^m\max_{r=1,\ldots,k}\sum_{l=1,\,l\neq r}^k\|x^l - a^i\| = \sum_{i=1}^m\gamma_i(X),
\]

where, for $i = 1,\ldots,m$,
\[
\gamma_i(X) := \max_{r=1,\ldots,k}\gamma_{ir}(X),\qquad \gamma_{ir}(X) = \sum_{l=1,\,l\neq r}^k\|x^l - a^i\|,\quad r = 1,\ldots,k.
\]
Then, for each $i = 1,\ldots,m$, we find $W_i\in\partial\gamma_i(X)$ according to the subdifferential rule for the maximum of convex functions, and define $W := \sum_{i=1}^m W_i$ to get a subgradient of the function $h^3$ at $X$ by the subdifferential sum rule. To accomplish this goal, we first choose an index $r$ from the index set $\{1,\ldots,k\}$ such that
\[
\gamma_i(X) = \gamma_{ir}(X) = \sum_{l=1,\,l\neq r}^k\|x^l - a^i\|.
\]
Using the familiar subdifferential formula for the Euclidean norm function, the $l$th row $w_i^l$, $l\neq r$, of the matrix $W_i$ is determined as follows:
\[
w_i^l := \begin{cases}\dfrac{x^l - a^i}{\|x^l - a^i\|} & \text{if } x^l\neq a^i,\\[1ex] u\in\mathbb{B} & \text{if } x^l = a^i.\end{cases}
\]
The $r$th row of the matrix $W_i$ is $w_i^r := 0$. The procedure for calculating a subgradient of the function $h^4_{\lambda}$ given by
\[
h^4_{\lambda}(x^1,\ldots,x^k) = \lambda\sum_{l=1}^k\max_{s=1,\ldots,m}\sum_{i=1,\,i\neq s}^m\|x^l - a^i\|
\]
is very similar to what we just demonstrated for $h^3$.

At this point, we are ready to give a new DCA-based algorithm for Model I.

Algorithm 1.
INPUT: $X_0$, $\lambda_0$, $\mu_0$, $N\in\mathbb{N}$.
while stopping criteria$(\lambda,\mu)$ = false do
    for $k = 1,2,3,\ldots,N$ do
        Find $Y_k\in\partial h_{\lambda\mu}(X_{k-1})$.
        Find $X_k = \nabla g^*_{\lambda\mu}(Y_k)$.
    end for
    update $\lambda$ and $\mu$.
end while
OUTPUT: $X_N$.
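The following is our own Python/NumPy illustration of one inner iteration of Algorithm 1, a sketch built from the formulas above rather than the MATLAB code used in Section 4. The names `proj_rows_unit_ball`, `subgrad_h`, `grad_g_conj`, and `dca_step` are ours, and at a point where some $x^l$ coincides with a node we take $u = 0\in\mathbb{B}$ in the subgradient formula.

```python
import numpy as np

def proj_rows_unit_ball(Y):
    """Project each row of Y onto the closed unit ball B."""
    nrm = np.linalg.norm(Y, axis=-1, keepdims=True)
    return Y / np.maximum(nrm, 1.0)

def subgrad_h(X, A, lam, mu):
    """A subgradient of h = h1 + h2 + h3 + h4 at X (k x n), data A (m x n)."""
    k, n = X.shape
    m = A.shape[0]
    D = X[:, None, :] - A[None, :, :]                 # x^l - a^i, shape (k, m, n)
    dist = np.linalg.norm(D, axis=2)                  # ||x^l - a^i||, shape (k, m)
    # h1: (1+lam) * sum_i [ (x^l - a^i)/mu - P((x^l - a^i)/mu; B) ]
    U = (1 + lam) * np.sum(D / mu - proj_rows_unit_ball(D / mu), axis=1)
    # h2: row-wise term minus its average over the rows
    G = (X - X.mean(axis=0)) / mu
    G = G - proj_rows_unit_ball(G)
    V = G - G.mean(axis=0)
    # h3: for each node i, max_r sum_{l != r} ||x^l - a^i|| is attained by dropping
    # the nearest center r; rows l != r get the unit vector (x^l - a^i)/||x^l - a^i||.
    safe = np.where(dist > 0, dist, 1.0)
    W = np.zeros((k, n))
    for i in range(m):
        r = np.argmin(dist[:, i])
        Wi = np.where(dist[:, [i]] > 0, D[:, i, :] / safe[:, [i]], 0.0)
        Wi[r] = 0.0
        W += Wi
    # h4: for each center l, max_s sum_{i != s} ||x^l - a^i|| drops the nearest node s.
    Z = np.zeros((k, n))
    for l in range(k):
        s = np.argmin(dist[l, :])
        rows = np.where(dist[l, :, None] > 0, D[l, :, :] / safe[l, :, None], 0.0)
        Z[l] = lam * (rows.sum(axis=0) - rows[s])
    return U + V + W + Z

def grad_g_conj(Y, A, lam, mu, k):
    """Closed-form nabla g*(Y): [(1/c)I + J/(c(c-1))]((1+lam)S + mu*Y), with c = (1+lam)m + 1."""
    m = A.shape[0]
    c = (1 + lam) * m + 1
    R = (1 + lam) * np.tile(A.sum(axis=0), (k, 1)) + mu * Y
    return R / c + np.tile(R.mean(axis=0), (k, 1)) / (c * (c - 1))

def dca_step(X, A, lam, mu):
    """One DCA iteration X_k -> X_{k+1} for the smoothed Model I objective."""
    return grad_g_conj(subgrad_h(X, A, lam, mu), A, lam, mu, X.shape[0])
```

With these two pieces, the inner loop of Algorithm 1 amounts to repeating `X = dca_step(X, A, lam, mu)` $N$ times, with $\lambda$ and $\mu$ updated in the outer loop as described in Section 4.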

3.2 The Bilevel Hierarchical Clustering: Model II

In this section, we introduce the second model to solve the bilevel hierarchical clustering problem. In this model, we use an additional variable $x^{k+1}$ to denote the total center. At first we allow the total center $x^{k+1}$ to be a free point in $\mathbb{R}^n$, the same as the $k$ cluster centers. Then the total cost of the tree is given by
\[
\varphi(x^1,\ldots,x^{k+1}) := \sum_{i=1}^m\min_{l=1,\ldots,k}\|x^l - a^i\| + \sum_{l=1}^k\|x^l - x^{k+1}\|,\quad x^1,\ldots,x^{k+1}\in\mathbb{R}^n.
\]
To force the $k+1$ centers to be chosen from the given nodes (or to make them as close to the nodes as possible), we impose the constraint
\[
\phi(x^1,\ldots,x^{k+1}) := \sum_{l=1}^{k+1}\min_{i=1,\ldots,m}\|x^l - a^i\| = 0.
\]
Our goal is to solve the optimization problem
\[
\text{minimize } \varphi(x^1,\ldots,x^{k+1})\ \text{ subject to }\ \phi(x^1,\ldots,x^{k+1}) = 0,\quad x^1,\ldots,x^{k+1}\in\mathbb{R}^n.
\]
Similar to the first model, this problem formulation can be converted to an unconstrained minimization problem involving a penalty parameter $\lambda > 0$:
\[
\text{minimize } f_\lambda(x^1,\ldots,x^{k+1}) := \varphi(x^1,\ldots,x^{k+1}) + \lambda\,\phi(x^1,\ldots,x^{k+1}),\quad x^1,\ldots,x^{k+1}\in\mathbb{R}^n. \tag{3.2}
\]
Next, we apply Nesterov's smoothing technique to get an approximation of the objective function $f_\lambda$ given in (3.2), which involves the two parameters $\lambda > 0$ and $\mu > 0$:
\[
f_{\lambda\mu}(X) := \frac{1+\lambda}{2\mu}\sum_{i=1}^m\sum_{l=1}^k\|x^l - a^i\|^2 - \frac{(1+\lambda)\mu}{2}\sum_{i=1}^m\sum_{l=1}^k\Big[d\Big(\frac{x^l - a^i}{\mu};\mathbb{B}\Big)\Big]^2 + \frac{\lambda}{2\mu}\sum_{i=1}^m\|x^{k+1} - a^i\|^2 - \frac{\lambda\mu}{2}\sum_{i=1}^m\Big[d\Big(\frac{x^{k+1} - a^i}{\mu};\mathbb{B}\Big)\Big]^2 + \frac{1}{2\mu}\sum_{l=1}^k\|x^l - x^{k+1}\|^2 - \frac{\mu}{2}\sum_{l=1}^k\Big[d\Big(\frac{x^l - x^{k+1}}{\mu};\mathbb{B}\Big)\Big]^2 - \sum_{i=1}^m\max_{r=1,\ldots,k}\sum_{l=1,\,l\neq r}^k\|x^l - a^i\| - \lambda\sum_{l=1}^{k+1}\max_{s=1,\ldots,m}\sum_{i=1,\,i\neq s}^m\|x^l - a^i\|.
\]
As we will show in what follows, it is convenient to apply the DCA to minimize the function $f_{\lambda\mu}$. This function can be represented as the difference of two convex functions defined on $\mathbb{R}^{(k+1)\times n}$ using a variable $X$ whose $i$th row is $x^i$ for $i = 1,\ldots,k+1$:
\[
f_{\lambda\mu}(X) = g_{\lambda\mu}(X) - h_{\lambda\mu}(X),\quad X\in\mathbb{R}^{(k+1)\times n}.
\]
In this formulation, $g_{\lambda\mu}$ and $h_{\lambda\mu}$ are convex functions defined on $\mathbb{R}^{(k+1)\times n}$ by
\[
g_{\lambda\mu}(X) := g^1_{\lambda\mu}(X) + g^2_{\mu}(X)\quad\text{and}\quad h_{\lambda\mu}(X) := h^1_{\mu}(X) + h^2_{\lambda\mu}(X) + h^3_{\lambda\mu}(X) + h^4_{\mu}(X) + h^5(X) + h^6_{\lambda}(X),
\]

with their respective components given by
\[
g^1_{\lambda\mu}(X) := \frac{1+\lambda}{2\mu}\sum_{i=1}^m\sum_{l=1}^{k+1}\|x^l - a^i\|^2,\qquad g^2_{\mu}(X) := \frac{1}{2\mu}\sum_{l=1}^k\|x^l - x^{k+1}\|^2,
\]
\[
h^1_{\mu}(X) := \frac{1}{2\mu}\sum_{i=1}^m\|x^{k+1} - a^i\|^2,\qquad h^2_{\lambda\mu}(X) := \frac{\lambda\mu}{2}\sum_{i=1}^m\Big[d\Big(\frac{x^{k+1} - a^i}{\mu};\mathbb{B}\Big)\Big]^2,
\]
\[
h^3_{\lambda\mu}(X) := \frac{(1+\lambda)\mu}{2}\sum_{i=1}^m\sum_{l=1}^k\Big[d\Big(\frac{x^l - a^i}{\mu};\mathbb{B}\Big)\Big]^2,\qquad h^4_{\mu}(X) := \frac{\mu}{2}\sum_{l=1}^k\Big[d\Big(\frac{x^l - x^{k+1}}{\mu};\mathbb{B}\Big)\Big]^2,
\]
\[
h^5(X) := \sum_{i=1}^m\max_{r=1,\ldots,k}\sum_{l=1,\,l\neq r}^k\|x^l - a^i\|,\qquad h^6_{\lambda}(X) := \lambda\sum_{l=1}^{k+1}\max_{s=1,\ldots,m}\sum_{i=1,\,i\neq s}^m\|x^l - a^i\|.
\]

Gradient and Subgradient Calculations for the DCA

Let us start by computing the gradient of the first part of the DC decomposition, i.e., $g_{\lambda\mu}(X) = g^1_{\lambda\mu}(X) + g^2_{\mu}(X)$. By applying techniques similar to those used in computing gradients/subgradients for Model I, the gradient of $g^1_{\lambda\mu}$ can be written as
\[
\nabla g^1_{\lambda\mu}(X) = \frac{1+\lambda}{\mu}\big[mX - EA\big],
\]
where $E$ is the $(k+1)\times m$ matrix whose entries are all ones. Similarly, the gradient of $g^2_{\mu}$ can be written as
\[
\nabla g^2_{\mu}(X) = \frac{1}{\mu}\big[I + T\big]X,
\]
where $I$ is the $(k+1)\times(k+1)$ identity matrix and $T$ is the $(k+1)\times(k+1)$ matrix whose entries are all zeros except for its $(k+1)$th row and $(k+1)$th column, which are both filled by the vector $(-1,-1,\ldots,-1,k-1)$. It follows that
\[
\nabla g_{\lambda\mu}(X) = \nabla g^1_{\lambda\mu}(X) + \nabla g^2_{\mu}(X) = \frac{1+\lambda}{\mu}\big[mX - EA\big] + \frac{1}{\mu}\big[I + T\big]X = \frac{1}{\mu}\big[c_1 I + T\big]X - \frac{1+\lambda}{\mu}EA,
\]
where $c_1 = 1 + (1+\lambda)m$. Our goal now is to compute $\nabla g^*_{\lambda\mu}(Y)$, which can be accomplished by the relation
\[
X = \nabla g^*_{\lambda\mu}(Y)\ \text{ if and only if }\ Y = \nabla g_{\lambda\mu}(X).
\]
The latter can equivalently be written as
\[
\big[c_1 I + T\big]X = (1+\lambda)EA + \mu Y,
\]

whose solution can be computed explicitly row by row:
\[
x^l = \frac{\big[(1+\lambda)EA + \mu Y\big]^l + x^{k+1}}{c_1}\ \text{ for } l = 1,\ldots,k,\qquad
x^{k+1} = \frac{c_1\big[(1+\lambda)EA + \mu Y\big]^{k+1} + \sum_{l=1}^k\big[(1+\lambda)EA + \mu Y\big]^l}{(c_1 + k)(c_1 - 1)}.
\]
In the representation
\[
h_{\lambda\mu}(X) = h^1_{\mu}(X) + h^2_{\lambda\mu}(X) + h^3_{\lambda\mu}(X) + h^4_{\mu}(X) + h^5(X) + h^6_{\lambda}(X),
\]
the convex functions $h^1_{\mu}$, $h^2_{\lambda\mu}$, $h^3_{\lambda\mu}$, and $h^4_{\mu}$ are differentiable. The partial derivatives of $h^1_{\mu}$ are given by
\[
\frac{\partial h^1_{\mu}}{\partial x^l}(X) = 0\ \text{ for } l = 1,\ldots,k,\qquad
\frac{\partial h^1_{\mu}}{\partial x^{k+1}}(X) = \frac{1}{\mu}\sum_{i=1}^m(x^{k+1} - a^i) = \frac{1}{\mu}\Big[m x^{k+1} - \sum_{i=1}^m a^i\Big].
\]
Then $\nabla h^1_{\mu}(X)$ is the $(k+1)\times n$ matrix $L$ whose $l$th row is $\frac{\partial h^1_{\mu}}{\partial x^l}(X)$ for $l = 1,\ldots,k+1$. Similarly, the partial derivatives of $h^2_{\lambda\mu}$ are given by
\[
\frac{\partial h^2_{\lambda\mu}}{\partial x^l}(X) = 0\ \text{ for } l = 1,\ldots,k,\qquad
\frac{\partial h^2_{\lambda\mu}}{\partial x^{k+1}}(X) = \lambda\sum_{i=1}^m\Big[\frac{x^{k+1} - a^i}{\mu} - P\Big(\frac{x^{k+1} - a^i}{\mu};\mathbb{B}\Big)\Big].
\]
Then $\nabla h^2_{\lambda\mu}(X)$ is the $(k+1)\times n$ matrix $M$ whose $l$th row is $\frac{\partial h^2_{\lambda\mu}}{\partial x^l}(X)$ for $l = 1,\ldots,k+1$. Now we compute the gradient of $h^3_{\lambda\mu}$. We have
\[
\frac{\partial h^3_{\lambda\mu}}{\partial x^l}(X) = (1+\lambda)\sum_{i=1}^m\Big[\frac{x^l - a^i}{\mu} - P\Big(\frac{x^l - a^i}{\mu};\mathbb{B}\Big)\Big]\ \text{ for } l = 1,\ldots,k,\qquad
\frac{\partial h^3_{\lambda\mu}}{\partial x^{k+1}}(X) = 0.
\]
Thus, $\nabla h^3_{\lambda\mu}(X)$ is the $(k+1)\times n$ matrix $U$ whose $l$th row is $\frac{\partial h^3_{\lambda\mu}}{\partial x^l}(X)$ for $l = 1,\ldots,k+1$. Let us compute the gradient of $h^4_{\mu}$. We have
\[
\frac{\partial h^4_{\mu}}{\partial x^l}(X) = \frac{x^l - x^{k+1}}{\mu} - P\Big(\frac{x^l - x^{k+1}}{\mu};\mathbb{B}\Big)\ \text{ for } l = 1,\ldots,k,\qquad
\frac{\partial h^4_{\mu}}{\partial x^{k+1}}(X) = -\sum_{l=1}^k\Big[\frac{x^l - x^{k+1}}{\mu} - P\Big(\frac{x^l - x^{k+1}}{\mu};\mathbb{B}\Big)\Big].
\]

Similarly, $\nabla h^4_{\mu}(X)$ is the $(k+1)\times n$ matrix $V$ whose $l$th row is $\frac{\partial h^4_{\mu}}{\partial x^l}(X)$ for $l = 1,\ldots,k+1$. The procedure for computing subgradients of the last two nondifferentiable components, $h^5$ and $h^6_{\lambda}$, is similar to the procedure we described for computing subgradients of $h^3$ in Model I. The following is a DCA-based algorithm for Model II.

Algorithm 2.
INPUT: $X_0$, $\lambda_0$, $\mu_0$, $N\in\mathbb{N}$.
while stopping criteria$(\lambda,\mu)$ = false do
    for $k = 1,2,3,\ldots,N$ do
        Find $Y_k\in\partial h_{\lambda\mu}(X_{k-1})$.
        Find $X_k = \nabla g^*_{\lambda\mu}(Y_k)$.
    end for
    update $\lambda$ and $\mu$.
end while
OUTPUT: $X_N$.

4 Numerical Experiments

We use MATLAB to code our algorithms and perform numerical experiments on a MacBook Pro with a 2.2 GHz Intel Core i7 processor and 16 GB 1600 MHz DDR3 memory. For our numerical experiments, we use three data sets: an artificial data set with 18 data points in $\mathbb{R}^2$ (see Figure 1a), and EIL76 and PR1002 from [17] (see Figure 1b and Figure 1c, respectively).

Figure 1: Plots of the three test data sets: (a) 18 data points, 2 centers; (b) 76 data points, 3 centers; (c) 1002 data points, 6 centers.

The two MATLAB codes used to implement our two algorithms have two major parts: an outer loop for updating the penalty and smoothing parameters and an inner loop for updating the cluster centers. The penalty parameter $\lambda$ and the smoothing parameter $\mu$ are updated as follows. Choosing $\mu_0 > 0$, $\lambda_0 > 0$ and $\sigma_1 > 1$, $\sigma_2\in(0,1)$, we update $\lambda_{i+1} = \sigma_1\lambda_i$ and $\mu_{i+1} = \sigma_2\mu_i$ for $i\geq 0$ after each outer loop.

To choose $\sigma_1$ and $\sigma_2$, we first let $N$ be the number of outer iterations and choose $\lambda_0$, $\lambda_{\max}$ as the initial and final values of $\lambda$, respectively, and similarly $\mu_0$, $\mu_{\min}$ for $\mu$. Then the growth/decay parameters are chosen according to $\sigma_1 = (\lambda_{\max}/\lambda_0)^{1/N}$ and $\sigma_2 = (\mu_{\min}/\mu_0)^{1/N}$. By trial and error we find that the values chosen for $\lambda_0$, $\lambda_{\max}$, $\mu_0$, and $\mu_{\min}$ in large part determine the performance of the two algorithms for each data set. Intuitively, we see that very large values of $\lambda$ will over-penalize the distance between an artificial center and its nearest data node and may prevent the algorithm from clustering properly. We therefore use $\lambda_0\ll 1\ll\lambda_{\max}$, so that the algorithm has a chance to cluster the data before the penalty parameter takes effect. Similarly, we choose $\mu_{\min}\ll 1\ll\mu_0$.

We select the starting center $X_0$ at a certain radius, $\gamma\,\mathrm{rad}(A)$, from the median point, $\mathrm{median}(A)$, of the entire data set, i.e.,
\[
X_0 = \mathrm{median}(A) + \gamma\,\mathrm{rad}(A)\,U,
\]
where $\mathrm{rad}(A) := \max\{\|a^i - \mathrm{median}(A)\|\mid a^i\in A\}$, $\gamma$ is a randomly chosen real number from the standard uniform distribution on the open interval $(0,1)$, $U$ is a $k\times n$ matrix whose $k$ rows are randomly generated unit vectors in $\mathbb{R}^n$, and the sum is understood in the sense of adding a vector to each row of a matrix.

As shown in Tables 1, 2, and 3, both algorithms identify the optimal solutions within a reasonable amount of time for both DS18 and EIL76, with two and three cluster centers, respectively. To get a good starting point which yields a better estimate of the optimal value for bigger data sets such as PR1002, we use a method called radial search, described as follows. Given an initial radius $r_0 > 0$ and $m\in\mathbb{N}$, set $\gamma = i r_0$ for $i = 1,\ldots,m$. Then we test the algorithm with different starting points given by
\[
X_0(i) = \mathrm{median}(A) + i r_0\,\big(\mathrm{rad}(A)\,U\big)\quad\text{for } i = 1,\ldots,m.
\]
Figure 2 shows the result of this method applied to PR1002 with six cluster centers, where the $y$-axis represents the optimal value returned by ALG1 with different starting points $X_0(i)$, as represented on the $x$-axis. A short sketch of this parameter schedule and the radial search is given after Table 2 below.

Table 1: Results for the 18-point artificial data set (columns Cost1, Cost2, Time1, Time2, Iter1, Iter2, $k$, $m$, $n$; parameters $\mu_0 = 5.70$, $\lambda_0 = 0.001$, $\sigma_1 = 7500$, $\sigma_2 = 0.5$).

Table 2: Results for the EIL76 data set (columns Cost1, Cost2, Time1, Time2, Iter1, Iter2, $k$, $m$, $n$; parameters $\mu_0 = 100$, $\lambda_0 = 10^{-6}$, $\sigma_1 = 1$, $\sigma_2 = 0.5$).
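The parameter schedule and the radial-search construction described above can be sketched as follows (our own Python/NumPy illustration; the function names are ours, the coordinate-wise median is used for $\mathrm{median}(A)$, and the endpoint values in the example call are placeholders rather than the settings of Tables 1-3).

```python
import numpy as np

def schedule(lam0, lam_max, mu0, mu_min, N):
    """Geometric growth/decay factors sigma1, sigma2 and the resulting parameter sequences."""
    sigma1 = (lam_max / lam0) ** (1.0 / N)
    sigma2 = (mu_min / mu0) ** (1.0 / N)
    lams = lam0 * sigma1 ** np.arange(N + 1)
    mus = mu0 * sigma2 ** np.arange(N + 1)
    return sigma1, sigma2, lams, mus

def radial_starts(A, k, r0, m_starts, rng):
    """Starting centers X0(i) = median(A) + i*r0*(rad(A)*U), i = 1,...,m_starts."""
    med = np.median(A, axis=0)                            # coordinate-wise median of the data
    rad = np.max(np.linalg.norm(A - med, axis=1))         # rad(A)
    starts = []
    for i in range(1, m_starts + 1):
        U = rng.normal(size=(k, A.shape[1]))
        U /= np.linalg.norm(U, axis=1, keepdims=True)     # random unit rows
        starts.append(med + i * r0 * rad * U)
    return starts

rng = np.random.default_rng(3)
A = rng.normal(size=(76, 2))
print(schedule(lam0=1e-6, lam_max=1e6, mu0=100.0, mu_min=1e-6, N=20)[:2])
print(radial_starts(A, k=3, r0=0.1, m_starts=5, rng=rng)[0].shape)
```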

Table 3: Results for the PR1002 data set (columns Cost1, Cost2, Time1, Time2, Iter1, Iter2, $k$, $m$, $n$; parameters $\mu_0 = 1950$, $\lambda_0 = 10^{-6}$, $\sigma_1 = 7500$, $\sigma_2 = 0.5$).

For comparison purposes, both Cost1 and Cost2 are computed in the same way. First, we systematically replace the $k$ cluster centers returned by the respective algorithms by the $k$ real nodes that are closest to them, i.e., for $l = 1,\ldots,k$,
\[
x^l := \operatorname{argmin}\{\|x^l - a^i\|\mid a^i\in A\}.
\]
Then the total center $\bar{x}$ will be a real node, chosen from the remaining nodes, whose sum of distances from the $k$ reassigned centers is minimal, i.e.,
\[
\bar{x} := \operatorname{argmin}\Big\{\sum_{l=1}^k\|x^l - a^i\|\ \Big|\ a^i\in A\Big\}.
\]
The total cost is computed by adding the distance of each real node to its closest center (including the total center) and the distances of the total center from the $k$ cluster centers, i.e.,
\[
\mathrm{Cost} := \sum_{i=1}^m\min_{l=1,\ldots,k+1}\|x^l - a^i\| + \sum_{l=1}^k\|x^l - x^{k+1}\|,\quad\text{where } x^{k+1} = \bar{x}.
\]

Figure 2: PR1002, the 1002-city problem, with 6 cluster centers.
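A compact sketch of this post-processing step (our own Python/NumPy illustration; the MATLAB code used for the experiments is not reproduced here, and the test data below are arbitrary) reads:

```python
import numpy as np

def reported_cost(X, A):
    """Snap the k returned centers to their nearest nodes, pick a total center among the
    remaining nodes, and compute the reported Cost value."""
    k = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - A[None, :, :], axis=2)     # ||x^l - a^i||
    idx = D.argmin(axis=1)                                        # nearest real node for each center
    centers = A[idx]                                              # reassigned centers x^1, ..., x^k
    # total center: remaining node minimizing the sum of distances to the k centers
    remaining = np.setdiff1d(np.arange(A.shape[0]), idx)
    sums = np.linalg.norm(centers[:, None, :] - A[None, remaining, :], axis=2).sum(axis=0)
    total = A[remaining[sums.argmin()]]
    # Cost = sum_i min over the k+1 centers ||. - a^i|| + sum_l ||x^l - total||
    all_centers = np.vstack([centers, total])
    assign = np.linalg.norm(all_centers[:, None, :] - A[None, :, :], axis=2).min(axis=0).sum()
    return assign + np.linalg.norm(centers - total, axis=1).sum()

rng = np.random.default_rng(4)
A = rng.normal(size=(76, 2))
X = A[rng.choice(76, size=3, replace=False)] + 0.05 * rng.normal(size=(3, 2))
print(reported_cost(X, A))
```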

5 Conclusions

In our numerical experiments, one of the major challenges we face is how to choose optimal parameters for our algorithms. We observe that parameter selection is the decisive factor in terms of accuracy and speed of convergence of our proposed algorithms. The performance of the proposed algorithms depends strongly on the initial values of the penalty and smoothing parameters, $\lambda_0$ and $\mu_0$, and on their respective growth/decay factors $\sigma_1$ and $\sigma_2$. Better guidance for selecting these parameters than the methods we suggest in this paper will be addressed in our future research.

References

[1] L. T. H. An, M. T. Belghiti, and P. D. Tao, A new efficient algorithm based on DC programming and DCA for clustering, J. Glob. Optim. 37 (2007).
[2] L. T. H. An and L. H. Minh, Optimization based DC programming and DCA for hierarchical clustering, European J. Oper. Res. 183 (2007).
[3] L. T. H. An and P. D. Tao, Convex analysis approach to D.C. programming: theory, algorithms and applications, Acta Math. Vietnam. 22 (1997).
[4] A. M. Bagirov, Derivative-free methods for unconstrained nonsmooth optimization and its numerical analysis, Investigacao Operacional 19 (1999).
[5] A. Bagirov, L. Jia, I. Ouveysi, and A. M. Rubinov, Optimization based clustering algorithms in multicast group hierarchies, in: Proceedings of the Australian Telecommunications, Networks and Applications Conference (ATNAC), 2003, Melbourne, Australia (published on CD).
[6] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New York, 2011.
[7] J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Optimization, 2nd edition, Springer, New York, 2006.
[8] R. I. Boţ, Conjugate Duality in Convex Optimization, Springer, Berlin, 2010.
[9] T. Pham Dinh and H. A. Le Thi, A d.c. optimization algorithm for solving the trust-region subproblem, SIAM J. Optim. 8 (1998).
[10] P. Hartman, On functions representable as a difference of convex functions, Pacific J. Math. 9 (1959).
[11] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I, II, Springer, Berlin, 1993.
[12] J.-B. Hiriart-Urruty, Generalized differentiability, duality and optimization for problems dealing with differences of convex functions, Lecture Notes in Economics and Mathematical Systems 256 (1985).
[13] B. S. Mordukhovich, Variational Analysis and Generalized Differentiation, I: Basic Theory, II: Applications, Springer, Berlin, 2006.
[14] B. S. Mordukhovich and N. M. Nam, An Easy Path to Convex Analysis and Applications, Morgan & Claypool Publishers, San Rafael, CA, 2014.

[15] N. M. Nam, R. B. Rector, and D. Giles, Minimizing differences of convex functions with applications to facility location and clustering, submitted.
[16] Y. Nesterov, Smooth minimization of non-smooth functions, Math. Program. 103 (2005).
[17] G. Reinelt, TSPLIB: a traveling salesman problem library, ORSA Journal on Computing 3 (1991).
[18] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[19] R. T. Rockafellar, Conjugate Duality and Optimization, SIAM, Philadelphia, PA, 1974.


More information

Dipartimento di Ingegneria Informatica, Automatica e Gestionale A. Ruberti, SAPIENZA, Università di Roma, via Ariosto, Roma, Italy.

Dipartimento di Ingegneria Informatica, Automatica e Gestionale A. Ruberti, SAPIENZA, Università di Roma, via Ariosto, Roma, Italy. Data article Title: Data and performance profiles applying an adaptive truncation criterion, within linesearchbased truncated Newton methods, in large scale nonconvex optimization. Authors: Andrea Caliciotti

More information

Function approximation using RBF network. 10 basis functions and 25 data points.

Function approximation using RBF network. 10 basis functions and 25 data points. 1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming SECOND EDITION Dimitri P. Bertsekas Massachusetts Institute of Technology WWW site for book Information and Orders http://world.std.com/~athenasc/index.html Athena Scientific, Belmont,

More information

A primal-dual framework for mixtures of regularizers

A primal-dual framework for mixtures of regularizers A primal-dual framework for mixtures of regularizers Baran Gözcü baran.goezcue@epfl.ch Laboratory for Information and Inference Systems (LIONS) École Polytechnique Fédérale de Lausanne (EPFL) Switzerland

More information

Convex Optimization. Lijun Zhang Modification of

Convex Optimization. Lijun Zhang   Modification of Convex Optimization Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Modification of http://stanford.edu/~boyd/cvxbook/bv_cvxslides.pdf Outline Introduction Convex Sets & Functions Convex Optimization

More information

Mathematical Programming and Research Methods (Part II)

Mathematical Programming and Research Methods (Part II) Mathematical Programming and Research Methods (Part II) 4. Convexity and Optimization Massimiliano Pontil (based on previous lecture by Andreas Argyriou) 1 Today s Plan Convex sets and functions Types

More information

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies.

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies. CSE 547: Machine Learning for Big Data Spring 2019 Problem Set 2 Please read the homework submission policies. 1 Principal Component Analysis and Reconstruction (25 points) Let s do PCA and reconstruct

More information

3 Nonlinear Regression

3 Nonlinear Regression CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Distributed Alternating Direction Method of Multipliers

Distributed Alternating Direction Method of Multipliers Distributed Alternating Direction Method of Multipliers Ermin Wei and Asuman Ozdaglar Abstract We consider a network of agents that are cooperatively solving a global unconstrained optimization problem,

More information

Text Modeling with the Trace Norm

Text Modeling with the Trace Norm Text Modeling with the Trace Norm Jason D. M. Rennie jrennie@gmail.com April 14, 2006 1 Introduction We have two goals: (1) to find a low-dimensional representation of text that allows generalization to

More information

I How does the formulation (5) serve the purpose of the composite parameterization

I How does the formulation (5) serve the purpose of the composite parameterization Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)

More information

The Branch-and-Sandwich Algorithm for Mixed-Integer Nonlinear Bilevel Problems

The Branch-and-Sandwich Algorithm for Mixed-Integer Nonlinear Bilevel Problems The Branch-and-Sandwich Algorithm for Mixed-Integer Nonlinear Bilevel Problems Polyxeni-M. Kleniati and Claire S. Adjiman MINLP Workshop June 2, 204, CMU Funding Bodies EPSRC & LEVERHULME TRUST OUTLINE

More information

Mathematics of Optimization Viewpoint of Applied Mathematics Logic of Optimization

Mathematics of Optimization Viewpoint of Applied Mathematics Logic of Optimization Global Focus on Knowledge Lecture (The World of Mathematics #7) 2007 06 07 Mathematics of Optimization Viewpoint of Applied Mathematics Logic of Optimization Kazuo Murota Department. of Mathematical Engineering

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Eric Xing Lecture 14, February 29, 2016 Reading: W & J Book Chapters Eric Xing @

More information

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering An unsupervised machine learning problem Grouping a set of objects in such a way that objects in the same group (a cluster) are more similar (in some sense or another) to each other than to those in other

More information