16.888/ESD.77 Multidisciplinary System Design Optimization, Spring 2004: Assignment 2 Master Solution, Part (a)


(a1) Decomposition

The following system of equations must be solved:

x_1 x_2 - 2 x_3 + 2 = 0          (1)
x_2 + 3 x_5 - 9 = 0              (2)
x_1 - x_4 x_5 - x_3 + 10 = 0     (3)
9 x_5 - 3 x_2 + 7 = 0            (4)
x_2 x_5 - x_2 x_4 + x_2 - 9 = 0  (5)

1) All-at-once

First, we solve for a solution numerically, all at once (MATLAB version 5):

Script (Aa1.m)

% Aa1: solve Assignment 2 part (a1) all at once
clear all
close all
eqn1 = 'x1*x2-2*x3+2';
eqn2 = 'x2+3*x5-9';
eqn3 = 'x1-x4*x5-x3+10';
eqn4 = '9*x5-3*x2+7';
eqn5 = 'x2*x5-x2*x4+x2-9';
flops(0)
tic
[x1,x2,x3,x4,x5] = solve(eqn1,eqn2,eqn3,eqn4,eqn5)
toc
flops

The following solution is returned:

x = [x_1 x_2 x_3 x_4 x_5]^T ≈ [4.592 5.667 14.011 0.523 1.111]^T

The computational effort is:
CPU time = 0.19 [sec]
FLOPS = 3795
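The symbolic solve above can also be cross-checked numerically in a more recent MATLAB release. The fragment below is only a sketch and not part of the original solution; it assumes the Optimization Toolbox's fsolve and an arbitrary starting guess:

% numeric cross-check of the all-at-once solve (sketch; requires the Optimization Toolbox)
F = @(x) [ x(1)*x(2) - 2*x(3) + 2;
           x(2) + 3*x(5) - 9;
           x(1) - x(4)*x(5) - x(3) + 10;
           9*x(5) - 3*x(2) + 7;
           x(2)*x(5) - x(2)*x(4) + x(2) - 9 ];
x0   = ones(5,1);        % arbitrary initial guess
xsol = fsolve(F, x0);    % should converge to the root of Eqs. (1)-(5)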

2) Decomposition

The second method decomposes the system of equations into smaller subsystems and solves them sequentially. First we find the occurrence matrix of the system:

        x1   x2   x3   x4   x5
E1       x    x    x
E2            x              x
E3       x         x    x    x
E4            x              x
E5            x         x    x

Based on this we reorder the occurrence matrix to make it a lower-left triangular, block-diagonal matrix:

        x2   x5   x4   x3   x1
E2       x    x
E4       x    x
E5       x    x    x
E3            x    x    x    x
E1       x              x    x

The solution strategy now is:

1: First we solve for x_2 and x_5 from Eq. (2) and Eq. (4).
2: Next we solve for x_4 from Eq. (5).
3: Then we solve for x_3 and x_1 from Eq. (3) and Eq. (1).

The following script solves this system:

clear all
close all
flops(0)
tic
% first step
x25 = inv([1 3; -3 9])*[9 -7]';
x2 = x25(1); x5 = x25(2);
% second step
x4 = (9 - x2 - x2*x5)/(-x2);
% third step
x13 = inv([x2 -2; 1 -1])*[-2 -10+x4*x5]';
x1 = x13(1); x3 = x13(2);
x = [x1 x2 x3 x4 x5]'
toc
flops
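As a quick sanity check, not part of the original listing, the decomposed solution can be substituted back into Eqs. (1)-(5). This sketch reuses the scalar variables x1 through x5 computed by the script above:

% verify the decomposed solution by evaluating the residuals of Eqs. (1)-(5)
res = [ x1*x2 - 2*x3 + 2;
        x2 + 3*x5 - 9;
        x1 - x4*x5 - x3 + 10;
        9*x5 - 3*x2 + 7;
        x2*x5 - x2*x4 + x2 - 9 ];
disp(norm(res))   % should be zero to machine precision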

The solution is obtained as:

x = [x_1 x_2 x_3 x_4 x_5]^T ≈ [4.592 5.667 14.011 0.523 1.111]^T

The computational effort is:
CPU time = 0.01 [sec]
FLOPS = 134

Conclusion: The solution by decomposition is less computationally expensive by a factor of ~19 in CPU time and a factor of ~28 in floating point operations. In this special case no iterations are required. This effect can be magnified for large systems of coupled, nonlinear equations. The solution by decomposition, however, requires more upfront work and an understanding of the structure of the system of equations.

(a2) Gradient-Based Optimization

Rosenbrock's banana function is given as:

f(x_1, x_2) = (1 - x_1)^2 + 100 (x_2 - x_1^2)^2

This test function often causes gradient-based optimizers problems because of its numerical conditioning, which is the main point of this problem. It is called the banana function because of the way the curvature bends around the origin. It is notorious in optimization examples because of the slow convergence which most methods exhibit when trying to solve it. The function has a unique minimum at (1,1): f(1,1) = 0.
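The stated minimum and the shape of the valley are easy to probe directly. A minimal sketch follows; the handle name rosen is arbitrary and not part of the original code:

rosen = @(x1,x2) (1 - x1).^2 + 100*(x2 - x1.^2).^2;   % Rosenbrock banana function
rosen(1,1)   % returns 0, the global minimum value
rosen(2,4)   % returns 1: along the valley x2 = x1^2 only the (1-x1)^2 term contributes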

The gradient vector can be calculated analytically (rather than approximated by finite differencing) as:

∇f = [ ∂f/∂x_1 , ∂f/∂x_2 ]^T = [ 400 x_1^3 + 2(1 - 200 x_2) x_1 - 2 ,  200 (x_2 - x_1^2) ]^T

There are no constraints g, h in this case, and the KKT optimality conditions reduce to:

x*:  ∇f(x*) = 0

We can verify that the point (1,1) indeed meets this condition by substitution. The Hessian matrix, H, is obtained as:

H = [ ∂²f/∂x_1²      ∂²f/∂x_1∂x_2 ]  =  [ 1200 x_1^2 - 400 x_2 + 2    -400 x_1 ]
    [ ∂²f/∂x_2∂x_1   ∂²f/∂x_2²    ]     [ -400 x_1                     200     ]

At (1,1) the Hessian matrix is:

H(1,1) = [  802  -400 ]
         [ -400   200 ]

The eigenvalues of H at (1,1) are:

λ_1 ≈ 1001.6,   λ_2 ≈ 0.399

Both eigenvalues are positive, therefore the point (1,1) is at least a local minimum. Note that the Rosenbrock banana function is not convex, so global optimality cannot be proven this way, because H is not positive semi-definite (PSD) for all settings of x. The point (1,1) is nevertheless the known global minimum of this function, since a sum of squares is minimized when all individual terms of the sum go to zero. The problem for a numerical search for this solution is that the two eigenvalues are three orders of magnitude apart. This is said to be a numerically ill-conditioned situation.
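The conditioning claim can be verified numerically. This is a minimal sketch using standard MATLAB calls; the values quoted in the comments are approximate:

H = [802 -400; -400 200];   % Hessian of the banana function at (1,1)
lambda = eig(H)             % approx. 0.399 and 1001.6
kappa = cond(H)             % approx. 2.5e3, i.e. poorly conditioned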

Most gradient-based optimization algorithms start from an initial point, x_0, and try to reach a point where the KKT conditions are true in an iterative fashion:

x_{k+1} = x_k + α_k S_k

where α_k is the step size and S_k is the step direction vector. What distinguishes most algorithms is how α_k and S_k are computed throughout the search space. Depending on the method used to compute step size and step direction, the algorithm will converge faster or slower for different types of ill-conditioned or non-convex functions, or it might fail to converge altogether. Some of you might have experienced this problem initially.

(i) Implement steepest gradient search and determine min(f) using at least two initial guesses.

We select the following far-apart starting points: [-5, -3] and [4, 2]. In steepest descent the step direction is chosen as the (local) direction that causes a maximal decrease of J (or f) at x_k:

Choose x_0, set x = x_0
Repeat until converged:
    S_k = -∇J(x_k)
    choose α_k to minimize J(x_k + α_k S_k)
    update the current point: x = x + α_k S_k

This algorithm does not use any information from previous iterations. The step size α_k is chosen with a 1-D search. Here a bisection algorithm is implemented, where α ∈ [0, α_max] and α_mid = α_max/2. The size of this interval is adjusted, i.e. increased or decreased, until f(x_k + α_mid S_k) < f(x_k + α_max S_k) is true. Next a 2nd-order equation (parabola) is fitted to f(α) through the three points 0, α_mid, α_max. The minimum of this parabola is chosen as the optimal step size α_k at the k-th iteration. The line search for α is one of the most challenging aspects of writing a robust gradient search algorithm. The implementation code for steepest gradient search is enclosed below.

Note: It is more robust to normalize the step direction to be a unit vector, S_k = -∇J(x_k)/||∇J(x_k)||, and have the step size be purely determined by α_k.
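The parabolic step of this line search can be written compactly. The fragment below is an illustrative sketch only; fline, amid and amax are hypothetical names for the 1-D objective f(x_k + α S_k) and the current bracket endpoints:

% 2nd-order fit of f(alpha) through the three bracket points (sketch)
a  = [0 amid amax];                        % bracket with f(amid) below both endpoints
fa = [fline(0) fline(amid) fline(amax)];   % fline(a) evaluates f(x_k + a*S_k)
p  = polyfit(a, fa, 2);                    % parabola coefficients p(1)*a^2 + p(2)*a + p(3)
alphaopt = -p(2)/(2*p(1));                 % vertex of the parabola, taken as the step size

The full listing below instead samples the fitted parabola on a dense grid with polyval and takes the minimum over that grid, which is equivalent up to the grid resolution.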

(ii) Implement conjugate gradient search and determine min(f) using the same initial guesses as in (i).

Conjugate gradient search is similar to steepest descent, with the exception that information from previous steps is used to compute the next step direction, S_k. Here conjugate gradient search was implemented as follows:

S_k = -∇f(x_k) + β_k S_{k-1},    β_k = ||∇f(x_k)||^2 / ||∇f(x_{k-1})||^2

Note that this means that the new step direction is the steepest gradient direction at iteration k, modified by a contribution from the step direction S_{k-1} at the previous iteration k-1. Thus, some memory is used without having to store the entire past matrix (history) of gradients. The contribution of the past step is scaled with a parameter β_k, which is larger if the norm of the gradient vector is larger for the current step k, and smaller if the norm of the gradient vector at k is smaller than at k-1. As we approach the (local or global) minimum we expect the norm of ∇f to decrease, so that β_k < 1 will generally hold in the vicinity of such a stationary point.
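In code, this Fletcher-Reeves style update amounts to only a few lines. The fragment below is a sketch; gradbanana, gprev and Sprev are hypothetical names for a gradient helper and the quantities carried over from iteration k-1:

g = gradbanana(x(k,:));                 % 1x2 gradient row vector at the current iterate
if k == 1
    S = -g/norm(g);                     % first iteration: normalized steepest-descent step
else
    beta = norm(g)^2/norm(gprev)^2;     % ratio of squared gradient norms
    S = -g/norm(g) + beta*Sprev;        % blend in the previous direction
    S = S/norm(S);                      % re-normalize, as in the full listing below
end
gprev = g; Sprev = S;                   % carry over to the next iteration

Normalizing S and letting α_k absorb the magnitude mirrors the robustness note made for steepest descent above.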

(iii) Verify the computational results against the analytical solution, and compare the performance of the two numerical methods.

The analytical optimum is (1,1), as discussed above. The numerical results, as well as the expense of obtaining them, are compared in the following table and figure combinations for the two starting points. General conclusions about the behavior of both algorithms are also provided. The residual is simply the 2-norm (Euclidean norm) of the gradient vector:

residual = ||∇f(x_k)||_2 = sqrt( (∂f/∂x_1)^2 + (∂f/∂x_2)^2 )

The termination criterion was set as: residual < 10^-3.

Conclusions

Both methods are able to approximate the analytical optimum (1,1), albeit at different computational cost. For the starting point (-5,-3) the solution (1,1) is reached from below; for the starting point (4,2) it is reached from above. The conjugate gradient method converges faster than steepest descent by a factor of roughly 16-32 in terms of the number of function evaluations (iterations), CPU time and floating point operations (FLOPS). The exact savings factor depends on the start point. This is a very significant difference, which results mainly from the fact that conjugate gradient avoids zig-zagging in the design space the way steepest gradient does. This is achieved by smoothing the search direction with the conjugate directions from previous steps. Overall lesson: even small algorithmic changes can have a large (beneficial or detrimental) impact on the behavior of search methods, particularly for numerically ill-conditioned objective functions and constraints.

Results for Starting Point: (-5, -3)

                    Steepest Gradient Search    Conjugate Gradient Search
Starting Point:     -5 -3                       -5 -3
Solution:           ≈ (1, 1)                    ≈ (1, 1)
Iterations:         1558                        79
CPU time [sec]:     1.79                        0.11
FLOPS:
Residual:

Figure: x_0 = [-5, -3]: comparison of the search path with the steepest gradient (white dashed) and conjugate gradient (yellow dotted line) algorithms.

Results for Starting Point: (4, 2)

                    Steepest Gradient Search    Conjugate Gradient Search
Starting Point:     4 2                         4 2
Solution:           ≈ (1, 1)                    ≈ (1, 1)
Iterations:         54                          169
CPU time [sec]:     6.49                        0.1
FLOPS:
Residual:

Figure: x_0 = [4, 2]: comparison of the search path with the steepest gradient (white dashed) and conjugate gradient (yellow dotted line) algorithms.

Gradient Optimization Code:

function aa2
% Find minimum of Rosenbrock's "banana" function using two
% different gradient search methods: steepest gradient
% search and conjugate gradient search
% dwo, March 2004
clear all
close all
warning off

% plot function
dx = 1/5;
[x1,x2] = meshgrid(-5:dx:5);
f = (1-x1).^2 + 100.*(x2-x1.^2).^2;
surf(x1,x2,f)
view([22 8])    % elevation 8 deg; azimuth value assumed (illegible in the source)
axis([min(min(x1)) max(max(x1)) min(min(x2)) max(max(x2)) ...
      min(min(f)) max(max(f))])
xlabel('x_1'), ylabel('x_2'), zlabel('f'), title('Rosenbrock Banana Function')
drawnow

% start optimization
xi = [-5 -3];
x = xi;
f = banana(x(1),x(2));
epsilon = 1e-3;

% steepest gradient search
k = 1;
T1 = cputime; F1 = flops;
residual = 1; alpha = 1;
while residual > epsilon
    % compute gradient
    gradf = [400*x(k,1)^3+2*(1-200*x(k,2))*x(k,1)-2  200*(x(k,2)-x(k,1)^2)];
    S = -1/norm(gradf,2)*gradf;   % normalized step direction
    residual = norm(gradf,2);
    % line search for optimal step size alpha - bisection method
    % use last alpha as initial guess
    alpha = [0 alpha 2*alpha];
    foundalpha = 0;
    while foundalpha == 0
        for inda = 1:length(alpha)
            xsearch = x(k,:) + alpha(inda)*S;
            fsearch(inda) = banana(xsearch(1), xsearch(2));
        end
        if fsearch(3) <= fsearch(2) & fsearch(2) <= fsearch(1)
            %disp('expand search interval over alpha')
            alpha(1) = 0; alpha(3) = 2*alpha(3); alpha(2) = alpha(3)/2;
        elseif fsearch(2) >= fsearch(1)
            %disp('shrink search interval over alpha')
            alpha(1) = 0; alpha(3) = alpha(2); alpha(2) = alpha(3)/2;
        elseif fsearch(2) <= fsearch(1) & fsearch(3) >= fsearch(2)
            % interpolate for optimal alpha
            [Palpha,Salpha] = polyfit(alpha,fsearch,2);   % 2nd order fit
            alphafullsearch = linspace(alpha(1),alpha(3),1e3);
            ffullsearch = polyval(Palpha,alphafullsearch,Salpha);
            [tmp,inda] = min(fsearch);
            [fmin,inda] = min(ffullsearch);
            alphaopt = alphafullsearch(inda);
            %disp('found optimal alpha')
            foundalpha = 1;
        end
    end   % finished line search for alpha
    alpha = alphaopt;
    %alphasearch=[0:0.1:1.0];
    %for inda=1:length(alphasearch)
    %    xsearch=x(k,:)+alphasearch(inda)*S;
    %    fsearch(inda)=banana(xsearch(1), xsearch(2));
    %end
    % polynomial fit for f(alpha)
    %[Palpha,Salpha] = polyfit(alphasearch,fsearch,2);   % 2nd order fit
    %alphafullsearch=[0:0.001:1.0];
    %for inda=1:length(alphafullsearch)
    %    ffullsearch(inda)=polyval(Palpha,alphafullsearch(inda),Salpha);
    %end
    %[fmin,inda]=min(ffullsearch);
    %alpha=alphafullsearch(inda)
    % update current solution
    xk = x(k,:);
    xkplusone = xk + alpha*S;
    x = [x; xkplusone];
    k = k + 1;
    f(k) = banana(xkplusone(1), xkplusone(2));
end   % while residual > epsilon
k1 = k;
T2 = cputime; F2 = flops;
disp('Steepest Gradient Search')
disp(['Starting Point: ' num2str(xi)])
disp(['Solution: ' num2str(x(k,:))])
disp(['Iterations: ' num2str(k)])
disp(['CPU time: ' num2str(T2-T1)])
disp(['FLOPS: ' num2str(F2-F1)])
disp(['Residual: ' num2str(residual)])

% plot steepest gradient search path
hold on
plot3(x(:,1),x(:,2),f,'w*')
plot3(x(:,1),x(:,2),f,'w--','LineWidth',2)
drawnow

% conjugate gradient search
xi = [-5 -3];
x = xi;
f = banana(x(1),x(2));
k = 1;
T1 = cputime; F1 = flops;
residual = 1; alpha = 1;
gradf = []; S = [];
while residual > epsilon
    % compute gradient
    gradf(k,:) = [400*x(k,1)^3+2*(1-200*x(k,2))*x(k,1)-2  200*(x(k,2)-x(k,1)^2)];
    residual = norm(gradf(k,:),2);
    if k == 1
        S = -1/norm(gradf,2)*gradf;   % normalized step direction
    else
        beta(k) = norm(gradf(k,:),2)^2/norm(gradf(k-1,:),2)^2;
        S(k,:) = -(1/norm(gradf(k,:),2))*gradf(k,:) + beta(k)*S(k-1,:);
        S(k,:) = S(k,:)/norm(S(k,:),2);
    end
    % line search for optimal step size alpha - bisection method
    % use last alpha as initial guess
    alpha = [0 alpha 2*alpha];
    foundalpha = 0;
    while foundalpha == 0
        for inda = 1:length(alpha)
            xsearch = x(k,:) + alpha(inda)*S(k,:);
            fsearch(inda) = banana(xsearch(1), xsearch(2));
        end
        if fsearch(3) <= fsearch(2) & fsearch(2) <= fsearch(1)
            %disp('expand search interval over alpha')
            alpha(1) = 0; alpha(3) = 2*alpha(3); alpha(2) = alpha(3)/2;
        elseif fsearch(2) >= fsearch(1)
            %disp('shrink search interval over alpha')
            alpha(1) = 0; alpha(3) = alpha(2); alpha(2) = alpha(3)/2;
        elseif fsearch(2) <= fsearch(1) & fsearch(3) >= fsearch(2)
            % interpolate for optimal alpha
            [Palpha,Salpha] = polyfit(alpha,fsearch,2);   % 2nd order fit
            alphafullsearch = linspace(alpha(1),alpha(3),1e3);
            ffullsearch = polyval(Palpha,alphafullsearch,Salpha);
            [tmp,inda] = min(fsearch);
            [fmin,inda] = min(ffullsearch);
            alphaopt = alphafullsearch(inda);
            %disp('found optimal alpha')
            foundalpha = 1;
        end
    end   % finished line search for alpha
    alpha = alphaopt;
    % update current solution
    xk = x(k,:);
    xkplusone = xk + alpha*S(k,:);
    x = [x; xkplusone];
    k = k + 1;
    f(k) = banana(xkplusone(1), xkplusone(2));
end   % while residual > epsilon
k1 = k;
T2 = cputime; F2 = flops;
disp('Conjugate Gradient Search')
disp(['Starting Point: ' num2str(xi)])
disp(['Solution: ' num2str(x(k,:))])
disp(['Iterations: ' num2str(k)])
disp(['CPU time: ' num2str(T2-T1)])
disp(['FLOPS: ' num2str(F2-F1)])
disp(['Residual: ' num2str(residual)])

% plot conjugate gradient search path
hold on
plot3(x(:,1),x(:,2),f,'y*')
plot3(x(:,1),x(:,2),f,'y:','LineWidth',2)
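The listing calls a helper banana(x1,x2) that is not included above. A minimal sketch of such a subfunction, consistent with the objective used in the plotting code, could be appended at the end of the file:

function f = banana(x1, x2)
% Rosenbrock banana function used by the search code above (sketch of the
% helper omitted from the listing)
f = (1 - x1).^2 + 100*(x2 - x1.^2).^2;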
