For-loop equivalents and massive parallelization in CasADi


1 For-loop equivalents and massive parallelization in CasADi
Citius, Altius, Fortius
Joris Gillis (MECO group, KU Leuven)

2 Outline
- Key ingredients of CasADi
- New concepts: Map, MapAccum
- Massive parallelization
- Software architecture

3 Core idea
Computer-aided formulation of optimization problems, built from symbolic primitives and expressions:
    x = optivar()
    a = optivar()
    optimize(x^2+a^2, [ x+a^2>= 0 ])
$$\min_{x,a}\; x^2 + a^2 \quad \text{subject to} \quad x + a^2 \ge 0$$
Obtain gradients, Jacobians, Hessians automatically.

4 Basic example
Find initial condition to achieve a goal: $\dot{x} = a x + \cos x$, $\min_{x_0} x_N^2$
    x0 = optivar()
    a = optivar()
    dt = 1
    x1 = x0 + dt*(a*x0 + cos(x0))
    x2 = x1 + dt*(a*x1 + cos(x1))
    x3 = x2 + dt*(a*x2 + cos(x2))
    xn = ...
    optimize(xn**2)

5 Basic example
Find initial condition to achieve a goal: $\dot{x} = a x + \cos x$, $\min_{x_0} x_N^2$
    x0 = optivar()
    a = optivar()
    dt = 1
    x = x0
    for i in range(N):
        x = x + dt*(a*x + cos(x))
    xn = x
    optimize(xn**2)

6 Key ingredient: efficient symbolics
Matlab symbolic toolbox (mupad):
    syms('x0')
    syms('a')
    dt = 1;
    x = x0;
    for i=1:N
        x = x+dt*(a*x+cos(x));
    end

7 Key ingredient: efficient symbolics
Matlab: yalmip
    x0 = sdpvar(1,1)
    a = sdpvar(1,1)
    dt = 1;
    x = x0;
    for i=1:N
        x = x+dt*(a*x+cos(x));
    end

8 Key ingredient: efficient symbolics
Python: sympy
    x0 = symbols('x0')
    a = symbols('a')
    dt = 1
    x = x0
    for i in range(N):
        x = x+dt*(a*x+cos(x))

9 Key ingredient: efficient symbolics
Python: casadi
    x0 = MX.sym('x0')
    a = MX.sym('a')
    dt = 1
    x = x0
    for i in range(N):
        x = x+dt*(a*x+cos(x))

10 Key ingredient: efficient symbolics
    x0 = MX.sym('x0')
    a = MX.sym('a')
    dt = 1
    x = x0
    for i in range(N):
        x = x+dt*(a*x+cos(x))
[Plot: construction time [s] versus N]

11 Key ingredient: efficient symbolics
    x0 = optivar()
    a = optivar()
    dt = 1
    x1 = x0 + dt*(a*x0 + cos(x0))
    x2 = x1 + dt*(a*x1 + cos(x1))
    x3 = x2 + dt*(a*x2 + cos(x2))
    xn = ...
Substituting x1 into x2:
    x2 = x1 + dt*(a*x1 + cos(x1))
    x2 = (x0 + dt*(a*x0 + cos(x0))) + dt*(a*(x0 + dt*(a*x0 + cos(x0))) + cos(x0 + dt*(a*x0 + cos(x0))))
Expression length ~ 3^N

12 Key ingredient: efficient symbolics
    x0 = MX.sym('x0')
    a = MX.sym('a')
    dt = 1
    x = x0
    for i in range(N):
        x = x+dt*(a*x+cos(x))
[Figure: graph representation of the expression, with inputs x0 and a feeding repeated cos, * and + nodes]
#nodes: O(N)
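
The contrast between an expression length of ~3^N and a graph with O(N) nodes can be checked without any symbolic library. The sketch below is a pure-Python stand-in, not CasADi: nested tuples play the role of expression nodes, and the helper names (`step`, `build`, `count_tree`, `count_graph`) are made up for this illustration. Counting every occurrence of a subexpression gives the tree size; counting each shared node only once gives the graph size.

```python
# Pure-Python illustration (no CasADi): expression tree size vs. shared-graph size.

def step(x, a, dt):
    """One update x + dt*(a*x + cos(x)); x is reused three times, not copied."""
    return ('+', x, ('*', dt, ('+', ('*', a, x), ('cos', x))))

def build(N):
    """Apply the update N times, starting from the symbol 'x0'."""
    x = 'x0'
    for _ in range(N):
        x = step(x, 'a', 'dt')
    return x

def count_tree(e):
    """Count operations as if every shared subexpression were written out again."""
    if isinstance(e, str):
        return 0
    return 1 + sum(count_tree(child) for child in e[1:])

def count_graph(e, seen=None):
    """Count each operation node once, identifying shared nodes by object identity."""
    seen = set() if seen is None else seen
    if isinstance(e, str) or id(e) in seen:
        return 0
    seen.add(id(e))
    return 1 + sum(count_graph(child, seen) for child in e[1:])

for N in (1, 2, 4, 8):
    expr = build(N)
    print(N, count_tree(expr), count_graph(expr))
```

Each update adds five operations to the graph, while the written-out tree roughly triples per step, which is the growth pattern the slides describe.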

13 Key ingredient: algorithmic differentiation
Cost of a (forward/reverse) derivative: a small multiple of the cost of the original expression.
Example: for $\min_{x_0} x_N^2$, the derivative needed is $\frac{d x_N}{d x_0}$.

14 Key ingredient: algorithmic differentiation
    x0 = MX.sym('x0')
    a = MX.sym('a')
    dt = 1
    x = x0
    for i in range(N):
        x = x+dt*(a*x+cos(x))
    jacobian(x,x0)
[Plot: construction time [s] versus N for $\frac{d x_N}{d x_0}$]
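
As a rough illustration of why a forward derivative costs only a small multiple of the original evaluation, here is a minimal forward-mode sketch using dual numbers in plain Python. It is not how CasADi implements algorithmic differentiation; the `Dual` class and the helper names are assumptions made for this example. Every operation in the loop does a constant amount of extra work to carry the derivative along.

```python
import math

class Dual:
    """Value together with its derivative with respect to x0 (forward mode)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(o):
        return o if isinstance(o, Dual) else Dual(float(o))

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def cos(x):
    """Cosine for plain floats and for Dual numbers (chain rule)."""
    if isinstance(x, Dual):
        return Dual(math.cos(x.val), -math.sin(x.val) * x.der)
    return math.cos(x)

def simulate(x0, a, dt, N):
    """The for-loop from the slides: x <- x + dt*(a*x + cos(x)), N times."""
    x = x0
    for _ in range(N):
        x = x + dt * (a * x + cos(x))
    return x

def dxN_dx0(x0, a, dt, N):
    """Seed x0 with derivative 1 and read the derivative off the result."""
    return simulate(Dual(x0, 1.0), a, dt, N).der

print(dxN_dx0(0.3, -0.5, 0.1, 20))
```

The same `simulate` function runs on floats and on `Dual` values, so the derivative pass repeats the original loop with a fixed overhead per operation.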

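15 Key ingredient: matrix-valued graphs
$\dot{x} = A x + \cos x$, $x \in \mathbb{R}^n$
MX:
    x = MX.sym('x',2)
    A = MX.sym('A',2,2)
    mul(A,x)
[Figure: MX graph with a single matrix-valued mul node taking A and x]
SX:
    x = SX.sym('x',2)
    A = SX.sym('A',2,2)
    mul(A,x)
[Figure: SX graph, scalar expansion ~ loop unrolling: a00, a01, a10, a11, x0, x1 combined by scalar * and + nodes]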

16 Key ingredient: function embedding
    x0 = MX.sym('x0')
    a = MX.sym('a')
    dt = 1
    x = x0
    for i in range(N):
        x = x+dt*(a*x+cos(x))
[Figure: graph with inputs x0 and a feeding repeated cos, * and + nodes]

17 Key ingredient: function embedding
    x0 = MX.sym('x0')
    a = MX.sym('a')
    dt = 1
    x = x0
    for i in range(N):
        x = F(x,a)
[Figure: graph with inputs x0 and a feeding a chain of calls to F]

18 Key ingredient: function embedding
Definition of F:
    x = MX.sym('x')
    a = MX.sym('a')
    dt = 1
    e = x+dt*(a*x+cos(x))
    F = Function('F',[x,a],[e])
Use of F:
    x0 = MX.sym('x0')
    a = MX.sym('a')
    dt = 1
    x = x0
    for i in range(N):
        x = F(x,a)
[Figure: chain of calls to F; inside F, inputs x and a feed cos, * and + nodes]

19 Key ingredient: function embedding
- rootfinder
- linsol
- integrator
- ...

20 Outline
- Key ingredients of CasADi
    - Efficient symbolics
    - Algorithmic differentiation
    - Matrix-valued graphs
    - Function embedding
- New concepts: Map, MapAccum
- Massive parallelization
- Software architecture

21 Towards large scale optimal control
$x_{k+1} = F(x_k, u_k)$, $\min_{x_0, u_0, u_1, \ldots, u_{N-1}} x_N^2$
    x0 = MX.sym('x0')
    U = MX.sym('u',1,N)
    dt = 1
    w = horzsplit(U)
    x = x0
    for i in range(N):
        x = F(x,w[i])
[Figure: MX graph (optimized for memory footprint) chaining calls to F; each F is an SX graph (optimized for speed)]

22 Towards large scale optimal control
[Figure: chain of N calls to F, fed by x0 and the columns of U, for increasing N]

23 Towards large scale optimal control
Not fast enough!
[Plot: construction time [s] versus N, annotated from "Lightning fast" to "Painfully slow"]

24 New node: MapAccum
[Figure: the chain of N calls to F (inputs x0 and the columns of U, outputs x1, x2, x3, ...) is replaced by a single node MapAccum(F,N) with inputs x0 and U]

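25 Practical example: sys id
$$\dot{x} = v, \quad \dot{v} = u - k x - c v - d x^3$$
$$\min_{x_0, k, c, d} \sum_i \| \hat{x}_i - x_i \|_2^2$$
Discretized: $x_{k+1} = F(x_k, u_k, p)$
    Function('F',[x,u,vertcat([k,c,d])],[...])
[Figure: F with inputs x, u, params and output x_next; MapAccum(F,N) with inputs x0, u_data and (k,c,d), and outputs x1, x2, x3, ...]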

26 Practical example: sys id
$$\dot{x} = v, \quad \dot{v} = u - k x - c v - d x^3$$
$$\min_{x_0, k, c, d} \sum_i \| \hat{x}_i - x_i \|_2^2$$
    R = F.mapaccum("R", N)
    X_symbolic = R(x0, u_data, repmat(params,1,N))
    e = y_data-X_symbolic[0,:].T
y_data: measured data; u_data: known signals.
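
What a MapAccum node computes can be mimicked in plain Python with `itertools.accumulate`. This is only an analogy, not the CasADi implementation: the body of F is elided on the slides, so the explicit Euler step, the step size `dt = 0.01`, and the helper name `mapaccum` below are assumptions chosen for illustration.

```python
from itertools import accumulate

def F(x, u, p):
    """One explicit Euler step of x' = v, v' = u - k*x - c*v - d*x**3.
    The discretization and dt are assumed; the slides do not show them."""
    pos, vel = x
    k, c, d = p
    dt = 0.01
    return (pos + dt * vel, vel + dt * (u - k * pos - c * vel - d * pos**3))

def mapaccum(F, x0, u_data, p):
    """Return [x1, ..., xN] with x_{k+1} = F(x_k, u_k, p)."""
    states = accumulate(u_data, lambda x, u: F(x, u, p), initial=x0)
    return list(states)[1:]  # drop x0, keep the N accumulated states

params = (1.0, 0.1, 0.5)   # k, c, d (made-up values)
u_data = [1.0] * 5         # known input signal (made-up values)
X = mapaccum(F, (0.0, 0.0), u_data, params)
print(X[-1])
```

The accumulated state is threaded through the calls while the inputs are consumed one per step, which is the same data flow as the for-loop it replaces.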

27 Practical example: sys id
[Plot: construction time [s] versus N, comparing the for-loop with mapaccum]
    for i in range(N):
        X = F(X,u_data[:,i],params)
versus
    R = F.mapaccum("R", N)

28 Construction time vs. evaluation time
- Construction time: up-front cost
- Evaluation time: numerical evaluation (iterations of NLP)

29 Construction time vs. evaluation time
So MapAccum is useless to speed up online computations?
[Plot: evaluation time [s] of the numerical evaluation (iterations of NLP)]

30 Code-generation
[Plot: evaluation time [s]]

31 Code-generation
First you need to run the compiler...
[Plots: compilation time [s]; compilation memory [B]]

32 Outline
- Key ingredients of CasADi
    - Efficient symbolics
    - Algorithmic differentiation
    - Matrix-valued graphs
    - Function embedding
- New concepts: MapAccum
    - 2 orders of magnitude faster construction
    - compile 2 orders of magnitude larger problems
- Software architecture

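33 MapAccum ~ single shooting
$\min_{x_0, u_0, u_1, \ldots, u_{N-1}} x_N^2$
[Figure: chain of calls to F (inputs x0 and the columns of U, outputs x1, x2, x3, ...) replaced by MapAccum(F,N) with inputs x0 and U]

34 Map ~ multiple shooting
$\min_{x_0, x_1, \ldots, x_N, u_0, u_1, \ldots, u_{N-1}} x_N^2$
[Figure: independent calls to F (inputs from the columns of X and U, outputs x1, x2, x3, ...) replaced by Map(F,N) with inputs X and U]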

35 Map ~ multiple shooting
$$\min_{x_0, x_1, \ldots, x_N, u_0, u_1, \ldots, u_{N-1}} x_N^2 \quad \text{s.t.} \quad x_1 - F(x_0; u_0) = 0,\; x_2 - F(x_1; u_1) = 0,\; \ldots,\; x_N - F(x_{N-1}; u_{N-1}) = 0$$
F is a function that:
- reads a few inputs
- computes a lot (e.g. time integration)
- writes a few outputs
Ideal for parallelism:
- OpenMP support built in
- OpenCL GPU (proof-of-concept)
    R = F.map("R","serial",N)
    X_n = R(X, u_data, repmat(params,1,N))
    gaps = X[:,1:]-X_n[:,:-1]
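
The structure of the multiple-shooting constraints can be sketched in plain Python: every call F(x_k, u_k) is independent of the others, so the calls can be mapped in any order or concurrently, and the gaps are formed afterwards. This is an analogy only, not CasADi's Map node. The scalar model inside `F`, its parameter values, and the name `shooting_gaps` are assumptions made for this example, and Python threads will not speed up pure-Python arithmetic; they only show that the evaluations do not depend on each other.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def F(x, u, a=-0.5, dt=0.1):
    """Hypothetical one-step model: x + dt*(a*x + cos(x) + u)."""
    return x + dt * (a * x + math.cos(x) + u)

def shooting_gaps(X, U, parallel=False):
    """X holds x_0..x_N, U holds u_0..u_{N-1}; returns x_{k+1} - F(x_k, u_k)."""
    if parallel:
        with ThreadPoolExecutor() as pool:
            X_next = list(pool.map(F, X[:-1], U))   # independent evaluations
    else:
        X_next = list(map(F, X[:-1], U))
    return [x_kp1 - x_pred for x_kp1, x_pred in zip(X[1:], X_next)]

# A trajectory produced by simulation closes all gaps exactly.
U = [0.1, 0.2, 0.3]
X = [1.0]
for u in U:
    X.append(F(X[-1], u))
print(shooting_gaps(X, U))
```

An optimizer treats all entries of X as free variables and drives these gaps to zero, whereas the MapAccum formulation eliminates the intermediate states by construction.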

36 Outline
- Key ingredients of CasADi
    - Efficient symbolics
    - Algorithmic differentiation
    - Matrix-valued graphs
    - Function embedding
- New concepts: Map, MapAccum
    - Speedups
- Massive parallelization
- Software architecture

37 Operation on an image
$\sum_{i,j} f(D_{i,j}, X)$
[Figure: image D with pixel indices i, j]

38 Operation on an image
$\sum_{i,j} f(D_{i,j}, X)$
    X = MX.sym('x',2)
    D = MX.sym('D',1024,768)
    R = F.map('mymap','serial',1024*768)
    Dflat = vec(D).T
    terms = R(Dflat,X)
    e = sum2(terms)

39 Operation on an image
$\sum_{i,j} f(D_{i,j}, X)$
    X = MX.sym('x',2)
    D = MX.sym('D',1024,768)
    R = F.map('mymap','openmp',1024*768)
    Dflat = vec(D).T
    terms = R(Dflat,X)
    e = sum2(terms)

40 Operation on an image
$\sum_{i,j} f(D_{i,j}, X)$
    X = MX.sym('x',2)
    D = MX.sym('D',1024,768)
    R = F.map('mymap','opencl',1024*768)
    Dflat = vec(D).T
    terms = R(Dflat,X)
    e = sum2(terms)

41 Operation on an image
$\sum_{i,j} f(D_{i,j}, X)$
    X = MX.sym('x',2)
    D = MX.sym('D',1024,768)
    R = F.map('mymap','opencl',1024*768)
    Dflat = vec(D).T
    terms = R(Dflat,X)
    e = sum2(terms)

42 What is OpenCL?
- C library
- Compiler for computation kernels
- Low-level memory concepts
- Write once, run on CPU, GPU, FPGA

43 Operation on an image
$\sum_{i,j} f(D_{i,j}, X)$
    X = MX.sym('x',2)
    D = MX.sym('D',1024,768)
    R = F.map('mymap','opencl',1024*768)
    Dflat = vec(D).T
    terms = R(Dflat,X)
    e = sum2(terms)
Init:
- Allocate buffer for x on GPU
- Allocate buffer for D on GPU
- Allocate buffer for result on GPU
- Compile GPU kernel
Eval:
- Send x
- Send D
- Compute kernel
- Retrieve sol

44 Kernelsum node
$\sum_{i,j} f(D_{i,j}, X)$
Init:
- Allocate buffer for x on GPU
- Allocate buffer for D on GPU
- Allocate buffer for result on GPU
- Send D
- Compile GPU kernel
Eval:
- Send x
- Compute kernel
- Sum result
- Retrieve sum
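
The difference between the two evaluation schemes above (retrieve every term and then sum, versus sum where the terms are computed and retrieve a single number) can be sketched in plain Python. Nothing here touches a GPU or OpenCL. The per-pixel function `f`, the chunk size, and the names `map_then_sum` and `kernel_sum` are assumptions made for this illustration, with chunks standing in for groups of work items that reduce locally.

```python
import math

def f(d, X):
    """Hypothetical per-pixel term f(D_ij, X) with a two-element X."""
    return (d * X[0] - X[1]) ** 2

def map_then_sum(D, X):
    """Map-style: materialize one term per pixel, then sum all of them."""
    terms = [f(d, X) for row in D for d in row]
    return math.fsum(terms)

def kernel_sum(D, X, chunk=4):
    """Kernelsum-style: reduce each chunk locally, then combine the partial sums."""
    flat = [d for row in D for d in row]
    partial = [
        math.fsum(f(d, X) for d in flat[i:i + chunk])
        for i in range(0, len(flat), chunk)
    ]
    return math.fsum(partial)

D = [[1, 2, 3], [4, 5, 6]]   # tiny stand-in for the 1024x768 image
X = (2, 1)
print(map_then_sum(D, X), kernel_sum(D, X))
```

Both routes give the same total; the second only ever hands back the partial sums, which mirrors the "Sum result, Retrieve sum" steps replacing "Retrieve sol".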

45 Kernelsum node For Flanders Make: x speedup on a Titan Black GPU (2000 cores)

46 Outline
- Key ingredients of CasADi
    - Efficient symbolics
    - Algorithmic differentiation
    - Matrix-valued graphs
    - Function embedding
- New concepts: Map, MapAccum
- Massive parallelization
- Software architecture

47 Software architecture: interfaces
- C++ source: casadi.cpp, casadi.hpp -> libcasadi.so
- SWIG interface header: casadi.i
- Python interface source: casadipython_wrap.cxx -> _casadi.so
- Matlab interface source: casadimatlab_wrap.cxx -> casadimatlab_wrap.mexa64

48 Software architecture: before plugins
- C++ source: casadi.cpp, casadi.hpp, nlp_solver.cpp, ipopt_interface.cpp, snopt_interface.cpp
- Libraries: libcasadi.so, libipopt.so (LGPL), libsnopt.so (commercial)

49 Software architecture: before plugins
- C++ source: casadi.cpp, casadi.hpp, nlp_solver.cpp, ipopt_interface.cpp, snopt_interface.cpp
- Libraries: libcasadi.so, libipopt.so (LGPL), libsnopt.so (commercial)
Mr. Cheapskate:
    from casadi import *
    Error: symbol snopt not found
Mrs. Spender:
    from casadi import *
    nlpsol('solver','snopt',nlp)

50 Software architecture: before plugins
- C++ source: casadi.cpp, casadi.hpp, nlp_solver.cpp, ipopt_interface.cpp, snopt_interface.cpp
- Libraries: libcasadi.so, libipopt.so (LGPL), libsnopt.so (commercial)
Mr. Cheapskate:
    from casadi import *
    nlpsol('solver','ipopt',nlp)
Mrs. Spender:
    from casadi import *
    nlpsol('solver','snopt',nlp)
    Error: solver snopt not compiled

51 Software architecture: plugins
- C++ source: casadi.cpp, casadi.hpp, nlp_solver.cpp, ipopt_interface.cpp, snopt_interface.cpp
- Libraries: libcasadi.so, libcasadi_nlpsol_ipopt.so, libcasadi_nlpsol_snopt.so, libipopt.so (LGPL), libsnopt.so (commercial)
First user:
    from casadi import *
    nlpsol('ipopt',nlp)
    Dynamically load (dlopen) libcasadi_nlpsolver_ipopt.so
Second user:
    from casadi import *
    nlpsol('snopt',nlp)
    Dynamically load (dlopen) libcasadi_nlpsolver_snopt.so
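
The idea of loading a solver only when it is requested can be sketched in Python with `importlib`, as a loose analogue of the `dlopen` call named above. This is not CasADi code: the `load_plugin` function, the `PluginError` class, and the registry contents are hypothetical, and the standard-library module `json` is used purely as a stand-in for a plugin that happens to be installed.

```python
import importlib

class PluginError(RuntimeError):
    """Raised when a requested plugin is unknown or cannot be loaded."""

def load_plugin(kind, name, registry):
    """Import the module registered for (kind, name) only when it is requested."""
    module_name = registry.get((kind, name))
    if module_name is None:
        raise PluginError(f"{kind} plugin '{name}' is not registered")
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise PluginError(f"{kind} plugin '{name}' could not be loaded: {exc}") from exc

# Hypothetical registry: module names are placeholders, not real CasADi plugins.
registry = {
    ("nlpsol", "ipopt"): "json",                        # stand-in for an installed plugin
    ("nlpsol", "snopt"): "snopt_plugin_not_installed",  # stand-in for a missing plugin
}

plugin = load_plugin("nlpsol", "ipopt", registry)
print(plugin.__name__)
```

A user who never asks for the missing plugin never triggers its import, so its absence only surfaces as an error at the moment it is requested.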

52 Software architecture: version control
- C++ source: casadi.cpp, casadi.hpp, nlp_solver.cpp, ipopt_interface.cpp, snopt_interface.cpp
- Hosted on GitHub

53 Software architecture: version control I added a flag here

54 Software architecture: continuous-integration
- Every change (commit) is unit-tested by travis-ci
- CPU time
- Memory checks
- Green = good
- Documentation

55 Software architecture: continuous-integration

56 Software architecture: continuous-integration
- Both appveyor and travis are free for open-source projects
- Encryption support: commercial plugins/interfaces (e.g. Matlab) can be tested

57 Software architecture: binaries
Linux buildslaves:
- g++
- x86_64-w64-mingw32-g++

58 Software architecture: binaries (python)
Binary = zip folder, e.g. casadi-py27-np1.9.1-v3.0.0.tar.gz
/home/me/software
- casadi-py27-np1.9.1-v
    - casadi
    - include
    - lib
- casadi-py27-np1.9.1-v
    - casadi
    - include
    - lib
    import sys
    sys.path.append("/home/me/software/casadi-py27-np1.9.1-v3.0.0")
    import casadi

59 Software architecture: binaries (matlab)
Binary = zip folder, e.g. casadi-matlabr2014b-v2.4.0.zip
/home/me/software
- casadi-matlabr2014b-v
    - casadi
    - include
    - lib
- casadi-matlabr2014b-v
    - casadi
    - include
    - lib
    addpath('/home/me/software/casadi-matlabr2014b-v3.0.0')
    import casadi.*

60
- SX, MX, Function, AD, JIT
- nlp, qpsol, ...
- Ipopt plugin, Snopt plugin, OOQP plugin, Ecos plugin, Mosek plugin, ...
Developers: Joel Andersson, Joris Gillis, Greg Horn, Niels van Duijkeren

61 Go write your dynamic optimization tool...

62 e.g. Optoy Go write your dynamic optimization tool...

63 Optoy Go write your dynamic optimization tool...

64 Go write your dynamic optimization tool...

65 Go write your dynamic optimization tool...
- Prototype-style speed of development
- State-of-the-art performance

66 Thanks for your attention Let's do some coding...
