MOSAIC: an Online Platform for Combined Process Model and Measurement Data Management

Erik Esche*, David Müller**, Robert Kraus***, Sandra Fillinger****, Victor Alejandro Merchan Restrepo*****, Günter Wozny******

Chair of Process Dynamics and Operation, Berlin University of Technology, Sekr. KWT-9, Str. des 17. Juni 135, D-10623 Berlin, Germany

*erik.esche@tu-berlin.de, **david.mueller@tu-berlin.de, ***robert.kraus@tu-berlin.de, ****sandra.fillinger@tu-berlin.de, *****v.merchanrestrepo@tu-berlin.de, ******guenter.wozny@tu-berlin.de

Abstract

Collaboration between experimenters and modelers, who may work in the same or in different programming languages, often poses a challenge. This challenge especially concerns the management of the models, the storage of measurement data, and the general exchange of information between the collaborators. Herein, the modeling platform MOSAIC, which aims at assisting these groups, is presented. MOSAIC allows mathematical modeling of phenomena, single units, or entire processes on the documentation level. Furthermore, transparency is increased by storing measurement data on the same collaborative platform. Thus, parameter estimations and the development of a model can be traced back more easily. This contribution presents the general workflow of the collaboration platform MOSAIC and shows how collaboration between modelers and experimenters is assisted via model creation and data storage.

1. Motivation and Introduction

The development, validation, and subsequent management of process models are some of the most challenging tasks in process systems engineering. Due to insufficient knowledge of the underlying chemistry and physics, standard models usually do not suffice to accurately describe the performance of a chemical plant. Individual solutions in the form of validated process models are often preferred. Most of all, this implies a continuous adjustment of process models and their parameters to updated experimental data.
This calls for a platform which allows experimenters and modelers, who work in the same or in different programming languages, to collaborate in managing the process models, storing measurement data, and fitting model parameters. For this purpose the online modeling environment MOSAIC [1, 2] has been extended.

1.1. Modeling, Simulation, and Optimization of Processes in Chemical Engineering

In chemical engineering, most models follow some form of hierarchical structure. This is especially the case whenever whole plants or processes are modeled and the model hence spans several scales, i.e. anything from basic phenomena like the surface tension of water to plant-wide infrastructure like utility management. Consequently, many model parts, especially those for the smaller scales, can often be reused in different models. This, of course, requires that each basic phenomenon, and indeed any model part, is well documented, so that comprehensive reuse is actually feasible. In MOSAIC this concept is strictly applied, starting from the smallest parts, the equations, and continuing to equation systems and the whole model [1]. Apart from the notation and basic documentation of a model, additional information is required if models are to be reused for simulation and optimization. Most models in chemical engineering are highly non-linear and non-convex. Hence, it is quite common that several different mathematically valid solutions to the same problem exist. If consistent sets of starting values are provided together with the model, this problem should not arise. The workflow and all activities needed for model generation are described in [3].

1.2. Collaboration Between Experimenters, Modelers, and Optimizers

The reuse of model parts or whole models, as mentioned above, is of course not just carried out by a single person.
Up to now, the issue persists that experimenters, modelers, and optimizers working on the same process do not use the same software or programming language and sometimes do not even speak the same language. The common denominator is usually a mathematical representation of their models. As forcing them to use the same software or equipment for solving their individual problems is neither desirable nor actually feasible, a flexible environment facilitating their communication is advantageous.

2. Standard Workflow for Optimization

The following two sections show how MOSAIC supports the collaboration of experimenters, modelers, and optimizers by supplying a mathematical form of common ground.

2.1. Modeling

Modeling in MOSAIC is implemented as closely to the mathematical representation of a model as possible. The standard workflow starts by defining a notation, which is basically a collection of sets of base names, superscripts, subscripts, and indices. A notation can be valid for an entire model or only a small portion of it. Based on the notation, equations can be defined in a standard LaTeX format. Fig. 1 shows this for a common implementation of the Peng-Robinson equation of state.

Figure 1: Raw LaTeX and rendered versions of the Peng-Robinson equation of state within MOSAIC's equation editor.

In case equations of state, component property functions etc. are not supposed to be implemented in MOSAIC itself, the software's CAPE-OPEN capabilities [5] allow for the definition of functions and their respective external function calls. These can even be exported to languages such as Aspen Custom Modeler, gPROMS, Matlab, and many more. Apart from basic equation systems, MOSAIC at the moment only supports differential algebraic equation systems of first order. Differentials of a higher order need to be discretized manually.

After the implementation or adaptation of all equations and functions, an equation system needs to be defined. Each equation system consists of a notation and a selection of algebraic and differential equations or even other equation systems. The notations of the connected elements do not need to be identical to the notation of the new equation system.
For this purpose MOSAIC offers connectors, which facilitate the reuse of models and allow for the connection of models with a completely different structure. The following example outlines the advantages. Imagine a flowsheet as shown in Fig. 2. The system consists of three reactors and a number of streams. All reactors are of the same type, but they differ in the catalyst selected for each. Consequently, the basic equations required to describe each reactor are identical, except for the set of kinetic equations. Within MOSAIC this can be exploited quite efficiently. The basic reactor equations are formulated once and aggregated into an equation system. In an additional equation system, e.g. reactor I, this equation system is connected and the additional equations describing the reaction kinetics are added. This procedure can be repeated for each reactor. To form the flowsheet shown in Fig. 2, additional equations describing the streams are required. These can be added in the overall equation system, henceforth referred to as the flowsheet. The notation used for the reactors may contain many specifications not required for the streams. Sometimes a different notation for the streams can be of interest, with a special subscript for the stream id etc. The latter will be called the supernotation for now. For each reactor a connector needs to be defined, tethering outlet and inlet to the streams in the flowsheet. These connectors are depicted as dotted circles in Fig. 2.
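The reuse pattern described above can be illustrated with a small Python sketch. It is not MOSAIC's implementation or data model: the classes `EquationSystem` and `Connector`, the plain-string equations, and all variable names (`F_in`, `F_s1`, `r_R1`, ...) are hypothetical and chosen only for illustration. The sketch shows the basic reactor equations being written once, reused by two reactors with different kinetics, and translated into the flowsheet's notation by connectors.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


@dataclass
class Connector:
    """Hypothetical connector: maps names of a connected system to the enclosing notation."""
    mapping: dict[str, str]

    def apply(self, equation: str) -> str:
        # Replace whole identifiers only, all in one pass, so renames cannot chain.
        return IDENTIFIER.sub(lambda m: self.mapping.get(m.group(0), m.group(0)), equation)


@dataclass
class EquationSystem:
    """Hypothetical equation system: own equations plus connected equation systems."""
    name: str
    equations: list[str] = field(default_factory=list)
    connected: list[tuple[EquationSystem, Connector]] = field(default_factory=list)

    def flatten(self) -> list[str]:
        """Return own equations followed by connected equations, translated by their connectors."""
        result = list(self.equations)
        for system, connector in self.connected:
            result.extend(connector.apply(eq) for eq in system.flatten())
        return result


# Basic reactor equations, formulated once (an illustrative balance, not taken from the paper).
reactor_base = EquationSystem("reactor_base", ["F_out = F_in - r * V"])


def make_reactor(name: str, kinetics: str) -> EquationSystem:
    """Reuse the basic equations unchanged and add catalyst-specific kinetics."""
    return EquationSystem(name, [kinetics], [(reactor_base, Connector({}))])


reactor_1 = make_reactor("reactor_I", "r = k1 * c")
reactor_2 = make_reactor("reactor_II", "r = k2 * c ** 2")

flowsheet = EquationSystem(
    "flowsheet",
    equations=["F_s1 = F_feed"],  # stream equation written in the flowsheet's own notation
    connected=[
        (reactor_1, Connector({"F_in": "F_s1", "F_out": "F_s2", "r": "r_R1", "V": "V_R1", "c": "c_s1"})),
        (reactor_2, Connector({"F_in": "F_s2", "F_out": "F_s3", "r": "r_R2", "V": "V_R2", "c": "c_s2"})),
    ],
)

for equation in flowsheet.flatten():
    print(equation)
```

In this sketch the outlet of the first reactor and the inlet of the second are both mapped to `F_s2`, which is how the connectors tie the reactors to a shared stream.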

Figure 2: Example of how flowsheets can be implemented in MOSAIC using equation systems, different notations, and connectors.

2.2. Simulation

Based on any equation system an evaluation can be created. Initially, equations in MOSAIC are usually formulated in a generic form, for example with indices for streams or units. When defining an evaluation, MOSAIC requires the user to enter the maximum value of each index, from which all required equation instances are created. This becomes especially helpful when discretizing a system, as the number of finite elements can easily be increased. The indexing for two indices is displayed in Fig. 3, while Fig. 4 shows the equation instances created by the specification of the index i in addition to the generic form.

Figure 3: Index specification in MOSAIC.

Figure 4: Automatic creation of equation instances based on a generic implementation by the user.

Once the equation system is fully instantiated, the user must specify parameters and initial values in order to simulate the desired process or phenomenon. MOSAIC calculates the degrees of freedom of the system by subtracting the number of equations in the entire model and the number of selected design variables (i.e. parameters) from the total number of variables. Partly in preparation for later optimization applications but also in support of some simulation solvers, lower and upper bounds must be defined for all variables in the system. In case the equation system contains first-order derivatives, MOSAIC automatically identifies the differentiation variable and requires the user to supply the respective start and end values for the simulation [4].
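The index expansion and the degree-of-freedom count described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MOSAIC's code: the generic equation is a made-up component balance written as a format string, the helper names `instantiate`, `count_variables`, and `degrees_of_freedom` are hypothetical, and the degrees of freedom are taken as the usual count of variables minus equations minus selected design variables.

```python
import re
from itertools import product

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def instantiate(template: str, index_max: dict[str, int]) -> list[str]:
    """Create one equation instance per combination of index values from 1 to the maximum."""
    names = list(index_max)
    ranges = [range(1, index_max[name] + 1) for name in names]
    return [
        template.format(**dict(zip(names, values)))
        for values in product(*ranges)
    ]


def count_variables(equations: list[str]) -> int:
    """Count distinct variable names appearing in the instantiated equations."""
    return len({name for eq in equations for name in IDENTIFIER.findall(eq)})


def degrees_of_freedom(n_variables: int, n_equations: int, n_design: int) -> int:
    """Assumed definition: variables minus equations minus selected design variables."""
    return n_variables - n_equations - n_design


# Generic form with a component index i and a stream index j (illustrative only).
generic = "n_{i}_{j} = x_{i}_{j} * F_{j}"
instances = instantiate(generic, {"i": 2, "j": 3})

# The user selects all compositions and stream flows as design variables.
design_variables = {f"x_{i}_{j}" for i in (1, 2) for j in (1, 2, 3)} | {f"F_{j}" for j in (1, 2, 3)}

dof = degrees_of_freedom(count_variables(instances), len(instances), len(design_variables))
print(len(instances), "equation instances, degrees of freedom:", dof)
```

Raising the maximum of `j` from 3 to a larger number creates the additional instances without touching the generic form, which is the convenience the index specification provides when refining a discretization.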

Once the degrees of freedom are at zero, the system is ready for export to any simulation environment. MOSAIC automatically identifies the system as a set of nonlinear equations (NLE), ordinary differential equations (ODE), or differential algebraic equations (DAE). Accordingly, a preselection of the appropriate solvers is carried out. The user is then faced with several options. MOSAIC offers a number of language specifications for internal solvers, meaning the generated code can be run on MOSAIC's server, and for external solvers, meaning the code needs to be exported and run on a copy of the software the user owns him- or herself. In addition, the user can specify a new programming language which is not yet contained in MOSAIC's repository. For the internal solution MOSAIC offers a wide selection of both NLE and DAE solvers, among them C++ BzzMath NLE [6], NLEQ1S [7], and the F90 DASSL solver. For the external solution and the user-defined option, the user can choose among many different environments and solvers such as C++, Matlab, Python, AMPL, Fortran 90, gPROMS, GAMS, Aspen Custom Modeler, and Scilab. For each solver, properties can be modified for the code generation and the execution within the external software. These can be accuracies, selections of nested ODE solvers in Matlab, the maximum number of iterations etc. For code executed on MOSAIC's server a results panel exists, which displays the solver output and allows the results to be saved, so that they can later be reloaded as new initial values. In case of DAE or ODE systems which are solved on the server, MOSAIC supplies modifiable graphs of the outputs. Lastly, for locally executed code MOSAIC offers an import tab, in which variable names and their results can be pasted, whereupon the values are automatically updated within MOSAIC.

2.3. Optimization

Starting with a working evaluation in MOSAIC, setting up an optimization requires only a few further modifications. In the optimization tab any evaluation can be opened directly.
At the moment, the constraints of an optimization problem within MOSAIC can only consist of algebraic equations, as only simultaneous optimization is supported. Differential algebraic equation systems need to be fully discretized beforehand. Once the evaluation is opened in the optimization tab, all the iteration variables (basically the states) and design variables (basically the parameters and controls) are displayed. Among the former an objective can be selected; among the latter the optimization variables (i.e. the decision variables) can be chosen. Fig. 5 shows the variable selection window. MOSAIC requires the formulation of an objective function as a separate constraint, which calculates the objective variable.
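Written out, the resulting problem structure can be sketched as follows. The symbols are assumptions introduced here for illustration and do not appear in the original text: x denotes the iteration variables (states), u the selected decision variables, h the algebraic model equations, f the objective function, and z the objective variable chosen among the iteration variables.

```latex
\begin{aligned}
\min_{u}\quad & z \\
\text{s.t.}\quad & h(x, u) = 0, \\
& z - f(x, u) = 0, \\
& x^{L} \le x \le x^{U}, \qquad u^{L} \le u \le u^{U}.
\end{aligned}
```

The second constraint is the additional equation that calculates the objective variable, so the solver only ever sees equality constraints and variable bounds.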

Figure 5: Variable panel for the optimization within MOSAIC, wherein objective and decision variables can be selected.

Similarly, inequality constraints cannot directly be formulated in MOSAIC. They should be added as further equality constraints with user-defined slack variables. The lower and upper bounds on slacks, decision variables, and states can of course all be modified in the iteration variables frame shown in Fig. 5 in case they were not already correctly specified during the simulation. The optimization variables can also be chosen to be integers. Hence, both MILP and MINLP optimization are also possible. From this point on two paths are possible within MOSAIC: either a language specificator for basic code generation is chosen and the code is executed in the user's optimization environment, or MOSAIC is hooked up to the NEOS server [8, 9], whose wide variety of solvers can be used. The import of external results works through the same panel as for the simulations.

3. Extension to Measurement Data Management

So far the standard workflow assumes that parameters for models are known and indisputable. Of course this is scarcely ever the case. The consistent and lasting management of measurement data, in connection with the corresponding models and parameters, is a challenge that has not yet been solved. In conventional workflows, the measurement data files, the model files, and the resulting estimated model parameters are not linked. The same holds for the corresponding documentation. In most cases, it is not possible for outsiders to reconstruct individual changes in the corresponding measurement setup. Crucial information is lost over time and is known only to the experimenter. The distribution across different platforms results in different copies and versions of the individual elements and opens up several possibilities for errors. A new approach is central storage in a database, where all interdependent research domains are connected.
Modeling and measurements are directly connected and both benefit from a rapid knowledge transfer. The modeler receives new measurement values and updated parameter sets. In turn, a more precise design of experiments becomes possible. The combined management on the same online platform results in an intensified documentation throughout the whole project life. In this section MOSAIC's measurement data management is introduced and it is discussed how this data is linked to models and used to update them.

3.1. Description of a Plant or an Experiment

Within MOSAIC the measurement data is hierarchically linked to the equipment it was measured with and the facility it was measured in. Fig. 6 shows how this is implemented. The measurement data management consists of three layers. The first is the plant or facility in which the data was obtained. The second layer is the configuration or equipment which was used for the measurements. The configuration level allows for attachments of PFDs and PIDs, details on the measurement devices, the set-up etc. Also included are a timestamp and a comprehensive list of all measurement points. Each configuration then contains sets of measurement data, which form the third layer. A set comprises all data which belong to one measurement campaign. Each active sensor of the configuration has time-specific values with lower and upper errors given.
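The three layers described above map naturally onto a nested data structure. The following Python sketch is only one possible reading of that description, not MOSAIC's actual database schema: all class names, field names, and the example plant are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Measurement:
    """One time-specific value of a sensor with its lower and upper error."""
    time: datetime
    value: float
    lower_error: float
    upper_error: float


@dataclass
class MeasurementSet:
    """Third layer: all data belonging to one measurement campaign, keyed by sensor name."""
    campaign: str
    data: dict[str, list[Measurement]] = field(default_factory=dict)


@dataclass
class Configuration:
    """Second layer: the equipment used, with timestamp, measurement points, and attachments."""
    name: str
    created: datetime
    measurement_points: list[str]
    attachments: list[str] = field(default_factory=list)  # e.g. file names of PFDs and PIDs
    sets: list[MeasurementSet] = field(default_factory=list)

    def add_value(self, measurement_set: MeasurementSet, sensor: str, measurement: Measurement) -> None:
        """Store a value, accepting only sensors listed as measurement points."""
        if sensor not in self.measurement_points:
            raise ValueError(f"unknown measurement point: {sensor}")
        measurement_set.data.setdefault(sensor, []).append(measurement)


@dataclass
class Plant:
    """First layer: the plant or facility in which the data was obtained."""
    name: str
    configurations: list[Configuration] = field(default_factory=list)


# Usage with made-up names and values.
config = Configuration("setup_A", datetime(2013, 1, 15), measurement_points=["TI101", "PI102"])
plant = Plant("mini_plant", [config])
campaign = MeasurementSet("campaign_1")
config.sets.append(campaign)
config.add_value(campaign, "TI101", Measurement(datetime(2013, 1, 15, 10, 0), 351.2, 0.5, 0.5))
print(len(plant.configurations[0].sets[0].data["TI101"]))
```

Keeping the list of measurement points on the configuration and validating against it mirrors the requirement that data can only be stored for sensors known to the configuration.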

Figure 6: Measurement data management within MOSAIC: On the left-hand side plants, configurations, and sets can be chosen. On the right-hand side the editor for modifying an existing measurement data set is shown.

3.2. Import and Storage of the Measurement Data

For importing new measurement data into MOSAIC, all sensor names of interest need to be added to the list of active measurement points. Next, a new set can be created, a sensor is selected, and data can be added using the import window. In the import window a file can be selected to import the data from. Ideally, this file is a CSV file, but other formats are supported. The user can then select whether the file contains a header section, define how the columns are separated, and select which columns contain the timestamp or index value, the value itself, the lower bound on the error, the upper bound on the error, and an optional flag on the validity of the measurement information. Pressing import causes MOSAIC to parse all lines of the file, import the respective data, and return a message on the number of data points. Thanks to the timestamp option, large dynamic data sets for whole plant operations can be stored. Whenever only single operation points are of interest, the index option can be chosen.

Figure 7: Import window for adding new measurement data to MOSAIC.

3.3. Usage of Measurement Data in Modeling

The main advantage of storing measurement data on the same platform as the models and variable specifications lies in the collaboration of experimenters, modelers, and optimizers. Accurate parameter estimation is one of the most challenging tasks during the modeling step of any application. The overall transparency of the model-development process is increased by storing the measurement data underlying the parameter estimations on the same collaborative platform. The reuse of models is thus assisted. As soon as new data becomes available it can immediately be connected with existing models. At the moment, capabilities are being created in MOSAIC to directly connect experimental data to equations. This way, parameter estimation problems can be formulated, exported to any optimization environment, and evaluated therein. The estimated parameters can of course be imported into MOSAIC and are hence easily traced back to their respective experimental data. Also, repeating the parameter estimation once new experimental data is obtained is straightforward this way. Additional applications which can be pursued within MOSAIC based on the new measurement data management are online experimental design and its subsequent evaluation as well as measurement data validation and reconciliation.

4. Conclusions and Outlook

This paper shows how MOSAIC can be used as a multiuser and multiplatform interface for the derivation and organization of models and measurements. Models can be derived in a documentation framework, which can automatically create the needed implementation code for different simulation and optimization environments. The current features of MOSAIC and the techniques used have been described with a focus on the collaboration between experimenters, modelers, and optimizers. The benefit is an integrated and consistent documentation of models and measurements, including their interconnection and their development history. Finally, it can help to increase the lifetime of models and measurements thanks to the open storage and formulation, which is independent of a specific implementation language.
Work is under way to extend MOSAIC to support partial differential equation systems, their automatic discretization using collocation, and a link to 2D and 3D plant design programs.

Acknowledgements

This work is part of the Collaborative Research Centre "Integrated Chemical Processes in Liquid Multiphase Systems" (TRR 63) and the Cluster of Excellence "Unifying Concepts in Catalysis" coordinated by the Technische Universität Berlin. The financial support by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) is gratefully acknowledged.

References

[1] Kuntsche, S., Arellano-Garcia, H., Wozny, G. (2011) MOSAIC, an environment for web-based modeling in the documentation level, Computer Aided Chemical Engineering, 29, 1140-1144, ISBN 978-0-444-54-298-4.
[2] Kraus, R., Merchan Restrepo, V.A., Arellano-Garcia, H., Wozny, G. (2012) Hierarchical simulation of integrated chemical processes with a web based modeling tool, in: 11th International Symposium on Process Systems Engineering, 155-159, Elsevier.
[3] Esche, E., Müller, D., Kraus, R., Wozny, G. (2013) Systematic approaches for model derivation for optimization purposes. Article submitted to Chemical Engineering Science.
[4] Merchan Restrepo, V.A., Kraus, R., Barz, T., Arellano-Garcia, H., Wozny, G. (2012) Generation of first and higher order derivative information out of the documentation level, in: 11th International Symposium on Process Systems Engineering, 950-954, Elsevier.
[5] Braunschweig, B., Pantelides, C.C., Britt, H.I., Sama, S. (2000) Process modeling: The promise of open software architectures, Chemical Engineering Progress, 96(9), 65-76.
[6] Buzzi-Ferraris, G., Manenti, F. (2012) BzzMath: Library Overview and Recent Advances in Numerical Methods, Computer-Aided Chemical Engineering, 30 (2), 1312-1316, DOI: 10.1016/B978-0-444-59520-1.50121-4.
[7] Nowak, U., Weimann, L. (1991) A Family of Newton Codes for Systems of Highly Nonlinear Equations, Konrad-Zuse-Zentrum für Informationstechnik Berlin, Technical Report, TR 91-10.
[8] Czyzyk, J., Mesnier, M., Moré, J. (1998) The NEOS Server, IEEE Journal on Computational Science and Engineering, 5, 68-75.
[9] Gropp, W., Moré, J. (1997) Optimization Environments and the NEOS Server, in: Approximation Theory and Optimization, M. D. Buhmann and A. Iserles, eds., 167-182, Cambridge University Press.