System Identification Algorithms and Techniques for Systems Biology

by

© Choujun Zhan

A Thesis submitted to the School of Graduate Studies in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Electronic Engineering
City University of Hong Kong
08/2011
Tat Chee Avenue, Kowloon, Hong Kong SAR
Abstract

Mathematical models that reveal the dynamics and interaction properties of biological systems play an important role in computational systems biology. This PhD work is motivated by the current difficulty of identifying dynamic biochemical pathways from limited, highly noisy, and sparse time-course experimental data. In this thesis, the inverse problem of identifying the unknown parameters of dynamical biological systems, modelled by ordinary differential equations (ODEs) or delay-differential equations (DDEs), is treated using experimental data. Even when a model describes the measured data sufficiently well, it is still important to infer how well the model parameters are determined by the amount and quality of the available experimental data, which is essential for assessing model predictions. For this reason, another key topic of this thesis is identifiability analysis. The main contributions of this PhD work are summarized as follows:

1. In many cases, bio-system models are autonomous systems that are linear in their parameters. For this type of model, an optimization-based parameter estimation approach is proposed. Spline smoothing and numerical differentiation are used to smooth the noisy observations and to estimate the time derivatives of the underlying dynamical system, respectively. The parameter estimation problem is thereby reduced to a Least-Squares Parameter Estimation (LSPE) or a Linear Programming Parameter Estimation (LPPE) problem, which can then be solved efficiently by many global optimization algorithms.

2. For general bio-system models, a parameter estimation method combining spline theory with Nonlinear Programming (NLP) is developed. This method removes the need for ODE solvers during the identification process. Our analysis shows that the augmented cost function surface used in the proposed method is smoother, which eases the search for the optimum and hence enhances the robustness and speed of the search algorithm. Moreover, because the core of our algorithms is NLP based, the method is flexible: additional constraints can be embedded or removed easily.

3. In practice, time-delay feedback pathways exist in many biological systems and can be modelled by continuous delay-differential equations (DDEs). In this work, a two-stage approach is adopted for parameter estimation: first, by combining spline theory and NLP, the parameter estimation problem is formulated as an optimization problem with only algebraic constraints; then, a new differential evolution (DE) algorithm is proposed to find a feasible solution. The approach is designed to handle problems of realistic size with noisy observation data.

4. An identifiability analysis of the so-called S-system is given. The basic theory is developed and the structural identifiability of the S-system is proved. This work also analyzes the limitations of existing structural identification approaches, revealing that these approaches risk overfitting or underfitting.
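The idea behind contribution 1 can be illustrated with a minimal sketch. It is not the thesis's implementation: the logistic test model dx/dt = a*x - b*x^2, the parameter values, and the function names `logistic` and `estimate_params` are assumptions chosen for illustration, and the least-squares step here is plain ordinary least squares rather than the LSPE/LPPE formulations developed later. The sketch smooths the observations with a cubic spline (SciPy's `UnivariateSpline`), differentiates the smoothed state numerically (`numpy.gradient`), and then solves a linear least-squares problem for the parameters, so no ODE solver is called.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def logistic(t, a, b, x0):
    """Closed-form solution of dx/dt = a*x - b*x**2 with x(0) = x0."""
    K = a / b  # carrying capacity of the illustrative model
    return K / (1.0 + (K - x0) / x0 * np.exp(-a * t))


def estimate_params(t, y, smooth, n_dense=400):
    """Estimate (a, b) in dx/dt = a*x - b*x**2 without an ODE solver."""
    # Step 1: smooth the (possibly noisy) observations with a cubic spline.
    spline = UnivariateSpline(t, y, k=3, s=smooth)
    t_dense = np.linspace(t[0], t[-1], n_dense)
    x_hat = spline(t_dense)
    # Step 2: numerically differentiate the smoothed state on the dense grid.
    dx_hat = np.gradient(x_hat, t_dense)
    # Step 3: the model is linear in (a, b), so solve A @ theta ~= dx_hat.
    A = np.column_stack([x_hat, -x_hat**2])
    theta, *_ = np.linalg.lstsq(A, dx_hat, rcond=None)
    return theta


# Illustrative run on synthetic data (all values below are assumptions).
rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 10.0, 41)
sigma = 0.01
y_obs = logistic(t_obs, 1.0, 0.5, 0.1) + sigma * rng.normal(size=t_obs.size)
a_hat, b_hat = estimate_params(t_obs, y_obs, smooth=t_obs.size * sigma**2)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}  (true values: 1.0, 0.5)")
```

The smoothing factor `smooth` controls the trade-off between fidelity to the data and smoothness of the fitted state; setting it to the number of samples times the noise variance is a common heuristic, not a recommendation taken from the thesis.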
Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures

1 Introduction
  1.1 Overview
  1.2 Thesis Objectives
  1.3 Thesis Contributions
  1.4 Thesis Outline

2 Biochemical Pathway Modeling Review
  2.1 Pathway Model Representation I: Chemical Master Equations (CMEs) Model
    2.1.1 A simple example to illustrate the major idea
    2.1.2 From CME model to ODE model
  2.2 Building General CME Model Using Enzyme Kinetic Reaction Model as Example
    2.2.1 General representation and notation of reaction channel
    2.2.2 Combination number of reaction molecules and the probability assumption
    2.2.3 General representation of CME model
    2.2.4 Stochastic simulation: Gillespie modeling and Gillespie algorithm
    2.2.5 Examples to illustrate stochastic simulation
  2.3 Pathway Model Representation II: ODE Model and DDE Model
    2.3.1 GMA model
    2.3.2 S-system model
    2.3.3 Michaelis-Menten model
  2.4 Parameter Estimation
    2.4.1 Existing techniques for biochemical systems
    2.4.2 General description of parameter estimation
  2.5 Structural Identifiability and Practical Identifiability

3 Spline-Based Convex Optimization Method
  3.1 Preliminary on Parameter Estimation
    3.1.1 Introduction of splines
    3.1.2 Preliminaries on parameter estimation
  3.2 Parameter Estimation of ODE/DDE Models
    3.2.1 State estimation by spline smoothing
    3.2.2 Estimating derivatives by numerical differentiation method
    3.2.3 Least-squares and linear programming parameter estimation
  3.3 Simulation Examples
    3.3.1 Enzyme kinetic model
    3.3.2 Lorenz system
    3.3.3 Time-delay chaotic system
  3.4 Conclusion

4 Parameter Estimation Approach for General ODE Model Based on Spline with Nonlinear Programming
  4.1 Parameter Estimation Problem of General ODE Model
  4.2 Parameter Estimation Method Combining Spline Theory and NLP
  4.3 Computation Results
    4.3.1 Mammalian G1/S transition network model
    4.3.2 Yeast fermentation pathway
  4.4 Discussion and Conclusion

5 New Differential Evolution Algorithm for General DDE Model
  5.1 Problem Formulation with General DDE Model
  5.2 Parameter Estimation Method Combining Spline and New DE Algorithm
    5.2.1 Spline smoother-based cost function
    5.2.2 Proposed Differential Evolution (DE) algorithm for parameter estimation
  5.3 Identification Results
    5.3.1 Mammalian G1/S transition network model
    5.3.2 Yeast fermentation pathway
    5.3.3 JAK-STAT signalling pathway model
  5.4 Conclusion and Discussion

6 Analysis of Structural and Practical Identifiability of S-system
  6.1 Theoretical Analysis of Structural Identifiability of S-system
    6.1.1 S-System formalism
    6.1.2 Priori mathematical assumption of S-system
    6.1.3 Structural identifiability of S-system
    6.1.4 Structural identification approach
  6.2 Yeast Fermentation Pathway Model for Illustrating the Structural Identifiability of S-system
  6.3 Analysis of Practical Identifiability
    6.3.1 The overfitting problem faced by practical structural identification
    6.3.2 The underfitting problem by using approaches in literatures
  6.4 Conclusion and Discussion

7 Conclusions and Future Work
  7.1 Summary and Conclusions
  7.2 Future Work