System Identification Algorithms and Techniques for Systems Biology

System Identification Algorithms and Techniques for Systems Biology
by © Choujun Zhan
A Thesis submitted to the School of Graduate Studies in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Electronic Engineering, City University of Hong Kong
08/2011
Tat Chee Avenue, Kowloon, Hong Kong SAR

Abstract

Mathematical models that reveal the dynamics and interaction properties of biological systems play an important role in computational systems biology. This PhD work is motivated by the current difficulty of identifying dynamic biochemical pathways from limited, highly noisy, and sparse time-course experimental data. The thesis treats the inverse problem of identifying, from experimental data, the unknown parameters of dynamical biological systems modelled by ordinary differential equations (ODEs) or delay-differential equations (DDEs). Even when a model describes the measured data sufficiently well, it is still important to infer how well the model parameters are determined by the amount and quality of the available experimental data, because this is essential for assessing model predictions. For this reason, another key topic of this thesis is identifiability analysis.

The main contributions of this PhD work are summarized as follows:

1. In many cases, bio-system models are autonomous systems that are linear in the parameters. For this type of model, an optimization-based parameter estimation approach is proposed. Spline and numerical differentiation methods are used to smooth noisy observations and to estimate the time derivative of the underlying dynamical system, respectively. The parameter estimation problem is then reduced to a Least-Squares Parameter Estimation (LSPE) or a Linear Programming Parameter Estimation (LPPE) problem, which can be solved efficiently by many global optimization algorithms.

2. For general bio-system models, a parameter estimation method combining spline theory with Nonlinear Programming (NLP) is developed. This method removes the need for ODE solvers during the identification process. Our analysis shows that the augmented cost function surface used in the proposed method is smoother, which eases the search for the optimum and hence improves the robustness and speed of the search algorithm. Moreover, the core of our algorithms is NLP based, which makes it flexible: additional constraints can easily be embedded or removed.

3. In practice, time-delay feedback pathways exist in many biological systems, and these can be modelled by continuous delay-differential equations (DDEs). In this work, a two-stage approach is adopted for parameter estimation: first, by combining spline theory and NLP, the parameter estimation problem is formulated as an optimization problem with only algebraic constraints; then, a new differential evolution (DE) algorithm is proposed to find a feasible solution. The approach is designed to handle problems of realistic size with noisy observation data.

4. An identifiability analysis of the so-called S-system is given. The basic theory is developed and the structural identifiability of the S-system is proved. This work also analyzes the limitations of existing structural identification approaches, revealing that these approaches risk overfitting or underfitting.
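As an illustration of the first contribution (smooth the observations with a spline, estimate the time derivative numerically, then solve a linear least-squares problem), the following is a minimal Python sketch and not the thesis implementation. Everything specific in it is an assumption made for illustration: the logistic growth model x' = r x - (r/K) x^2 as the linear-in-parameters system, the function names `logistic_solution` and `estimate_parameters`, the use of SciPy's `UnivariateSpline` as the smoother, and central finite differences as the numerical differentiation step.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def logistic_solution(t, r, K, x0):
    """Closed-form solution of x' = r*x - (r/K)*x**2 (illustrative model only)."""
    return K / (1.0 + (K - x0) / x0 * np.exp(-r * t))


def estimate_parameters(t, x_obs, smoothing):
    """Estimate theta in x' = theta[0]*x + theta[1]*x**2 without an ODE solver.

    Step 1: smooth the observations with a cubic spline.
    Step 2: estimate dx/dt from the smoothed states by finite differences.
    Step 3: solve the resulting linear least-squares problem for theta.
    """
    # smoothing=0 interpolates the data; larger values smooth more strongly.
    spline = UnivariateSpline(t, x_obs, k=3, s=smoothing)
    x_hat = spline(t)

    # Second-order accurate differences, including at the two end points.
    dx_hat = np.gradient(x_hat, t, edge_order=2)

    # The model is linear in theta: dx/dt = [x, x**2] @ theta.
    design = np.column_stack([x_hat, x_hat**2])
    theta, *_ = np.linalg.lstsq(design, dx_hat, rcond=None)
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 10.0, 101)
    sigma = 0.05  # assumed noise level for the synthetic data
    x_noisy = logistic_solution(t, r=1.0, K=10.0, x0=1.0) + rng.normal(0.0, sigma, t.size)

    # A common heuristic for the smoothing factor: number of points times noise variance.
    theta = estimate_parameters(t, x_noisy, smoothing=t.size * sigma**2)
    print("estimated r:", theta[0], "estimated K:", -theta[0] / theta[1])
```

The true values in this synthetic setup are theta = [r, -r/K] = [1.0, -0.1]. Because the unknowns enter linearly once the states and derivatives are estimated, the fit needs no repeated numerical integration of the ODE; the price is that errors in the smoothed states and derivatives propagate directly into the parameter estimates.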

Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures

1 Introduction
  1.1 Overview
  1.2 Thesis Objectives
  1.3 Thesis Contributions
  1.4 Thesis Outline

2 Biochemical Pathway Modeling Review
  2.1 Pathway Model Representation I: Chemical Master Equations (CMEs) Model
    2.1.1 A simple example to illustrate the major idea
    2.1.2 From CME model to ODE model
  2.2 Building General CME Model Using Enzyme Kinetic Reaction Model as Example
    2.2.1 General representation and notation of reaction channel
    2.2.2 Combination number of reaction molecules and the probability assumption
    2.2.3 General representation of CME model
    2.2.4 Stochastic simulation: Gillespie modeling and Gillespie algorithm
    2.2.5 Examples to illustrate stochastic simulation
  2.3 Pathway Model Representation II: ODE Model and DDE Model
    2.3.1 GMA model
    2.3.2 S-system model
    2.3.3 Michaelis-Menten model
  2.4 Parameter Estimation
    2.4.1 Existing techniques for biochemical systems
    2.4.2 General description of parameter estimation
  2.5 Structural Identifiability and Practical Identifiability

3 Spline-Based Convex Optimization Method
  3.1 Preliminary on Parameter Estimation
    3.1.1 Introduction of splines
    3.1.2 Preliminaries on parameter estimation
  3.2 Parameter Estimation of ODE/DDE Models
    3.2.1 State estimation by spline smoothing
    3.2.2 Estimating derivatives by numerical differentiation method
    3.2.3 Least-squares and linear programming parameter estimation
  3.3 Simulation Examples
    3.3.1 Enzyme kinetic model
    3.3.2 Lorenz system
    3.3.3 Time-delay chaotic system
  3.4 Conclusion

4 Parameter Estimation Approach for General ODE Model Based on Spline with Nonlinear Programming
  4.1 Parameter Estimation Problem of General ODE Model
  4.2 Parameter Estimation Method Combining Spline Theory and NLP
  4.3 Computation Results
    4.3.1 Mammalian G1/S transition network model
    4.3.2 Yeast fermentation pathway
  4.4 Discussion and Conclusion

5 New Differential Evolution Algorithm for General DDE Model
  5.1 Problem Formulation with General DDE Model
  5.2 Parameter Estimation Method Combining Spline and New DE Algorithm
    5.2.1 Spline smoother-based cost function
    5.2.2 Proposed Differential Evolution (DE) algorithm for parameter estimation
  5.3 Identification Results
    5.3.1 Mammalian G1/S transition network model
    5.3.2 Yeast fermentation pathway
    5.3.3 JAK-STAT signalling pathway model
  5.4 Conclusion and Discussion

6 Analysis of Structural and Practical Identifiability of S-system
  6.1 Theoretical Analysis of Structural Identifiability of S-system
    6.1.1 S-System formalism
    6.1.2 Priori mathematical assumption of S-system
    6.1.3 Structural identifiability of S-system
    6.1.4 Structural identification approach
  6.2 Yeast Fermentation Pathway Model for Illustrating the Structural Identifiability of S-system
  6.3 Analysis of Practical Identifiability
    6.3.1 The overfitting problem faced by practical structural identification
    6.3.2 The underfitting problem by using approaches in literatures
  6.4 Conclusion and Discussion

7 Conclusions and Future Work
  7.1 Summary and Conclusions
  7.2 Future Work