Model Selection by Functional Decomposition of Multi-proxy Flow Responses


Report Prepared for SCRF Affiliates Meeting, Stanford University 1
Ognjen Grujic and Jef Caers

Abstract

Time constraints play a significant role in decision making in the oil and gas industry. Decisions need to be made quickly, while quantifying the geological uncertainties that support those decisions takes a large amount of time. Proper uncertainty quantification requires considerable modeling and simulation effort, which is costly in both computational power and manpower. The typical engineering solution to such a problem is to simplify. Simplifications come in many forms, but the most common ones are on the flow simulation side, where many proxy flow methods have been developed over the years. Any approximation, no matter how good, introduces some degree of error into the analysis and subsequently affects the decisions that the uncertainty quantification supports. The general opinion in the reservoir simulation and modeling community is that high fidelity flow simulations are at some point unavoidable and that any uncertainty quantification should strive to make use of such models. This view, along with the ubiquitous problem of time constraints, motivated the ideas and solutions commonly referred to as model selection.

SCRF has worked on model selection quite extensively over the past few years. To date, our main contribution is the distance kernel method (DKM), where we approximate distances between high fidelity simulations with a less expensive (time-wise) simulation approach and map these distances into a low-dimensional space with multidimensional scaling (MDS), producing an effective space for the ultimate model selection. Two things that are required in practical problems but remained challenging for the DKM approach are multivariate distance approximation and the incorporation of scalar parameters into distance approximations, namely original oil in place, which is a metric commonly used by reservoir engineers. In this paper we address these limitations by introducing concepts of functional data analysis (FDA) to the model selection problem. We give a brief introduction to FDA and test our ideas on a synthetic reservoir case study. Throughout the study we compare, whenever possible, the results achieved with FDA methods against those obtained with the distance kernel method with MDS.

Introduction

Decision analysis concepts formulated by Howard in the 1960s are today's norm when it comes to decision making in oil and gas project developments. The main idea behind these concepts is to achieve clarity of thought by identifying relevant uncertainties and performing detailed assessments of their influence on the particular decisions at hand. This identification and assessment of relevant uncertainties is perhaps the most challenging part of every decision analysis project. Uncertainties are expressed in the form of probability density functions, whose assessment is often complex and time consuming. This is especially true for oil and gas decision problems, where uncertainties are quantified through complex procedures of geostatistical modeling and flow simulation, which often require teams of people and very long computational hours. The nature of oil and gas decisions is such that tight time constraints are almost always imposed, which immediately rules out exhaustive Monte Carlo approaches. This fact motivated the development of approximate techniques for flow simulation that mainly focus on reducing computational time by performing clever model simplifications. Of course, no simplification comes without a cost, or in statistical terminology, "there is no free lunch". Computational speedups usually come at the price of reduced accuracy, which can become quite an influential factor in the big picture of decision analysis. The general attitude in the reservoir modeling community is that nothing is good enough to replace computationally expensive high fidelity flow simulations.

This attitude motivated the development of a family of methods commonly referred to as model selection, where an attempt is made to cleverly select only a portion of a large ensemble of geostatistical realizations or scenarios and evaluate them with computationally expensive high fidelity flow simulations, in the hope that such a subsample accurately approximates the true distributions that are otherwise obtainable only with exhaustive methods such as Monte Carlo. In fact, to calculate the deciles of any dynamic response, one would only need to select 1 models such that, when flow simulation is performed on them, the decile response can be calculated exactly. The challenge therefore is to find a number of models (probably more than 1) as small as possible from which those deciles can be calculated. The general strategy at SCRF has been to use proxy flow simulations only as a guide in the search for the most representative subset of models, or candidates for full-physics simulations. The current state of the art in this domain is the distance kernel method [5,6,7], where we approximate the true distances between computationally expensive flow responses with less expensive proxy responses. We further employ a multidimensional scaling procedure and kernel-based clustering to select the most representative candidates for computationally expensive flow simulation.

Uncertainty Quantification with Proxy Flow Models

Quite often we are interested in producing sampling spaces that are based on several proxy measures of our ensemble of models. A drawback of the DKM approach is that it requires discretization of flow profiles, since distances between the curves need to be computed at concurrent time steps. This may lead to loss of information, since time-dependent variations are essentially removed from the analysis.

The work presented in this report has several objectives. The first and most important is to incorporate time when exploring variation between flow responses, and subsequently to use such information in producing appropriate clustering spaces and model selection. The second objective is to extend the ideas of model selection to multivariate space, since in many practical applications multivariate responses (oil and water, and/or gas) need to be considered. This could introduce valuable information into our model selection procedure. Finally, common reservoir engineering applications consider original oil in place a very important parameter for what engineers call model ranking. We made an effort to develop a workflow capable of incorporating such information as well.

It turns out that the answers to many of our questions can be found in a relatively new family of statistical methods commonly referred to as functional data analysis (FDA). The first part of this report is dedicated to a brief introduction to functional data analysis concepts, with the main focus on ideas useful for model selection. The second part of the report is reserved for a small case study, for which we developed synthetic reservoir models that were flow simulated with high fidelity flow simulations and two types of proxies: simple upscaling and flow diagnostics. Last but not least, we also provide a quick start guide to functional data analysis for readers who would like to apply the concepts presented in this paper to their own reservoir studies.

Functional Data Analysis

Functional data analysis starts from a very common problem of data sampling. Often in nature, we collect data on some physical process that can be fully described with differential equations. The problem is that this data collection is usually conducted over irregular time and space intervals, and the underlying process that produced the data is unknown. FDA's main idea is to fit analytical functions to such data in a non-parametric fashion and to carry out the subsequent analysis with these analytical representations of the measured data. Much like multivariate data analysis, functional data analysis explores variation in the data through principal component analysis; only in this case the principal components are not eigenvectors but eigenfunctions, due to the incorporation of the argument variable (usually time). Fitting analytical functions also enables the study of derivatives of the recorded data, something that would be impossible with conventional multivariate statistical approaches.

The first step in FDA is analytical function fitting, also known as basis expansion. The second step is the exploration of variance, commonly carried out by functional principal component analysis. We discuss both of these steps in the remainder of this section. Additionally, we discuss two very valuable elements of functional data analysis: multivariate principal component analysis and mixed principal component analysis. We found these last two techniques to be crucial to our problem of jointly incorporating oil, water, and gas into model ranking procedures, as well as scalar parameters such as OOIP.

Basis expansion

The idea is that measurements of some functional data (for example, reservoir production rate as a function of time) are taken over some time intervals (commonly irregular). The objective is to represent such data in an analytical form. In functional data analysis this is accomplished by a series of basis functions multiplied by appropriate coefficients:

$$ y_i(t) = \sum_{j=1}^{P} c_j^i \, \phi_j(t) \qquad (1) $$

where:
- y_i(t) is the functional approximation of the measured data;
- φ_j(t) is the j-th basis function;
- c_j^i is the coefficient that multiplies the j-th basis function;
- P is the number of basis functions.

This requires the analyst to select a basis system and the number of basis functions. The most commonly used basis systems are B-splines and Fourier bases (see footnote 1). Figure 1 below provides a very basic example of these two basis systems. Ramsay and Silverman [1,2] point out that the Fourier basis is the most suitable for periodic data, whereas the B-spline basis system is applicable to almost anything else. Given the nature of the curves that we commonly deal with in petroleum engineering (production profiles, well logs, etc.), the B-spline basis system is the preferred choice in almost all applications, including the model selection problem that this paper deals with.

Figure 1. Example of common basis function ensembles (left: B-spline basis system; right: Fourier basis system)

To give a more intuitive sense of a basis expansion, we provide a visual example of this procedure in Figure 2 below. In Figure 2 the basis functions do not all have the same height, unlike the case shown in Figure 1 (left). This is because at this stage each basis function has already been multiplied by its respective coefficient. Each dashed curve in Figure 2 represents one product under the sum in equation 1 above. Please note that the edge basis products have exactly the same height as the data they are trying to fit. This is because the technique requires discontinuity in both the curve and its derivatives at the edges of the time domain that is being considered. The sum of the basis products produces the full black curve plotted above them, which provides a good fit to the measured data shown as red dots.

Footnote 1: The functional data analysis community has developed other types of basis systems besides B-splines and Fourier bases. For more details please refer to Ramsay and Silverman (2005 & 1997).
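As a minimal sketch of the basis expansion in equation 1, the following Python snippet fits a cubic B-spline basis to an irregularly sampled, synthetic decline curve by least squares. The helper name `fit_bspline`, the synthetic data, the number of basis functions, and the equally spaced interior knots are all assumptions made for illustration; they are not taken from the report.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def fit_bspline(t, y, n_basis=8, degree=3):
    """Least-squares B-spline fit; the coefficients c_j are in the returned .c attribute."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Number of basis functions = number of interior knots + degree + 1.
    n_interior = n_basis - degree - 1
    interior = np.linspace(t[0], t[-1], n_interior + 2)[1:-1]
    # Repeat the boundary knots (degree + 1) times so the basis covers the whole time domain.
    knots = np.concatenate(([t[0]] * (degree + 1), interior, [t[-1]] * (degree + 1)))
    return make_lsq_spline(t, y, knots, k=degree)

# Hypothetical, irregularly spaced measurement times and a noisy decline curve.
rng = np.random.default_rng(0)
t = 10.0 * np.linspace(0.0, 1.0, 40) ** 1.5
y_clean = 100.0 * np.exp(-0.3 * t)
y = y_clean + rng.normal(0.0, 0.5, size=t.size)

spline = fit_bspline(t, y, n_basis=8)
coefficients = spline.c          # one coefficient per basis function
y_fit = spline(t)                # basis expansion evaluated at the measurement times
```

Once fitted, each curve is summarized by its coefficient vector `spline.c`, which is the representation the later PCA steps operate on.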

Figure 2. Example of basis expansion. Dashed lines are basis functions multiplied by their coefficients (see equation 1). Red dots represent the original data (measurements). The full black line is the result of the basis expansion.

Functional Principal Component Analysis

Besides producing representations of discretely measured functional data, the basis-expanded approximations also act as a dimensionality reduction technique. Functional data analysis allows one more step in dimensionality reduction through functional principal component analysis (FPCA), which also enables exploration of variance within the data through analysis of principal component scores, much like what we are used to doing in conventional multivariate data analysis and distance-based multidimensional scaling approaches. The main difference between multivariate data analysis and functional data analysis lies in the principal components. In the multivariate sense principal components are vectors, while in the functional data analysis sense they are functions, in fact eigenfunctions. There is also a small difference in the way principal component scores are computed. The table below provides a more detailed comparison of the two methods:

Table 1. Comparison between PCA and FPCA (source: Wikipedia)

Mixed Principal Component Analysis

In uncertainty quantification procedures our models have both functional and scalar components. The functional components are oil, water, and/or gas rates vs. time, while the most important scalar component is original oil in place (OOIP). A valid question at this stage is how to use all these pieces of information together when performing model selection. Fortunately, our industry is not the only discipline that deals with such data; there are many examples in the statistical literature that deal with similar mixed data. The functional data analysis community went furthest in providing meaningful solutions to this problem, the most significant of which is mixed principal component analysis. The main idea behind mixed PCA is to decompose the problem into two parts. The first part is the functional part, which we process with the basis expansion approach outlined before; this enables us to represent each curve as a vector of coefficients (basis multipliers). The second part is the scalar part, in our example original oil in place. In this way the mixed problem is transformed into an entirely multivariate problem consisting of basis coefficients and the original scalar data (oil in place). The next step in model selection then reduces to conventional multivariate principal component analysis, a well-known technique that does not require any discussion here.

Multivariate Principal Component Analysis

Quite often several functional variables are recorded simultaneously over the same argument value, most commonly time. Typical examples of such a situation in the oil and gas industry are oil and water rates.
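The coefficient-stacking step that mixed PCA relies on, and that carries over to several functional variables, can be sketched as follows. This is a rough illustration under stated assumptions: the function name `mixed_pca`, the random stand-in data, and the column standardization (used here so that coefficients and OOIP are on comparable scales) are not specified in the report. Note also that PCA applied directly to basis coefficients matches functional PCA only for an orthonormal basis; for B-splines a weighting by the basis inner-product matrix would normally be needed, which this sketch omits.

```python
import numpy as np

def mixed_pca(coef_blocks, scalars=None, n_components=2):
    """PCA on stacked basis coefficients (one block per functional variable) plus optional scalars."""
    blocks = [np.asarray(b, dtype=float) for b in coef_blocks]
    n_models = blocks[0].shape[0]
    if scalars is not None:
        blocks.append(np.asarray(scalars, dtype=float).reshape(n_models, -1))
    X = np.hstack(blocks)                     # hybrid matrix: one row per reservoir model
    X = X - X.mean(axis=0)                    # center each column
    std = X.std(axis=0)
    std[std == 0] = 1.0                       # avoid division by zero for constant columns
    X = X / std                               # assumption: standardize to balance units
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, Vt[:n_components], explained[:n_components]

# Hypothetical inputs: 30 models, 8 oil-rate coefficients, 8 water-rate coefficients, one OOIP value.
rng = np.random.default_rng(1)
oil_coefs = rng.normal(size=(30, 8))
water_coefs = rng.normal(size=(30, 8))
ooip = rng.uniform(1.0, 2.0, size=30)

scores, components, explained = mixed_pca([oil_coefs, water_coefs], scalars=ooip)
```

The rows of `scores` give the coordinates of each model in the resulting sampling space, which can then be clustered for model selection.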
If we are to compute principal components of such multivariate data, we first have to note that such principal components are now defined by a bivariate vector ξ = (ξ_O, ξ_W), where each element defines variation within one variable space (ξ_O for oil, ξ_W for water). Practical computation of multivariate principal components and principal component scores is much like the previously discussed mixed PCA, only in this case we are dealing with multiple sources of functional data without any scalars such as OOIP. Each function is replaced with its coefficients, originating from an appropriate basis expansion of the multivariate function, which transforms the problem into a simple multivariate case that we can solve with conventional PCA.

Comparison of Model Selection Approaches

2D Reservoir Models Used in Our Case Studies

In order to compare the previously described ideas for building model selection spaces, we developed a small reservoir case study. We generated a total of 1 reservoir models with two facies: channelized high permeability sand embedded in low permeability shaly sand. All 1 realizations were finely gridded with a 1x1 grid. Since the objective was to test model selection ideas based on proxy flow responses, we also subjected our ensemble to proxy modeling. The first proxy was produced with the flow diagnostics toolbox (a package in MRST from SINTEF), and its output consisted of the well-known F-Phi curves. The second proxy was produced by very simple upscaling of the finely gridded models. Each simulation had one injector in the bottom left corner and a producer in the upper right corner. The upscaled proxy flow models were developed by simple image resizing (averaging), which is a very poor upscaling scheme. Please note that this was done deliberately, since we needed a proxy that would perform as poorly as possible. The figures below show a couple of finely gridded realizations (1x1 grid) along with their upscaled counterparts (51x51). Below we also provide flow responses from all 1 realizations on both the finely and the coarsely gridded models.
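To illustrate what upscaling by plain averaging amounts to, here is a small sketch that coarsens a permeability grid by arithmetic block averaging. It is a stand-in under assumptions: the report does not specify its resizing routine, the grid sizes below (100x100 to 50x50) are hypothetical, and the function name `block_average` is introduced only for this example. Block averaging also requires the grid size to be divisible by the coarsening factor, which a general image-resizing routine does not.

```python
import numpy as np

def block_average(perm, factor):
    """Coarsen a 2D grid by taking the arithmetic mean over factor x factor blocks."""
    perm = np.asarray(perm, dtype=float)
    ny, nx = perm.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by the coarsening factor")
    # Split each axis into (coarse index, index within block) and average within blocks.
    return perm.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

# Hypothetical fine-scale permeability field in mD.
rng = np.random.default_rng(2)
fine = rng.lognormal(mean=3.0, sigma=1.0, size=(100, 100))
coarse = block_average(fine, factor=2)
```

Arithmetic averaging smears thin high-permeability channels into their low-permeability surroundings, which is consistent with the intent of having a deliberately poor proxy.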

Figure 3. Finely gridded realizations with their coarse counterparts (PERM [md])

Flow responses from the proxy and the finely gridded flow simulations are given in the figure below.

Figure 4. Fine (left) and proxy (right) production variables

Model Selection Based on Flow Diagnostics (F-Φ Curves)

According to Shook, the Lorenz coefficient, a metric derived from flow diagnostics curves, is a valid measure of reservoir heterogeneity and can be used as a valuable guide to model selection. We do not question Shook's [3] approach; we agree with it, but we also see room for improvement. We can use flow diagnostics curves for model ranking by applying a distance-based method to the raw F-Phi curves. In this paper we test this idea against simple model selection based on the Lorenz coefficient, as well as against the commonly used approach of clustering based on the Lorenz coefficient and OOIP. Another idea, which we also considered, is to process the F-Phi curves with the FDA basis expansion approach and functional principal component analysis.

Figure 5. Example model ranking based on Lorenz coefficient and OOIP (plot title: Model Ranking With Flow Diagnostics; axes: OOIP and Lorenz coefficient; true and estimated quantiles shown)
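For readers unfamiliar with the metric, the snippet below computes a Lorenz coefficient from a discretized F-Phi curve. It assumes the usual definition L_c = 2(∫F dΦ − 1/2), which the report itself does not spell out; the function name `lorenz_coefficient` and the two test curves are illustrative only.

```python
import numpy as np

def lorenz_coefficient(phi, F):
    """Lorenz coefficient from an F-Phi curve: twice the area between the curve and the diagonal."""
    phi = np.asarray(phi, dtype=float)
    F = np.asarray(F, dtype=float)
    # Trapezoidal rule for the integral of F with respect to Phi.
    area = np.sum(0.5 * (F[1:] + F[:-1]) * np.diff(phi))
    return 2.0 * (area - 0.5)

phi = np.linspace(0.0, 1.0, 101)
lc_homogeneous = lorenz_coefficient(phi, phi)                       # F = Phi: no heterogeneity
lc_heterogeneous = lorenz_coefficient(phi, 1.0 - (1.0 - phi) ** 2)  # analytic value is 1/3
```

A single scalar of this kind discards the shape of the curve, which is the motivation for working with the full F-Phi curves through distances or basis expansion instead.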

Figure 6. An example of functional principal components of the flow diagnostics (FD) curves (left; PC1 to PC3 plotted against Phi) and the interpretation plot (right; mean, mean minus PC1, and mean plus PC1 plotted against Phi).

Even though the principal components have the same argument value as the original data, their interpretation is difficult, at least in this raw form. For this reason, the most common procedure in principal component interpretation is to plot them as perturbations of the mean. This is shown on the right of Figure 6, where we perturb the mean function with only principal component 1, which describes most of the variance (5%).

We found it very important to determine the smallest number of high fidelity function evaluations necessary for accurate quantile estimation. We organized the study as follows: we performed clustering using a specific number of clusters, performed high fidelity function evaluations on the cluster medoids, and computed quantiles, which were then compared with the true quantiles evaluated on the entire ensemble of high fidelity flow simulations. The results are given in the figures below:

Figure 7. Random search vs. OOIP-Lorenz approach (panels: Convergence of Lz vs. OOIP; Convergence of Random Search; curves: P10, P50, P90)

Since we had already computed the Lorenz coefficient, it was worthwhile investigating whether quantiles from the Lorenz coefficient distribution would provide us with exact P10, P50 and P90 quantiles. The figure below shows an example of this exercise:

Figure 8. Model selection based on Lorenz coefficient only (true and estimated quantiles shown)

It is clear that three models selected from the Lorenz coefficient's distribution or through Lorenz vs. OOIP clustering are simply not enough to estimate the true quantiles with high accuracy. The same study, this time carried out with the MDS and FPCA approaches applied to the raw F-Phi curves, is given in the figure below.

Figure 9. MDS on F-Phi curves vs. FPCA on F-Phi curves (panels: Convergence of MDS on FD curves; Convergence of FPCA on FD curves; curves: P10, P50, P90)

What is interesting to notice here is that MDS and FPCA performed equally well in this case. This is not a surprise, since there was not enough wiggling in the F-Phi curves for FDA to produce a significant advantage over MDS. The distance between these types of curves sufficiently captured the difference in Phi parameter behavior; hence the integrated distance (MDS) is similar to the decomposed Phi variability (FPCA). It is also worth noticing that, as the number of high fidelity function evaluations kept increasing, quantiles estimated by sampling from the MDS or FPCA spaces proved to be much more stable than quantiles produced by sampling from the Lz vs. OOIP space. The figure below shows the sampling spaces produced by MDS and FPCA. Sampling algorithms do not see any difference between these two spaces, hence they produced the same results, as shown in Figure 9.

Figure 10. Two sampling spaces. Left: space produced with MDS on flow diagnostics; right: space produced with FPCA on flow diagnostics.

Upscaled Proxy Model Selection with FDA & MDS

This part of our study focused on performing model selection based only on responses computed with upscaled flow simulations. In this section we carried out essentially the same analysis as in the previous flow diagnostics analysis, with the difference that here we also introduce multivariate functional data analysis. This study is organized as follows:

1. We performed MDS on oil rates and compared its results with FPCA on oil rate. This is essentially univariate space building for clustering (or model selection).
2. We introduced multivariate FPCA and compared its results with multivariate MDS based on compound distances (combining distance matrices by weighting).
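The compound-distance route used for the multivariate MDS comparison in the list above can be sketched as follows. The snippet is an assumption-laden illustration: the function names, the Euclidean distance between discretized curves, the normalization of each distance matrix by its maximum, and the equal weights are choices made here and are not taken from the report. The weights are exactly the subjective input that the MDS approach requires.

```python
import numpy as np

def curve_distances(curves):
    """Pairwise Euclidean distances between rows (curves sampled at common time steps)."""
    curves = np.asarray(curves, dtype=float)
    diff = curves[:, None, :] - curves[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def compound_distance(dist_list, weights):
    """Weighted sum of distance matrices, each scaled by its maximum entry (an assumption)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    total = np.zeros_like(np.asarray(dist_list[0], dtype=float))
    for D, w in zip(dist_list, weights):
        D = np.asarray(D, dtype=float)
        scale = D.max()
        total += w * D / (scale if scale > 0 else 1.0)
    return total

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: eigendecomposition of the double-centered squared distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]
    w_top = np.clip(w[idx], 0.0, None)        # guard against small negative eigenvalues
    return V[:, idx] * np.sqrt(w_top)

# Hypothetical proxy responses: 20 models, 30 common time steps.
rng = np.random.default_rng(3)
oil = np.cumsum(rng.uniform(0.0, 1.0, size=(20, 30)), axis=1)
water = np.cumsum(rng.uniform(0.0, 1.0, size=(20, 30)), axis=1)

D = compound_distance([curve_distances(oil), curve_distances(water)], weights=[0.5, 0.5])
mds_space = classical_mds(D, n_components=2)
```

Changing `weights` changes the resulting space, so two analysts with different weightings may select different models from the same proxy responses.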

Figure 11. Quantile convergence with the FPCA approach (left) and the MDS approach (right), univariate approach (panels: Oil - Clustering on Oil Rate Only; curves: P10, P50, P90)

Figure 12. Multivariate FPCA (left) vs. multivariate MDS (right) (panels: Oil - Clustering on Oil and Water Rates; curves: P10, P50, P90)

These results show that model selection with FPCA converges towards the true quantiles faster than model selection with MDS. It is also important to notice that multivariate FPCA has faster and much more stable convergence than its multivariate MDS counterpart based on weighted distances.
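The selection step behind these convergence plots (cluster the sampling space, run the high fidelity simulation only on cluster representatives, estimate quantiles from them) can be sketched as below. Several choices here are assumptions rather than the report's method: plain k-means stands in for the kernel-based clustering, the "medoid" is approximated by the model closest to each centroid, quantiles are weighted by cluster size, and the sampling space and response are synthetic.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (a stand-in for the kernel-based clustering)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), centers

def select_medoids(X, labels, centers):
    """Pick the model nearest to each centroid; weight it by its cluster's share of the ensemble."""
    medoids, weights = [], []
    for j in range(len(centers)):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        d = np.linalg.norm(X[members] - centers[j], axis=1)
        medoids.append(members[d.argmin()])
        weights.append(members.size / len(X))
    return np.array(medoids), np.array(weights)

def weighted_quantile(values, weights, q):
    """Quantile of weighted samples, interpolating on the midpoints of the weighted CDF steps."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cdf = (np.cumsum(w) - 0.5 * w) / w.sum()
    return np.interp(q, cdf, v)

# Hypothetical sampling space (e.g. two principal component scores) and a fine-scale response.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
response = 50.0 + 10.0 * X[:, 0] + rng.normal(0.0, 1.0, size=100)

labels, centers = kmeans(X, k=10)
medoids, weights = select_medoids(X, labels, centers)
estimates = [weighted_quantile(response[medoids], weights, q) for q in (0.1, 0.5, 0.9)]
```

Repeating the last three lines for an increasing number of clusters and comparing `estimates` with quantiles of the full `response` vector gives a convergence curve of the kind shown in the figures.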

Multi Proxy Model Selection

It is very common to have flow responses from multiple proxies. The case study in this paper is one example: we have two approximate measures, flow diagnostics and upscaling. In such a situation it is natural to consider combining the information from the different proxies into a hybrid sampling space from which we select models for high fidelity flow simulation runs. In MDS this requires choosing a weight for each distance, a difficult and subjective decision left to the modeler.

This is where the true power of functional data analysis comes into the spotlight. Instead of treating the problem as separate univariate problems in MDS that then need to be combined a posteriori, FDA allows for a full multivariate analysis. Thanks to the basis expansion approach to curve fitting, every one of our responses is represented as a vector of basis coefficients. This includes the oil rate and water rate production curves as well as the flow diagnostics responses (F-Phi curves). All these coefficients can therefore be combined into hybrid matrices in a mixed principal component analysis (mixed PCA) approach, and these matrices can then be analyzed with conventional multivariate principal component methods. We tested this approach and dedicate this section of the report to the results.

There are many ways of mixing the available proxy data. We limited our study to a few in order to test the validity of the mixed PCA approach:

1. We performed mixed PCA with oil rates from the upscaled proxy and F-Phi curves from flow diagnostics. The data were subjected to basis expansion prior to mixing, and mixing was performed on the basis coefficients (multipliers).
2. We expanded this idea by incorporating water rates into the analysis.
3. Finally, we added original oil in place (OOIP) to the previous analysis with oil, water, and flow diagnostics proxy responses.

The results of each of these studies are given in the following three subsections.
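The mixing step described above amounts to placing the coefficient vectors of all responses side by side and running ordinary PCA on the resulting hybrid matrix. The sketch below illustrates this with random stand-in coefficients; the function name `mixed_pca`, the matrix sizes, and the per-block scaling are assumptions made for illustration. The text does not say whether the blocks were rescaled before mixing.

```python
import numpy as np

def mixed_pca(blocks, n_components=2):
    """PCA on the column-wise concatenation of per-response coefficient blocks.

    Each block has one row per reservoir model and one column per basis
    coefficient (or a single column for a scalar such as OOIP).
    """
    scaled = []
    for block in blocks:
        centered = block - block.mean(axis=0)
        # ASSUMPTION: divide each block by its Frobenius norm so that no single
        # response dominates purely because of its units.
        scaled.append(centered / np.linalg.norm(centered))
    hybrid = np.hstack(scaled)                      # the hybrid matrix
    u, s, _ = np.linalg.svd(hybrid, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, explained

# Hypothetical coefficient matrices for 50 models (random stand-ins, not study data).
rng = np.random.default_rng(0)
n_models = 50
oil_coefs = 500.0 * rng.normal(size=(n_models, 8))    # upscaled proxy, oil rate
water_coefs = 200.0 * rng.normal(size=(n_models, 8))  # upscaled proxy, water rate
fphi_coefs = rng.normal(size=(n_models, 6))           # flow diagnostics, F-Phi curve
ooip = rng.normal(loc=1e7, scale=1e6, size=(n_models, 1))

scores, explained = mixed_pca([oil_coefs, water_coefs, fphi_coefs, ooip], n_components=3)
```

Passing only `[oil_coefs, fphi_coefs]` corresponds to the first study, adding `water_coefs` to the second, and adding `ooip` to the third. The rows of `scores` place each model in a low-dimensional space in which models can be clustered and selected.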

1. Mixed FPCA on oil rate with FPCA on flow diagnostics curves.

Figure 13. Quantiles convergence, mixed proxy (fpca+fd). Oil quantiles (left) and water quantiles (right). Panel titles: "Oil - Mixed PCA FD+OilRate" and "Water - Mixed PCA FD+OilRate"; curves: P10, P50, P90.

2. Mixed multivariate PCA (Oil & Water) with FPCA on flow diagnostics curves.

Figure 14. Mixed selection: multivariate FPCA on upscaled models with FPCA on flow diagnostics. Left: quantiles convergence on the oil variable. Right: quantiles convergence on the water variable. Panel titles: "Oil - Mixed PCA FD+OilWaterRate" and "Water - Mixed PCA FD+OilWaterRate"; curves: P10, P50, P90.

3. Mixed PCA (Oil & Water) with Flow Diagnostics and Original Oil in Place

Figure 15. Mixed PCA: multivariate upscaled proxy + flow diagnostics + OOIP. Left: quantiles convergence on the oil variable. Right: quantiles convergence on the water variable. Panel titles: "Oil - Mixed PCA FD+OilWaterRate" and "Water - Mixed PCA FD+OilWaterRate"; curves: P10, P50, P90.
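Each point on a quantile convergence curve compares P10, P50, and P90 estimated from a small set of selected models with the same quantiles of the full ensemble. The sketch below shows one way such an estimate could be formed. It assumes that models are selected as cluster representatives in the score space and weighted by cluster size, in the spirit of the distance-based workflow of [5]. The plain k-means used here is a stand-in for whatever clustering was actually applied, and all data are synthetic.

```python
import numpy as np

def select_representatives(scores, k, n_iter=50, seed=0):
    """Cluster models in score space; return one representative per cluster and its weight."""
    rng = np.random.default_rng(seed)
    centers = scores[rng.choice(len(scores), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(scores[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = scores[labels == j].mean(axis=0)
    dist = np.linalg.norm(scores[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    reps, sizes = [], []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue  # skip empty clusters
        reps.append(members[dist[members, j].argmin()])  # model closest to the center
        sizes.append(members.size)
    return np.array(reps), np.array(sizes) / len(scores)

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches the fraction q."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(cum, q * cum[-1])
    return values[order][min(idx, len(values) - 1)]

# Synthetic ensemble: 200 models, 2-D scores, and a hypothetical fine-scale response.
rng = np.random.default_rng(0)
n_models = 200
scores = rng.normal(size=(n_models, 2))
cum_oil = 1000.0 + 100.0 * scores[:, 0] + rng.normal(scale=10.0, size=n_models)

reps, weights = select_representatives(scores, k=10)
# Only the selected models would need a high fidelity run in practice.
estimated = [weighted_quantile(cum_oil[reps], weights, q) for q in (0.1, 0.5, 0.9)]
reference = np.quantile(cum_oil, [0.1, 0.5, 0.9])  # available here only because data are synthetic
```

Repeating this for increasing `k` and plotting `estimated` against `reference` gives a convergence curve of the kind summarized in the figures.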

Conclusions and Future Work

In this paper we introduced functional data analysis to the problem of model selection and uncertainty modeling with proxy flow models. We provided a detailed comparison between the previously published distance-based method and the newly introduced functional data analysis approach to model ranking. We showed that functional principal component analysis performs at least as well as the distance-based method with multidimensional scaling. We also showed that the functional data analysis approach enables model ranking based on several flow variables, where it outperforms the workaround of compound distances in the distance-based method.

This study also demonstrated that, on average, no matter which method we use, we cannot get away with any less than 3 high fidelity (fine flow) simulations. The idea that only three models could fully capture the true quantiles seems quite absurd at this point, since that would be possible only if the entire probability distribution evaluated with high fidelity flow simulations were available; in other words, only with error-free sampling.

In our future work we will focus on developing ideas that enable reconstruction of fine flow simulations from a few function evaluations. Response surface methods deal with one variable at a time, usually the oil rate at one time step. With functional data analysis we can go beyond this limit by predicting and reconstructing entire flow responses with functional cokriging. Another interesting application of functional data analysis is history matching, since field measurements can arrive on an irregular time scale, a situation functional data analysis methods were built to handle. The reconstruction capabilities of this approach also enable more advanced estimation of the uncertainty in estimated quantiles with a functional flavor of the parametric bootstrap.
Overall, functional data analysis methodologies opened entirely new avenues of research for us and inspired many interesting ideas that we plan to develop in the near future.

References

[1] J. Ramsay, B. W. Silverman, Functional Data Analysis, Springer, 2005.
[2] J. Ramsay, B. W. Silverman, Applied Functional Data Analysis: Methods and Case Studies, Springer, 2002.
[3] M. Shook, A Robust Measure of Heterogeneity for Ranking Earth Models: The F PHI Curve and Dynamic Lorenz Coefficient, SPE Annual Technical Conference and Exhibition, 2009, New Orleans, Louisiana.
[4] J. Ramsay, G. Hooker, S. Graves, Functional Data Analysis with R and MATLAB, Springer, 2009.
[5] C. Scheidt, J. Caers, Representing Spatial Uncertainty Using Distances and Kernels, Mathematical Geosciences, May 2009, Vol. 41.
[6] C. Scheidt, J. Caers, Bootstrap confidence intervals for reservoir model selection techniques, Computational Geosciences, March 2010, Vol. 14.
[7] C. Scheidt, J. Caers, Uncertainty Quantification in Reservoir Performance Using Distances and Kernel Methods: Application to a West Africa Deepwater Turbidite Reservoir, SPE Journal, December 2009.
[8] MATLAB Reservoir Simulation Toolbox (MRST), SINTEF, 2013.

Appendix A

A Quick Start Guide to Functional Data Analysis (FDA)

Taking the first steps with functional data analysis can be quite tedious. For this reason we wrote a short quick start guide to help interested readers start applying FDA ideas to their own reservoir case studies.

First of all, every reader is advised to consult the following essential literature on functional data analysis. These two books cover all theoretical aspects of FDA and provide good examples that make the material quick and easy to understand.

1. Ramsay and Silverman, Functional Data Analysis, Springer, 2005 (there is also an older edition from 1997).
2. Ramsay and Silverman, Applied Functional Data Analysis: Methods and Case Studies, Springer, 2002.

All methods outlined in Ramsay and Silverman's books have been implemented in both MATLAB and R. Jim Ramsay generously shares these two toolboxes on his website: www.functionaldata.org. Please note that the two toolboxes are 100% identical in capabilities: every FDA function has both a MATLAB and an R version. The toolboxes come with many useful examples, many of which are also discussed in Ramsay and Silverman's books, mainly Applied Functional Data Analysis.

A third and also very important publication, especially for the computer implementation of FDA concepts, is Functional Data Analysis with R and MATLAB by Jim Ramsay, Giles Hooker, and Spencer Graves, Springer, 2009. This book is a very good reference for all functions and methods contained in the R and MATLAB toolboxes.

Last but not least, Prof. Giles Hooker's website also has many free resources on FDA. Prof. Hooker teaches the only course on FDA known to us, and he generously provides all his course and workshop materials on the website. The only drawback of this resource is that all implementations are written exclusively in R, which some readers might find discouraging.
http://faculty.bscb.cornell.edu/~hooker/

Good luck using FDA!
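As a first hands-on step, the basis expansion at the heart of FDA can be tried without either toolbox. The sketch below fits each curve by least squares on a small polynomial basis and keeps only the coefficients. It is an illustration under stated assumptions and not the toolbox workflow: the FDA toolboxes typically work with B-spline or Fourier bases, whereas a Legendre basis is used here only because it is available in NumPy; the decline curves are synthetic, and the helper name `fit_basis_coefficients` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def fit_basis_coefficients(times, curves, degree=6):
    """Least-squares basis expansion: one row of coefficients per curve."""
    # Map the time axis to [-1, 1], the natural domain of Legendre polynomials.
    t = 2.0 * (times - times.min()) / (times.max() - times.min()) - 1.0
    basis = legendre.legvander(t, degree)           # shape (n_times, degree + 1)
    coefs, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)
    return coefs.T, basis

# Synthetic oil-rate style decline curves for 12 hypothetical models.
times = np.linspace(0.0, 3000.0, 40)                # days
rng = np.random.default_rng(1)
decline = rng.uniform(0.0005, 0.002, size=12)
curves = 1000.0 * np.exp(-np.outer(decline, times))

coefs, basis = fit_basis_coefficients(times, curves)
reconstructed = coefs @ basis.T                     # curves rebuilt from coefficients
max_rel_error = np.abs(reconstructed - curves).max() / curves.max()
```

Each row of `coefs` is the vector of basis coefficients that stands in for one response curve in the subsequent principal component analysis.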