S1 Note. Basis functions.


Contents
Types of basis functions
The Fourier basis
B-spline basis
Power and type I error rates with different numbers of basis functions
Table S1. Simulation results of type I error rates for the famFLM test using cubic B-spline basis with different numbers of basis functions (K)
Table S2. Simulation results of type I error rates for the famFLM test using Fourier basis with different numbers of basis functions (K)
Figure S1. The statistical power of regional association analysis on the familial data using cubic B-spline basis (a-c) or Fourier basis (d-f) with different numbers of basis functions (K)
References

Types of basis functions

A basis function system is a set of K standard mathematical functions, denoted by $\{\phi_1(t), \ldots, \phi_K(t)\}$. They are linearly independent and can be combined to estimate any function, denoted as $x(t)$. In this work, we estimated two specific functions, $x(t) = \tilde{\beta}(t)$ and $x(t) = \tilde{G}(t)$, where $\tilde{\beta}(t)$ is the beta-smoothing function (BSF) and $\tilde{G}(t)$ is the genetic variant function (GVF), both from the functional linear model (2).

Several types of basis functions exist, and the choice should take the behavior of the data into account. We considered two popular types: B-spline and Fourier bases. The first is more suitable for non-periodic data with an open-ended range, and the second is more suitable for data of a periodic or near-periodic nature with a limited range.

Knowing only the values $\{x(t_i), i = 1, \ldots, m\}$ of an unknown function $x(t)$ at discrete points $\{t_i, i = 1, \ldots, m\}$, one can approximate $x(t)$ by a weighted sum (linear combination) of the basis functions:

$$x(t) \approx \sum_{k=1}^{K} c_k \phi_k(t) = \mathbf{c}^T \boldsymbol{\phi}(t),$$

where $\mathbf{c}$ is a $(K \times 1)$ vector of weight coefficients, and $\boldsymbol{\phi}(t) = (\phi_1(t), \ldots, \phi_K(t))^T$. This approximation can also be carried out when the values $\{x(t_i), i = 1, \ldots, m\}$ are not given but can be estimated.

For the functions $\tilde{\beta}(t)$ and $\tilde{G}(t)$, the approaches to finding the weight coefficients $\mathbf{c}$ differ. For estimating the GVFs in Model (2), a discrete realization of which is the known matrix $G$ $(n \times m)$, we use a simple linear smoother that determines $\mathbf{c}^T$ as $G\Phi(\Phi^T\Phi)^{-1}$, where $\Phi$ $(m \times K)$ is the matrix with $(j,k)$-th element equal to $\phi_k(t_j)$ [Ramsay and Silverman, 2005]. However, for estimating the BSF, whose discrete realizations in Model (2) are unknown, we treat the vector $\mathbf{c}$ as unknown model parameters in a linear regression equation.

When the values of $x(t)$ at the points $\{t_i, i = 1, \ldots, m\}$ are given and the number of basis functions K is equal to m, it is easy to see that an exact solution of the equation system $\{x(t_i) = \mathbf{c}^T \boldsymbol{\phi}(t_i), i = 1, \ldots, m\}$ with regard to $\mathbf{c}$ exists (provided the $m \times m$ matrix $\Phi$ is nonsingular). In this case, it does not matter which basis function system is used. However, when m is large, it is impractical to set K equal to m. When K < m, the accuracy of the approximation depends on the selected type of basis function system. Ideally, basis functions should have features that match the known features of the function being estimated. When they do, a satisfactory approximation is easier to achieve with a comparatively small number K of basis functions. We consider the Fourier and the B-spline bases in detail.

The Fourier basis

The Fourier basis is a set of sine and cosine functions of increasing frequency, provided by the Fourier series: $\phi_0(t) = 1$, $\phi_{2r-1}(t) = \sin(2\pi r t)$ and $\phi_{2r}(t) = \cos(2\pi r t)$, for $r = 1, \ldots, (K-1)/2$. Here K is taken as a positive odd integer. Each function in this Fourier basis is periodic in t with period 1. If the discrete values $t_j$ are equally spaced on the normalized interval [0, 1], then this basis is orthogonal in the sense that the cross product matrix $\Phi^T\Phi$ is diagonal. For a genome region with m genetic variants, where $m \ge 25$, we selected K = 25, as did Fan et al. [2013].
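As an illustration of the Fourier basis and of the linear smoother $\mathbf{c}^T = G\Phi(\Phi^T\Phi)^{-1}$ described above, the following is a minimal NumPy sketch, not the authors' implementation. The function names `fourier_basis` and `smooth_coefficients`, the toy genotype matrix, and the grid $t_j = j/m$ (which leaves out the endpoint 1 because it coincides with 0 under period 1) are all assumptions made for this example.

```python
import numpy as np

def fourier_basis(t, K):
    """Evaluate K Fourier basis functions (period 1) at the points t.

    Columns are 1, sin(2*pi*t), cos(2*pi*t), sin(4*pi*t), cos(4*pi*t), ...
    """
    if K < 1 or K % 2 == 0:
        raise ValueError("K must be a positive odd integer")
    t = np.asarray(t, dtype=float)
    Phi = np.empty((t.size, K))
    Phi[:, 0] = 1.0
    for r in range(1, (K - 1) // 2 + 1):
        Phi[:, 2 * r - 1] = np.sin(2 * np.pi * r * t)
        Phi[:, 2 * r] = np.cos(2 * np.pi * r * t)
    return Phi

def smooth_coefficients(G, Phi):
    """Return C = G Phi (Phi^T Phi)^{-1}, one coefficient row per individual."""
    # Solve the normal equations instead of forming the inverse explicitly.
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ G.T).T

# Hypothetical setup: n individuals, m equally spaced variant positions.
n, m, K = 5, 40, 25
t = np.arange(m) / m                      # assumed grid: j/m, j = 0..m-1
Phi = fourier_basis(t, K)                 # shape (m, K)

# On this grid the cross product matrix is diagonal.
gram = Phi.T @ Phi
off_diag = gram - np.diag(np.diag(gram))

# Toy genotype matrix with entries 0, 1, 2 (simulated, not real data).
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(n, m)).astype(float)

C = smooth_coefficients(G, Phi)           # shape (n, K)
G_smooth = C @ Phi.T                      # smoothed GVF values at the grid
```

Because the smoother is a least-squares projection onto the span of the basis, smoothing `G_smooth` a second time returns the same coefficients `C`.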
Specifics: Fourier basis functions have excellent computational properties, especially if the discrete points of observation are equally spaced, because derivatives are easy to estimate and because the construction technique is simple and non-recursive. A Fourier series is especially useful for extremely stable functions, such as functions without strong local features, where the curvature tends to be of the same order everywhere. However, Fourier bases are inappropriate for data where discontinuities in the function itself or in its low-order derivatives are known or suspected. They are best suited to describing data that are periodic or near-periodic, and their periodicity is a problem for non-periodic data. For details see, e.g., [Ramsay and Silverman, 2005; Ramsay et al., 2009; Ferraty and Romain, 2011; Horvath and Kokoszka, 2012].

B-spline basis

A B-spline basis is the most popular approximation system for non-periodic data. Here, a B-spline basis is a system of K polynomials, each of specified order d (the order of a polynomial is the number of constants required to define it [Ramsay and Silverman, 2005; Ramsay et al., 2009]). An approximating function $x(t)$ is defined piecewise by basis polynomials with the given order of smoothness at the join points.

To use a B-spline basis, the interval normalized as [0, 1] is subdivided into L arbitrary segments, where $L = K - d + 1$. Consecutive segments are separated by a join point called a knot. The number of such interior points is equal to $L - 1$. On each of the consecutive segments, the approximating function $x(t)$ is defined as a corresponding basis polynomial. To make the resulting piecewise polynomial smooth, the values of the polynomials and all their derivatives up to order $d - 2$ must match at the join point for any pair of consecutive segments.

The i-th B-spline basis function of the k-th order ($k \le d$), defined on the set of all reals and denoted by $B_{i,k}(t)$, $i = 1, \ldots, L + k - 1$, can be defined recursively as follows:

$$B_{i,1}(t) = \begin{cases} 1, & \text{if } t_i \le t < t_{i+1}, \\ 0, & \text{otherwise,} \end{cases}$$

and

$$B_{i,k}(t) = \frac{t - t_i}{t_{i+k-1} - t_i} B_{i,k-1}(t) + \frac{t_{i+k} - t}{t_{i+k} - t_{i+1}} B_{i+1,k-1}(t), \quad k = 2, \ldots, d.$$

Here $B_{i,k}(t)$ is a polynomial of order k that will be used on the i-th interval $t_i \le t < t_{i+1}$, $i = 1, \ldots, L$. The value of k must be at least 2 and at most $L + 1$.
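The recursion above can be sketched directly in NumPy. This is a minimal illustration rather than the authors' code. The names `bspline_basis` and `bspline_matrix` are hypothetical, indices are 0-based (the text uses 1-based indices), and two things are assumed that the recursion itself does not state: the boundary knots 0 and 1 are each repeated d times (a common "clamped" knot vector), and any term whose denominator is zero is treated as zero.

```python
import numpy as np

def bspline_basis(t, knots, i, k):
    """B_{i,k}(t) by the recursion above; i is 0-based, k is the order."""
    t = np.asarray(t, dtype=float)
    if k == 1:
        # Indicator of the half-open interval [knots[i], knots[i+1]).
        return np.where((knots[i] <= t) & (t < knots[i + 1]), 1.0, 0.0)
    out = np.zeros_like(t)
    left_den = knots[i + k - 1] - knots[i]
    right_den = knots[i + k] - knots[i + 1]
    # Assumed convention: skip a term when repeated knots make its
    # denominator zero (i.e. treat 0/0 as 0).
    if left_den > 0:
        out += (t - knots[i]) / left_den * bspline_basis(t, knots, i, k - 1)
    if right_den > 0:
        out += (knots[i + k] - t) / right_den * bspline_basis(t, knots, i + 1, k - 1)
    return out

def bspline_matrix(t, K, d):
    """Evaluate K order-d B-splines with equally spaced knots on [0, 1]."""
    L = K - d + 1                              # number of segments
    breakpoints = np.linspace(0.0, 1.0, L + 1)
    # Assumed clamped knot vector: each boundary knot appears d times.
    knots = np.concatenate([np.zeros(d - 1), breakpoints, np.ones(d - 1)])
    return np.column_stack([bspline_basis(t, knots, i, d) for i in range(K)])

# Cubic B-splines (order d = 4) with K = 15 basis functions.
K, d = 15, 4
t = np.linspace(0.0, 1.0, 50, endpoint=False)  # points in [0, 1)
Phi = bspline_matrix(t, K, d)                  # shape (50, 15)
row_sums = Phi.sum(axis=1)                     # basis functions sum to 1
max_active = int((Phi > 0).sum(axis=1).max())  # nonzero functions per point
```

Because the order-1 functions use half-open intervals, this sketch evaluates to zero at t = 1 exactly; the example therefore samples points in [0, 1). At any such point at most d basis functions are nonzero, which is what keeps B-spline computations local.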
For each k, the resulting piecewise polynomial approximation in terms of the $B_{i,k}$ must have continuous derivatives up to order $k - 2$ at all the knots. For a genome region with the number of genetic variants $m \ge 15$, we selected K = 15 and d = 4, as did Fan et al. [2013]. In this case, the corresponding number of knots is calculated as $L - 1 = K - d = 11$, the corresponding number of segments is 12, and the corresponding number of control points is 13.

Specifics: We used cubic B-splines as the highest-order computationally feasible option. In this case, the running time is only slightly higher than for the Fourier basis. However, when the order of the B-spline polynomials is high, the recursive construction technique can slow down the calculation. In addition, in the neighborhood of a knot that is distant from its neighboring knots, such splines can oscillate and deviate noticeably from the function being approximated, which can reduce the power of the methods. To use the B-spline basis we must determine not only the number of basis functions and the order of the polynomial segments but also the location of the knots. For computational convenience, we used equally spaced knots to define the B-spline basis. The power of the method can be increased if the knot locations are estimated from the analyzed data with a least squares fitting criterion (Vsevolozhskaya et al., 2014). However, this criterion is highly nonlinear in the knot locations, and the computational challenges are severe. Nevertheless, in certain cases where strong curvature is localized in regions not known in advance, this is the more natural approach. The details can be found in [Ramsay and Silverman, 2005; Ramsay et al., 2009].

Power and type I error rates with different numbers of basis functions

We compared the statistical properties of our method using different numbers of basis functions (K) in the range from 5 to 35. Two models were selected for this testing: B-B, which uses the B-spline basis for both BSF and GVF, and F-F, which uses the Fourier basis for both BSF and GVF. The empirical type I error rates were very close to the declared values for all numbers of basis functions, both models and all tested scenarios (Tables S1-S2). The dependence of power on the number of basis functions varied between scenarios (Fig S1).
For scenarios with low genetic effect, where power did not exceed 0.25, we saw no difference between cases with different numbers of basis functions. For scenarios with middle and large genetic effect, the worst result was obtained with 5 basis functions, while the other numbers of basis functions demonstrated about the same power. These results are in good agreement with the finding of Fan et al. [2013] that the statistical properties of the method do not strongly depend on the number of basis functions in the range $10 \le K \le 25$. Therefore, we selected 15 and 25 basis functions for the B-spline and Fourier bases, respectively, as recommended by Fan et al. [2013].

With the number of basis functions in the range from 15 to 35, the powers for models using the Fourier basis were consistently higher than for the corresponding models using the B-spline basis (P values $\le$ 0.006 in the paired t-tests).

Table S1. Simulation results of type I error rates for the famFLM test using cubic B-spline basis with different numbers of basis functions (K).

Declared level    K = 5       K = 15      K = 25      K = 35
0.05              0.050649    0.050315    0.050264    0.050226
0.01              0.010223    0.010165    0.010167    0.010164
0.001             0.001035    0.001057    0.001043    0.001045
0.0001            0.000109    0.000110    0.000109    0.000108

Table S2. Simulation results of type I error rates for the famFLM test using Fourier basis with different numbers of basis functions (K).

Declared level    K = 5       K = 15      K = 25      K = 35
0.05              0.050541    0.048556    0.047493    0.047083
0.01              0.010301    0.009851    0.009573    0.009530
0.001             0.001096    0.001037    0.001012    0.001006
0.0001            0.000107    0.000100    0.000102    0.000099

Figure S1. The statistical power of regional association analysis on the familial data using cubic B-spline basis (a-c) or Fourier basis (d-f) with different numbers of basis functions (K). All (rare and common) variants were used in simulations for selection of causal variants and in analysis. The proportion of causal variants having the same direction was 80%.

References

Fan R, Wang Y, Mills JL, Wilson AF, Bailey-Wilson JE, et al. (2013) Functional linear models for association analysis of quantitative traits. Genet Epidemiol 37: 726-742.
Ferraty F, Romain Y, editors (2011) The Oxford Handbook of Functional Data Analysis. New York: Oxford University Press.
Horvath L, Kokoszka P (2012) Inference for Functional Data with Applications. New York: Springer Series in Statistics. 422 p.
Ramsay JO, Hooker G, Graves S (2009) Functional Data Analysis with R and Matlab. New York: Springer-Verlag. 214 p.
Ramsay JO, Silverman BW (2005) Functional Data Analysis. New York: Springer Series in Statistics. 430 p.
Vsevolozhskaya OA, Zaykin DV, Greenwood MC, Wei C, Lu Q (2014) Functional analysis of variance for association studies. PLoS One 9(9): e105074. doi: 10.1371/journal.pone.0105074.