S1 Note. Basis functions.

Contents
Types of basis functions
The Fourier basis
B-spline basis
Power and type I error rates with different numbers of basis functions
Table S1. Simulation results of type I error rates for famflm test using cubic B-spline basis with different numbers of basis functions (K)
Table S2. Simulation results of type I error rates for famflm test using Fourier basis with different numbers of basis functions (K)
Figure S1. The statistical power of regional association analysis on the familial data using cubic B-spline basis (a-c) or Fourier basis (d-f) with different numbers of basis functions (K)
References

Types of basis functions

A basis function system is a set of K standard mathematical functions, denoted by $\{\phi_1(t), \ldots, \phi_K(t)\}$. They are linearly independent and can be combined to estimate any function, denoted as $x(t)$. In this work, we estimated two specific functions, $x(t) = \tilde{\beta}(t)$ and $x(t) = \tilde{G}(t)$, where $\tilde{\beta}(t)$ is the beta-smoothing function (BSF) and $\tilde{G}(t)$ is the genetic variant function (GVF), both from the functional linear model (2).

There are several different types of basis functions, and the type is selected taking into account the behavior of the data. We considered two popular types of basis functions: B-spline and Fourier bases. The first type is more suitable for non-periodic data with an open-ended range, and the second one is more suitable for data of a periodic or near-periodic nature with a limited range.

Knowing only the values $\{x(t_i),\ i = 1, \ldots, m\}$ of an unknown function $x(t)$ at discrete points $\{t_i,\ i = 1, \ldots, m\}$, one can approximate $x(t)$ by a weighted sum (linear combination) of the basis functions:

$$x(t) = \sum_{k=1}^{K} c_k \phi_k(t) = c^{T}\phi(t),$$
where $c$ is a $(K \times 1)$ vector of weight coefficients, and $\phi(t) = (\phi_1(t), \ldots, \phi_K(t))^{T}$. This approximation can also be carried out when the values $\{x(t_i),\ i = 1, \ldots, m\}$ are not given but can be estimated. For the functions $\tilde{\beta}(t)$ and $\tilde{G}(t)$, the approaches to finding the weight coefficients $c$ differ. For estimating the GVFs in Model (2), a discrete realization of which is the known matrix $G$ $(n \times m)$, we use a simple linear smoother that determines $c^{T}$ as $G\Phi(\Phi^{T}\Phi)^{-1}$, where $\Phi$ $(m \times K)$ is the matrix with $(j,k)$-th element equal to $\phi_k(t_j)$ [Ramsay and Silverman, 2005]. However, for estimating the BSF, whose discrete realizations in Model (2) are unknown, we find the vector $c$ as unknown model parameters in a linear regression equation.

When the values of $x(t)$ at points $\{t_i,\ i = 1, \ldots, m\}$ are given and the number of basis functions K is equal to m, it is easy to see that an exact solution of the equation system $\{x(t_i) = c^{T}\phi(t_i),\ i = 1, \ldots, m\}$ with regard to $c$ exists. In this case, it does not matter which basis function system is used. However, when m is large, it is impractical to set K equal to m. When K<m, the accuracy of the approximated representation depends on the selected type of basis function system. Ideally, basis functions should have features that match the known features of the function being estimated. It is then easier to achieve a satisfactory approximation using a comparatively small number K of basis functions. We will consider in detail the Fourier and the B-spline bases.

The Fourier basis

The Fourier basis is a set of sine and cosine functions of increasing frequency, which is provided by the Fourier series: $\phi_0(t) = 1$, $\phi_{2r-1}(t) = \sin(2\pi r t)$ and $\phi_{2r}(t) = \cos(2\pi r t)$, for $r = 1, \ldots, (K-1)/2$. Here K is taken as a positive odd integer. Each function in this Fourier basis is periodic in t with period 1. If the discrete values of $t_j$ are equally spaced on the normalized interval [0, 1], then this basis is orthogonal in the sense that the cross product matrix $\Phi^{T}\Phi$ is diagonal. For a genome region with m genetic variants, where $m \ge 25$, we followed Fan et al. [2013] and selected K = 25.
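As an illustration only (it is not the code used in this work), the Fourier basis matrix $\Phi$ and the linear smoother $c^{T} = G\Phi(\Phi^{T}\Phi)^{-1}$ described above can be sketched in Python with NumPy. The function names, the grid $t_j = j/m$ (chosen so that the discrete orthogonality is exact) and the randomly generated genotype matrix are assumptions made for this example.

```python
import numpy as np

def fourier_basis(t, K):
    """Evaluate phi_0, ..., phi_{K-1} at the points t; returns an (m, K) matrix."""
    if K < 1 or K % 2 == 0:
        raise ValueError("K must be a positive odd integer")
    t = np.asarray(t, dtype=float)
    Phi = np.ones((t.size, K))                 # column 0: phi_0(t) = 1
    for r in range(1, (K - 1) // 2 + 1):
        Phi[:, 2 * r - 1] = np.sin(2 * np.pi * r * t)   # phi_{2r-1}
        Phi[:, 2 * r] = np.cos(2 * np.pi * r * t)       # phi_{2r}
    return Phi

def smooth_coefficients(G, Phi):
    """Linear smoother G Phi (Phi^T Phi)^{-1}: one row of K coefficients per individual."""
    # Solve (Phi^T Phi) X = Phi^T G^T rather than forming an explicit inverse.
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ G.T).T

m, K, n = 50, 25, 4                  # variants, basis functions, individuals (illustrative sizes)
t = np.arange(m) / m                 # assumed grid: t_j = j/m, equally spaced on [0, 1)
Phi = fourier_basis(t, K)
gram = Phi.T @ Phi                   # cross product matrix; diagonal on this grid

rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(n, m)).astype(float)   # hypothetical genotypes coded 0/1/2
C = smooth_coefficients(G, Phi)      # (n, K) matrix of weight coefficients
G_smooth = C @ Phi.T                 # smoothed GVFs evaluated at the m positions
```

On this grid the off-diagonal elements of `gram` vanish up to rounding error, so the smoother reduces to a column-wise rescaling of `G @ Phi`.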
Specifics: Fourier basis functions have excellent computational properties, especially if the discrete points of observation are equally spaced, due to the easy derivative estimation, and due to the simple non-recursive construction technique. A Fourier series is especially useful for extremely stable functions, such as functions without strong local features where the curvature tends to be of the same order everywhere. However, they are inappropriate for data where discontinuities in the function itself or in low-order derivatives are known or suspected. They are best suited for describing data which are periodic or near-periodic. However, their periodicity is a problem for non-periodic data. See details in [e.g., Ramsay and Silverman, 2005; Ramsay et al., 2009; Ferraty and Romain, 2011; Horvath and Kokoszka, 2012].

B-spline basis

A B-spline basis is the most popular approximation system for non-periodic data. Here, a B-spline basis is a system of K polynomials of specified order d each (here, the order of a polynomial is the number of constants required to define it [Ramsay and Silverman, 2005; Ramsay et al., 2009]). An approximating function $x(t)$ is defined piecewise by basis polynomials with the given order of smoothness at the join points. To use a B-spline basis, the interval normalized as [0, 1] is subdivided into L arbitrary segments ($L = K - d + 1$). Consecutive segments are separated by a join point called a knot. The number of such interior points is equal to $L - 1$. For each of the consecutive segments, the approximating function $x(t)$ is defined as a corresponding basis polynomial. To make the resulting piecewise polynomial smooth, the values of the polynomials and all their derivatives up to order $d - 2$ must match at the join point for any pair of consecutive segments.

The i-th B-spline basis function of the k-th order ($k \le d$), defined on the set of all reals, and denoted by $B_{i,k}(t)$, $i = 1, \ldots, L + k - 1$, can be defined recursively as follows:

$$B_{i,1}(t) = \begin{cases} 1, & \text{if } t_i \le t < t_{i+1}, \\ 0, & \text{otherwise,} \end{cases}$$

and

$$B_{i,k}(t) = \frac{t - t_i}{t_{i+k-1} - t_i}\, B_{i,k-1}(t) + \frac{t_{i+k} - t}{t_{i+k} - t_{i+1}}\, B_{i+1,k-1}(t), \quad k = 2, \ldots, d.$$

Here $B_{i,k}(t)$ is a polynomial of order k that will be used on the i-th interval $t_i \le t < t_{i+1}$, $i = 1, \ldots, L$. Value k must be at least 2 and at most L+1.
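The recursion above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: indices are 0-based, the boundary knots 0 and 1 are repeated so that exactly K functions of order d are obtained (a clamped knot vector), terms with a zero denominator are taken to be zero, and the function names are hypothetical.

```python
import numpy as np

def bspline_basis(t, knots, i, k):
    """Value at t of the i-th (0-based) B-spline of order k on the given knot vector."""
    t = np.asarray(t, dtype=float)
    if k == 1:
        # Order 1: indicator of the half-open interval [knots[i], knots[i+1])
        return np.where((knots[i] <= t) & (t < knots[i + 1]), 1.0, 0.0)
    out = np.zeros_like(t)
    left = knots[i + k - 1] - knots[i]
    right = knots[i + k] - knots[i + 1]
    if left > 0:     # assumed convention: a term with zero denominator contributes 0
        out += (t - knots[i]) / left * bspline_basis(t, knots, i, k - 1)
    if right > 0:
        out += (knots[i + k] - t) / right * bspline_basis(t, knots, i + 1, k - 1)
    return out

def bspline_design_matrix(t, K, d):
    """(m, K) matrix of K order-d B-splines with equally spaced knots on [0, 1]."""
    L = K - d + 1                                   # number of segments
    breaks = np.linspace(0.0, 1.0, L + 1)           # L - 1 interior knots plus both ends
    knots = np.concatenate([np.zeros(d - 1), breaks, np.ones(d - 1)])  # K + d knots
    return np.column_stack([bspline_basis(t, knots, i, d) for i in range(K)])

m, K, d = 60, 15, 4                  # cubic B-splines (order 4), 15 basis functions
t = np.arange(m) / m                 # evaluation points in [0, 1); t = 1 is excluded
Phi_b = bspline_design_matrix(t, K, d)
row_sums = Phi_b.sum(axis=1)         # the basis functions sum to 1 at every point
```

Because the order-1 functions are defined on half-open intervals, the right end point t = 1 is left out of the evaluation grid here. At each point at most d basis functions are nonzero, which is what makes the matrix sparse and the fit local.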
For each k, the resulting piecewise polynomial approximation in terms of the $B_{i,k}$'s must have continuous derivatives up to order $k - 2$ at all the knots. For a genome region with the number of genetic variants $m \ge 15$, we selected K = 15 and d = 4, following Fan et al. [2013]. In this case, the corresponding number of knots is calculated as $L - 1 = K - d = 11$, the corresponding number of segments is 12, and the corresponding number of control points is 13.

Specifics: We used cubic B-splines as the highest computationally feasible option. In this case, the running time is only slightly higher than in the case of Fourier basis functions. However, when the order of the B-spline polynomials is high, the recursive construction technique can slow down the calculation. In addition, in a neighborhood of a knot that is distant from its neighboring knots, such splines could oscillate and deviate noticeably from the given approximating function. This can reduce the power of the methods. To use the B-spline basis we must determine not only the number of basis functions and the order of the polynomial segments but also the location of the knots. For computational convenience, we used equally spaced knots to determine the B-spline basis. The power of the method can be increased if the least squares fitting criterion is used to estimate the location of the knots on the basis of the analyzed data (Vsevolozhskaya et al., 2014). However, this criterion is highly nonlinear in the knot locations, and the computational challenges are severe. Nevertheless, in certain cases where strong curvature is localized in regions not known in advance, this is the more natural approach. The details can be found in [Ramsay and Silverman, 2005; Ramsay et al., 2009].

Power and type I error rates with different numbers of basis functions

We compared the statistical properties of our method using different numbers of basis functions (K) in the range from 5 to 35. Two models were selected for this testing: B-B, the model using the B-spline basis for both BSF and GVF, and F-F, the model using the Fourier basis for both BSF and GVF. The empirical type I error rates were very close to the declared values for all numbers of basis functions, both models and all tested scenarios (Tables S1-S2). The dependence of power on the number of basis functions varied for different scenarios (Fig S1).
For scenarios with a low genetic effect, where power was $\le 0.25$, we did not see a difference between cases with different numbers of basis functions. For scenarios with middle and large genetic effects, the worst result was obtained in the case of 5 basis functions, while the other numbers of basis functions demonstrated about the same power. These results are in good agreement with the findings of Fan et al. [2013] that the statistical properties of the method do not strongly depend on the number of basis functions in the range $10 \le K \le 25$. Therefore, we selected 15 and 25 basis functions for the B-spline and Fourier bases, respectively, as recommended by Fan et al. [2013].
With the number of basis functions in the range from 15 to 35, the powers for models using the Fourier basis were consistently higher than for the corresponding models using the B-spline basis (P values $\le 0.006$ in the paired t-tests).

Table S1. Simulation results of type I error rates for famflm test using cubic B-spline basis with different numbers of basis functions (K).

                  Numbers of basis functions (K)
Declared level    5           15          25          35
0.05              0.050649    0.050315    0.050264    0.050226
0.01              0.010223    0.010165    0.010167    0.010164
0.001             0.001035    0.001057    0.001043    0.001045
0.0001            0.000109    0.000110    0.000109    0.000108

Table S2. Simulation results of type I error rates for famflm test using Fourier basis with different numbers of basis functions (K).

                  Numbers of basis functions (K)
Declared level    5           15          25          35
0.05              0.050541    0.048556    0.047493    0.047083
0.01              0.010301    0.009851    0.009573    0.009530
0.001             0.001096    0.001037    0.001012    0.001006
0.0001            0.000107    0.000100    0.000102    0.000099
Figure S1. The statistical power of regional association analysis on the familial data using cubic B-spline basis (a-c) or Fourier basis (d-f) with different numbers of basis functions (K). All (rare and common) variants were used in simulations for selection of causal variants and in analysis. The proportion of causal variants having the same direction was 80%.
References

Fan R, Wang Y, Mills JL, Wilson AF, Bailey-Wilson JE, et al. (2013) Functional linear models for association analysis of quantitative traits. Genet Epidemiol 37: 726-742.

Ferraty F, Romain Y (Eds) (2011) The Oxford Handbook of Functional Data Analysis. New York: Oxford University Press.

Horvath L, Kokoszka P (2012) Inference for Functional Data with Applications. New York: Springer Series in Statistics. 422 p.

Ramsay JO, Hooker G, Graves S (2009) Functional Data Analysis with R and Matlab. New York: Springer-Verlag. 214 p.

Ramsay JO, Silverman BW (2005) Functional Data Analysis. New York: Springer Series in Statistics. 430 p.

Vsevolozhskaya OA, Zaykin DV, Greenwood MC, Wei C, Lu Q (2014) Functional analysis of variance for association studies. PLoS One 9(9): e105074. doi: 10.1371/journal.pone.0105074.