A Bayesian approach to detect time-specific group differences between nonlinear temporal curves


University of Iowa
Iowa Research Online
Theses and Dissertations
Spring 2016

A Bayesian approach to detect time-specific group differences between nonlinear temporal curves

Melissa Anna Maria Pugh, University of Iowa
Copyright 2016 Melissa Anna Maria Pugh

Recommended Citation: Pugh, Melissa Anna Maria. "A Bayesian approach to detect time-specific group differences between nonlinear temporal curves." PhD (Doctor of Philosophy) thesis, University of Iowa, 2016.

A BAYESIAN APPROACH TO DETECT TIME-SPECIFIC GROUP DIFFERENCES BETWEEN NONLINEAR TEMPORAL CURVES

by Melissa Anna Maria Pugh

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Biostatistics in the Graduate College of The University of Iowa

May 2016

Thesis Supervisor: Associate Professor Jacob J. Oleson

Copyright by MELISSA ANNA MARIA PUGH 2016. All Rights Reserved.

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

This is to certify that the Ph.D. thesis of Melissa Anna Maria Pugh has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Biostatistics at the May 2016 graduation.

Thesis Committee: Jacob J. Oleson, Thesis Supervisor; Joseph Cavanaugh; Bob McMurray; Gideon Zamba; Eric Foster

To Mom, Dad, and Michelle, with love

ACKNOWLEDGEMENTS

There are so many people I would like to thank for helping me get to this position that I am fortunate enough to find myself in today. It is great to know that there are so many people who are truly on my side. I would like to express my gratitude to all of the faculty and staff in the Biostatistics Department of the University of Iowa for making me feel welcome and part of something genuinely special. Terry, thank you for your expertise and thoughtful consideration throughout my time here at Iowa. I would like to thank Jake, Joe, Eric, Gideon, and Bob for your willingness to serve on my committee. I especially want to thank Jake for always being patient, supportive, and understanding. Thank you for always finding time to deal with me and all of my questions. I would not have been able to do any of this without you! I would also like to thank Joe for our many talks and for helping me keep my goals in perspective. I am also grateful for all of the lifelong friendships I have made here in Iowa.

Last, but certainly not least, I want to thank my family. I would not be half the person I am today without their unconditional love and support. They understood that I needed to be here and supported that despite the many obstacles that we have faced in the past several years. Tia Silvia, Uncle Jeff, Andres, Nena, Angela, Brian, and Thomas: I appreciate you always being by my side and keeping me sane. Abuelita, this is for you. Mom, Dad, and Michelle, thank you for always having my best interests at heart and pushing me to pursue my dreams, even if those dreams meant me moving hundreds of miles away from home. Mom and Dad, thank you for believing that I could do this, even when I wasn't too sure that I could. I am truly grateful for everyone in my past and present whose contributions have helped me fulfill my academic goals. Go Hawks!

ABSTRACT

Researchers from many fields have found it difficult to fit statistical models that can incorporate multiple random effects, account for the correlated nature of the data, and simultaneously fit and compare multiple groups. In this dissertation, we take a Bayesian hierarchical modeling approach to this multivariate nonlinear longitudinal data problem. Within this framework, we develop both parametric and nonparametric approaches to simultaneously modeling multiple longitudinal curves. Finally, we put forth comparison techniques that allow for a between-group comparison of group curves under the Bayesian framework. The work is motivated by the visual world paradigm, a tool widely used in the field of psycholinguistics to investigate how people listen to and understand words and sentences. Proportions of fixations to several different objects are recorded for a number of subjects over a specific time period. This dissertation contains the model development, demonstrations of the model's performance via simulation, and answers to scientific research questions from visual world paradigm data.

PUBLIC ABSTRACT

Researchers from many fields have found it difficult to fit statistical models that can incorporate multiple random effects, account for the correlated nature of the data, and simultaneously fit and compare multiple groups. In this dissertation, we take a Bayesian hierarchical modeling approach to this multivariate nonlinear longitudinal data problem. Within this framework, we develop both parametric and nonparametric approaches to simultaneously modeling multiple longitudinal curves. Finally, we put forth comparison techniques that allow for a between-group comparison of group curves under the Bayesian framework. The work is motivated by the visual world paradigm, a tool widely used in the field of psycholinguistics to investigate how people listen to and understand words and sentences. Proportions of fixations to several different objects are recorded for a number of subjects over a specific time period. This dissertation contains the model development, demonstrations of the model's performance via simulation, and answers to scientific research questions from visual world paradigm data.

TABLE OF CONTENTS

Chapter 1 INTRODUCTION
  Bayesian Longitudinal Data
  Dissertation Overview
Chapter 2 BACKGROUND
  Motivating Example and Dataset
  Nonlinear Techniques
  Spline Modeling
    P-Splines and Mixed Models
  Bayesian Hierarchical Techniques
  Comparison Techniques
Chapter 3 NONLINEAR PARAMETRIC HIERARCHICAL MODELS
  Data Model
  Process Model
    Four Parameter Logistic Model
    Three Parameter Logistic
    Double Gaussian
  Model Implementation
  Simulations
  Data Analysis
    Four Parameter Logistic
    Double Gaussian
Chapter 4 NONLINEAR NONPARAMETRIC HIERARCHICAL MODELS
  Process Model
  Parameter Model
  Model Implementation
  Data Analysis
    Target Fixation
    Cohort Fixation
    Rhyme Fixation
    Unrelated Fixation
  Selecting the Number of Knots
Chapter 5 COMPARISON TECHNIQUES
  Sequential BEST Test
  Sequential BEST Test Implementation
  Simulated Data
  Data Analysis
Chapter 6 CONCLUSIONS
  Future Work
  Final Thought
References
Appendix

LIST OF TABLES

Table 3.1 Identical Groups for the Four Parameter Logistic Process Model
Table 3.2 Varying Maximums for the Four Parameter Logistic Process Model
Table 3.3 Varying Slopes for the Four Parameter Logistic Process Model
Table 3.4 Varying Inflection Points for the Four Parameter Logistic Process Model
Table 3.5 A typical subject's parameter values from the 4PL model (Target Fixation)
Table 3.6 A typical subject's parameter values from the DG model (Cohort Fixation)
Table 3.7 A typical subject's parameter values from the DG model (Rhyme Fixation)
Table 3.8 A typical subject's parameter values from the DG model (Unrelated Fixation)
Table 4.1 Root MSE values for the different models within fixation type
Table 4.2 DIC values for the different models within fixation type
Table 5.1 Sequential BEST test (part 1) simulation results (n = 25 data sets) for 2 groups with 10 subjects with 100 time points each. A ROPE of was used to compare both groups
Table 5.2 Confidence intervals for the first significant time points found in Table

LIST OF FIGURES

Figure 2.1 Sample trial from the VWP. The target word is lizard, cohort word liver, rhyme word wizard, and unrelated word necklace
Figure 2.2 A typical participant's fixation curves
Figure 3.1 Four Parameter Logistic Subject Specific Fitted Curves. Top panels: Cochlear Implant fitted curves; bottom panels: fitted Normal Hearing curves
Figure 3.2 Four Parameter Logistic Subject vs Individual Curves for all subjects
Figure 3.3 Cohort Subject Specific Fitted Curves. Top panels: Cochlear Implant fitted curves; bottom panels: fitted Normal Hearing curves
Figure 3.4 Double Gaussian - Cohort Fixation: Subject vs Individual Curves for all participants
Figure 3.5 Rhyme Subject Specific Fitted Curves. Top panels: Cochlear Implant fitted curves; bottom panels: fitted Normal Hearing curves
Figure 3.6 Double Gaussian Function: Rhyme Fixation; Subject vs Individual Curves for all participants
Figure 3.7 Unrelated Subject Specific Fitted Curves. Top panels: Cochlear Implant fitted curves; bottom panels: fitted Normal Hearing curves
Figure 3.8 Double Gaussian Function: Unrelated Fixation; Group vs Individual Curves for all participants
Figure 4.1 Bayesian Longitudinal Nonparametric ANOVA for the Target fixation with 3 knots, plots from 4 typical subjects
Figure 4.2 Cochlear Implant and Normal Hearing group subject specific curves for the target fixation with 3 knots
Figure 4.3 Bayesian Longitudinal Nonparametric ANOVA for the Target fixation with 5 knots, plots from 4 typical subjects
Figure 4.4 Cochlear Implant and Normal Hearing group subject specific curves for the target fixation with 5 knots
Figure 4.5 Bayesian Longitudinal Nonparametric ANOVA for the Target fixation with 7 knots, plots from 4 typical subjects

Figure 4.6 Cochlear Implant and Normal Hearing group subject specific curves for the target fixation with 7 knots
Figure 4.7 Bayesian Longitudinal Nonparametric ANOVA for the Cohort fixation with 3 knots, plots from 4 typical subjects
Figure 4.8 Cochlear Implant and Normal Hearing group subject specific curves for the cohort fixation with 3 knots
Figure 4.9 Bayesian Longitudinal Nonparametric ANOVA for the Cohort fixation with 5 knots, plots from 4 typical subjects
Figure 4.10 Cochlear Implant and Normal Hearing group subject specific curves for the cohort fixation with 5 knots
Figure 4.11 Bayesian Longitudinal Nonparametric ANOVA for the Cohort fixation with 7 knots, plots from 4 typical subjects
Figure 4.12 Cochlear Implant and Normal Hearing group subject specific curves for the cohort fixation with 7 knots
Figure 4.13 Bayesian Longitudinal Nonparametric ANOVA for the Rhyme fixation with 3 knots, plots from 4 typical subjects
Figure 4.14 Cochlear Implant and Normal Hearing group subject specific curves for the rhyme fixation with 3 knots
Figure 4.15 Bayesian Longitudinal Nonparametric ANOVA for the Rhyme fixation with 5 knots, plots from 4 typical subjects
Figure 4.16 Cochlear Implant and Normal Hearing group subject specific curves for the rhyme fixation with 5 knots
Figure 4.17 Bayesian Longitudinal Nonparametric ANOVA for the Rhyme fixation with 7 knots, plots from 4 typical subjects
Figure 4.18 Cochlear Implant and Normal Hearing group subject specific curves for the rhyme fixation with 7 knots
Figure 4.19 Bayesian Longitudinal Nonparametric ANOVA for the Unrelated fixation with 3 knots, plots from 4 typical subjects
Figure 4.20 Cochlear Implant and Normal Hearing group subject specific curves for the Unrelated fixation with 3 knots

Figure 4.21 Bayesian Longitudinal Nonparametric ANOVA for the Unrelated fixation with 5 knots, plots from 4 typical subjects
Figure 4.22 Cochlear Implant and Normal Hearing group subject specific curves for the Unrelated fixation with 5 knots
Figure 4.23 Bayesian Longitudinal Nonparametric ANOVA for the Unrelated fixation with 7 knots, plots from 4 typical subjects
Figure 4.24 Cochlear Implant and Normal Hearing group subject specific curves for the Unrelated fixation with 7 knots
Figure 5.1 Target Fixation population curves via the Four Parameter Logistic. Vertical line at first significant difference: 512 ms
Figure 5.2 Mean Population Group Curves for the Target fixation via Four Parameter Logistic. Significant differences shaded from 512 ms to the end of the time course
Figure 5.3 Histogram of the first time points where the difference is greater than 1.5% for the Target group
Figure 5.4 Histogram of the first time points where the difference is greater than 3% for the Target group
Figure 5.5 Cohort Population Curves via Double Gaussian process model. Vertical lines at 508 ms, 568 ms, and 712 ms indicate time points where significant differences are found
Figure 5.6 Cohort Mean Population Curves via Double Gaussian model. Vertical lines at 508 ms, 568 ms, and 712 ms indicate time points where significant differences are found
Figure 5.7 Histogram of the first time points where the difference is greater than 1.5% for the Cohort group
Figure 5.8 Rhyme fixation population curves via Double Gaussian model, vertical line at 620 ms indicating the first significant difference
Figure 5.9 Rhyme Mean Population Curves via Double Gaussian model. Vertical line at 620 ms indicates the first significant difference
Figure 5.10 Histogram of the first time points where the difference is greater than 1.5% for the Rhyme group
Figure 5.11 Unrelated Fixation Population Curves via Double Gaussian function. Vertical line at first significant difference: 584 ms

Figure 5.12 Unrelated Mean Population Curves via Double Gaussian. Vertical line at first significant difference: 584 ms
Figure 5.13 Histogram of the first time points where the difference is greater than 1.5% for the Unrelated group
Figure 5.14 Population curves for the Cohort and Rhyme fixation types, where the vertical lines indicate the interval of significant differences from 480 ms to 880 ms
Figure 5.15 Histogram of the first time points where the difference is greater than 1.5% between the Cohort and Rhyme fixation types
Figure 5.16 Population curves for the cohort fixation and unrelated fixation types, where the vertical lines indicate the interval of significant differences from 488 ms to 880 ms
Figure 5.17 Histogram of the first time points where the difference is greater than 1.5% between the Cohort and Unrelated fixation types
Figure 5.18 Population curves for the rhyme and unrelated fixation types

Chapter 1 INTRODUCTION

1.1 Bayesian Longitudinal Data

Longitudinal data are important for understanding individual patterns of change over time. One primary goal of studying longitudinal data is to characterize the change of the response over time and the factors that influence change. In particular, having longitudinal data implies that repeated measurements are obtained from a single individual or study unit across time (Fitzmaurice, Laird et al. 2012). Within the past 30 years we have seen a substantial amount of literature and methodologies being developed to analyze longitudinal data. The majority of the methodology used in practice to analyze longitudinal data has been frequentist in nature. With the advancements of computational methods, there has been an increase in the development of Bayesian methodologies. With these recent developments, there has been an interest in applying Bayesian methods to diverse applications such as health care, astrophysics, and criminal justice, just to name a few (Cowles 2013).

In language science research, tracking eye movement has been used to investigate how different individuals understand and process language. By tracking eye movement, researchers are able to monitor the ongoing comprehension process and observe the rapid mental processes that accompany spoken language comprehension (Tanenhaus, Spivey-Knowlton et al. 1995). Even though existing research has done much to address the modeling of eye tracking data, there is still progress to be made in statistical

methodology for eye tracking data. In this dissertation, we aim to apply Bayesian methods, develop a modeling framework for multivariate nonlinear longitudinal data, and apply this novel modeling framework to eye tracking data. This modeling framework will be able to simultaneously model both individual and group specific effects, eliminating the need to run separate analyses for group and subject level inference.

1.2 Dissertation Overview

We first lay out the background information necessary to create a foundation for the rest of the dissertation. In Chapter 2, we start with a description of the visual world paradigm dataset that motivated the work within this thesis. Next, we explore previous techniques that were used to analyze this dataset. We then discuss Bayesian hierarchical modeling techniques, as well as giving a brief introduction to splines. Kliethermes (2013) used functional data analysis but did not compare groups at each time point and was not able to estimate onset detection. Finally, we give a brief introduction to comparison techniques for this type of data.

In Chapter 3, we introduce the Bayesian hierarchical parametric modeling framework we use to simultaneously estimate group and individual parameters. We develop two different models within this framework. First, the different components of the two models are described in detail; they include multiple subject level random effects and a longitudinal correlation structure. Once both modeling structures have been described, we apply these models to the visual world paradigm dataset.
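The structure Chapter 3 develops can be previewed with a small simulation: a group-level nonlinear curve plus subject-specific random effects on its parameters. The four-parameter logistic below is an illustrative stand-in only; its parameterization and every numeric value are assumptions for this sketch, not Chapter 3's actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic4(t, lower, upper, slope, crossover):
    # Generic four-parameter logistic: lower/upper asymptotes, slope,
    # and crossover (inflection) time. Illustrative form only.
    return lower + (upper - lower) / (1 + np.exp(-slope * (t - crossover)))

t = np.arange(0, 2000, 4.0)                  # 4 ms grid, as in the VWP data
group_mean = np.array([0.05, 0.85, 0.01, 700.0])

# Subject-level random effects: each subject's parameters deviate from the
# group mean (hypothetical standard deviations, chosen for illustration).
sd = np.array([0.01, 0.05, 0.002, 60.0])
subj_params = group_mean + sd * rng.standard_normal((10, 4))

curves = np.array([logistic4(t, *p) for p in subj_params])   # one row per subject
group_curve = logistic4(t, *group_mean)
```

Fitting the hierarchy in the reverse direction, with priors on the group-level parameters, is what the Bayesian framework of Chapter 3 does; this sketch only shows the generative structure being assumed.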

In Chapter 4, we explore an adaptation of the longitudinal nonparametric ANOVA model presented by Crainiceanu (Crainiceanu, Ruppert et al. 2005). This modeling framework provides a nonparametric alternative to the Bayesian hierarchical parametric modeling framework described in Chapter 3. Again, there are multiple subject level random effects and a longitudinal correlation structure. We apply this modeling framework to the visual world paradigm dataset presented in Chapter 2.

In Chapter 5, we focus on making comparisons between multiple groups so that we can determine when and where differences occur. We implement a previously developed BEST test (Kruschke 2013), but do so sequentially across hundreds of correlated time points. We also estimate the distribution of onset difference detection between two groups over time. Specifically, we use the modeling framework developed in Chapter 3 to develop a Bayesian comparison technique.

Finally, in Chapter 6 we discuss the methods presented in this dissertation and summarize the procedures from each chapter. Here we also discuss the limitations of this work and give directions for future work.
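The sequential comparison idea just described can be sketched in miniature: given posterior draws of the group-difference curve at each time point, scan for the first time at which the posterior lies outside a region of practical equivalence (ROPE). The draws, the ROPE half-width, and the decision rule below are all illustrative assumptions, not the dissertation's actual values.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_times = 2000, 500
time_ms = np.arange(n_times) * 4

# Simulated posterior draws of the group-difference curve: no difference
# early, a difference emerging at 600 ms (purely illustrative).
true_diff = np.where(time_ms < 600, 0.0, 0.04)
draws = true_diff + 0.01 * rng.standard_normal((n_draws, n_times))

# Sequential ROPE decision: flag a time point when the 95% credible
# interval of the difference lies entirely outside (-0.015, 0.015).
lo = np.percentile(draws, 2.5, axis=0)
hi = np.percentile(draws, 97.5, axis=0)
rope = 0.015
outside = (lo > rope) | (hi < -rope)
first_sig = time_ms[np.argmax(outside)] if outside.any() else None
```

Repeating this over posterior samples of entire curves, rather than pointwise summaries, is what yields the distribution of the onset of a detectable difference.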

Chapter 2 BACKGROUND

2.1 Motivating Example and Dataset

The motivating example for this dissertation arises from the area of psycholinguistics. Language unfolds over time, and a fundamental analytical challenge in psycholinguistics is how to adequately incorporate time into the analysis. The visual world paradigm (VWP) is a technique used in psycholinguistics to help researchers investigate how people comprehend words and sentences (Allopenna, Magnuson et al. 1998). The visual world paradigm was developed with the notion that eye movements provide insight into the mental processes that accompany language comprehension (Tanenhaus, Spivey-Knowlton et al. 1995). In the VWP, a subject is given a task, via spoken instruction, to select one of several visual objects, as fast as they can, from a computer screen or in real life (Tanenhaus, Spivey-Knowlton et al. 1995, Allopenna, Magnuson et al. 1998, McMurray, Samelson et al. 2010). For example, suppose the images displayed in Figure 2.1 are shown on the computer screen (without the names of each category below the objects). The goal of a single trial is for the subject to select the target word given to them via instruction; in the example in Figure 2.1, this would be lizard. In this version of the visual world paradigm, there are typically four different visual objects to choose from on each trial, and each object can be placed in one of the following four categories: Target, Cohort, Rhyme, and Unrelated. The cohort word shares similarities with the beginning of the target word, the

rhyme word rhymes with the target word, and the unrelated word has no spoken relation to the target word. When the trial begins, the participant is to be focused on the red dot in the middle of the screen until a spoken instruction is given. For example, in the sample trial displayed in Figure 2.1, the participant will hear the target word lizard as the image of a lizard appears on the screen along with the competitor words liver (cohort word), wizard (rhyme word), and necklace (unrelated word).

Figure 2.1 Sample trial from the VWP. The target word is lizard, cohort word liver, rhyme word wizard, and unrelated word necklace.

When the target word is spoken to the participant, the participant is expected to visually scan the display of pictures before ultimately fixating on the target word, thereby

making their selection. The idea is that the participant's fixations reflect what they plan to do, in this case which image they plan on selecting. To better understand language dynamics, researchers would like to know how long the participant spends on each of the competitor words before making a final selection. Each participant wears an eye-tracking device that records individual eye movements. That is, the device is able to track which of the images the participant is looking at, and it records this information every four milliseconds (ms) for up to 2000 ms (2 seconds). These trials are repeated multiple times. Each subsequent trial begins as soon as the previous word selection has been made, so each trial could have a different response time. Only trials in which the target word was selected were considered in this dissertation, so that each participant had similar outcomes. This allowed researchers to either extend or truncate each time series to a maximum of 2000 ms per subject. Therefore, each individual's time series consists of 501 time points, yielding measurements at every 4 ms for a period of 2000 ms. Across the repeated trials, the proportion of fixations to the target, cohort, rhyme, and unrelated words is recorded at each time point per subject. These fixation probabilities represent how strongly each of the four word types (Target, Cohort, Rhyme, and Unrelated) is being considered, across several different trials, at any given time point. Figure 2.2 is a representation of a typical subject's fixation curves for each of the four categories (Target, Cohort, Rhyme, and Unrelated). It is important to note that the sum of these fixation probabilities across all categories will be less than or equal to 1.
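The data structure just described can be made concrete with a small sketch. The array layout, trial count, and coding scheme below are hypothetical illustrations of how such fixation proportions could be computed from raw eye-tracking records.

```python
import numpy as np

# Hypothetical layout: for one subject, looks[trial, t] codes which object
# (0=Target, 1=Cohort, 2=Rhyme, 3=Unrelated, -1=elsewhere) is fixated at
# sample t. 501 samples = one every 4 ms over 2000 ms.
rng = np.random.default_rng(2)
n_trials, n_times = 40, 501
looks = rng.integers(-1, 4, size=(n_trials, n_times))

# Proportion of trials fixating each object at each time point:
# one fixation curve per word type, as plotted in Figure 2.2.
prop = np.stack([(looks == k).mean(axis=0) for k in range(4)])  # (4, 501)

time_ms = np.arange(n_times) * 4     # 0, 4, ..., 2000 ms
# Because "elsewhere" looks are possible, the four curves sum to <= 1.
assert np.all(prop.sum(axis=0) <= 1.0)
```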

The motivating dataset was obtained from Dr. Bob McMurray from the Department of Psychological and Brain Sciences at the University of Iowa, as part of an ongoing longitudinal research project (Farris-Trimble and McMurray 2013). The eye-tracking dataset consists of a total of 55 participants from two independent groups: 29 cochlear implant users and 26 normal hearing adults.

Figure 2.2 A typical participant's fixation curves.

Learning effects can have a considerable impact on an individual's fixation patterns throughout the time course (Salverda, Brown et al. 2011). Thus, some researchers are

interested in understanding differences in spoken word recognition between adults who are cochlear implant users and normal hearing adults for the various word types. For example, questions of interest may include whether cochlear implant users suppress competitor words as quickly as normal hearing participants. In order to understand these spoken word recognition differences, it is helpful to evaluate both subject specific trajectories and group level trajectories for each fixation type, and to make appropriate corresponding comparisons by simultaneously fitting the multiple fixation curves.

A major difficulty lies in creating a modeling framework that can capture the different shapes seen in Figure 2.2, the correlated nature of the time series per individual, and the correlation between fixation curves. In most psycholinguistics paradigms, there are multiple random factors that should be considered when developing a modeling framework, such as the subjects, language, and talker. In addition to randomly sampling subjects from a population, we also need to take into account that we are randomly sampling words from the language population (Clark 1973, Raaijmakers, Schrijnemakers et al. 1999). Thus, developing a modeling framework which can incorporate multiple random effects is important.

The primary existing techniques for making these comparisons include analysis of variance (ANOVA), area under the curve analysis, mixed models, and polynomial curve fitting. These techniques tend to ignore multiple random effects, the inherent shape of the data, the time series nature of the data, or within/between subject variability (McMurray, Samelson et al. 2010). We next give a detailed account of these techniques.
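As a concrete preview of the simplest of the techniques just listed, a windowed-average (AUC-style) summary followed by a by-subject one-way ANOVA might look as follows. The curves, group sizes, and analysis window are illustrative assumptions, not the dissertation's data.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
time_ms = np.arange(501) * 4
# Hypothetical fixation-proportion curves: 10 subjects per group.
ci = 0.50 + 0.10 * rng.standard_normal((10, 501))   # cochlear implant group
nh = 0.60 + 0.10 * rng.standard_normal((10, 501))   # normal hearing group

# AUC-style summary: average proportion inside an a priori chosen window,
# here 300-1000 ms (the arbitrary-window choice criticized below).
win = (time_ms >= 300) & (time_ms <= 1000)
auc_ci = ci[:, win].mean(axis=1)
auc_nh = nh[:, win].mean(axis=1)

# By-subject ("F1") ANOVA on the windowed summary; an item ("F2") analysis
# would repeat this after collapsing over subjects instead of items.
F, p = f_oneway(auc_ci, auc_nh)
```

Note how the entire time series per subject collapses to a single number before any inference happens, which is exactly the loss of temporal information the following paragraphs discuss.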

Area under the curve (AUC) is a widely used technique for analyzing visual world paradigm data to determine if there is a group or intervention effect. A benefit of using the AUC approach for visual world paradigm data is that this method, though not straightforward, allows for the inclusion of multiple random effects. Specifically, a researcher would run separate subject and item (words or sentences in language) ANOVAs (analyses of variance) and compute individual F statistics; that is, researchers either collapse over items (subject analysis) or collapse over subjects (item analysis). This is known as F1 F2 analysis, where the thinking is that if both statistics are significant then the effect is reliable across both subjects and items (Clark 1973). Researchers calculate the average proportion of fixations within an arbitrarily chosen fixed time window. Typically, AUC is used to summarize fixation proportions, either across participants or items, into one time point (it becomes the dependent variable). By and large, the window is chosen based on properties of the stimulus or the researcher's prior knowledge, which is one of the pitfalls of using the AUC. A couple of other drawbacks to this approach are the lack of a flexible method for dealing with missing data (it requires each of the participants to have complete data) and that averaging must be done before analysis, nullifying the time series nature of the data (Baayen, Davidson et al. 2008). Although this method can allow the use of multiple time windows, it is not recommended, since these different windows would be dependent on each other, making it difficult to take into account the time series nature of our data and the individual characteristics of each person's function. Thus, when using AUC

analysis to investigate VWP data, researchers tend to average within participant or within item (McMurray, Samelson et al. 2010).

The drawbacks of AUC contributed to the rise of mixed effects modeling in VWP research, since mixed effects modeling allows for the inclusion of random effects as well as offering the possibility of bringing longitudinal effects straightforwardly into the modeling framework (Baayen, Davidson et al. 2008). Mixed effects models allow for simultaneous inclusion of both subject and item random effects, as well as a way to formally test (via goodness of fit tests) the relative importance of including the random effect for items. This modeling framework also gives researchers the capability to handle missing data (Baayen, Davidson et al. 2008).

One way to incorporate time in a mixed effects model is via linear and nonlinear growth curves. A growth curve approach to analyzing VWP data allows researchers to describe the functional form and quantify major aspects of the probability distribution that result from the underlying process (Mirman, Dixon et al. 2008). Two main forms of growth curves have been used in VWP research: orthogonal power polynomials and nonlinear models. Orthogonal power polynomials allow researchers to fit each individual's fixations to a polynomial function of time (Mirman, Dixon et al. 2008). Each of the parameters in the function describes the participant's fixation probabilities as a function of time, not the amount of fixations to a particular object at a specific time point. Thus, each person will have a set of parameters that will be tailored to their individual time window. This approach is useful when researchers are interested in evaluating differences amongst

participants, as well as resolving some of the issues that can arise using AUC methods, such as ignoring the time series nature of the data. An advantage of using orthogonal power polynomials is that the model of the average is the average of the participant models (Mirman, Dixon et al. 2008). Equivalently, since orthogonal power polynomials are linear with normally distributed outcomes, the marginal model is equal to the subject-specific model. In other words, the relationship between the average data and the underlying probability distribution of the individual data is straightforward. Typically, a fifth or sixth order polynomial is needed in order to adequately model each person's set of fixation probabilities within a restricted time interval. As a result of fitting such a high order polynomial, interpreting the parameters in these models is difficult. Another difficulty in using the orthogonal power polynomial approach is that the function may behave well within the fixed analysis time window, but will start to misbehave outside of this time window (in the tails of the fixation curves). Thus, we aim to develop a modeling framework that will allow us to incorporate the time series nature of the data, take into account multiple random effects, and allow us to consider both individual and group parameters with meaningful interpretations.

2.2 Nonlinear Techniques

Recently, nonlinear functions have been used to help resolve some of the issues presented with AUC and polynomials. Instead of using higher order polynomials to model fixation probabilities, McMurray and colleagues used nonlinear functions to model each participant's data from the different fixations (Target, Cohort, Rhyme,

Unrelated) (McMurray, Samelson et al. 2010). Next, parameter estimates resulting from each individual-specific model fit were included as the dependent variable in linear regressions to evaluate the effect of the participant's language and cognitive ability. In modeling the fixations in this manner, the researchers were able to choose nonlinear functions whose parameters were easily interpretable and which modeled the data adequately. Hence, an advantage of using nonlinear functions is that, if chosen correctly, they can describe fixation curves well throughout the entire time course, not only within a fixed analysis window. On the other hand, modeling the fixations in this manner leads to a dependence amongst the parameters, where one parameter cannot be estimated without the other, thus making the parameters difficult to estimate (McMurray, Samelson et al. 2010). These estimating procedures also assume a constant variance rather than taking advantage of the inherent autocorrelated nature of the data. Finally, researchers needed to separately model the fixations and then use fits from these models in a separate regression analysis to answer the questions at hand. In order to make inferences about both individuals and groups, two separate sets of models need to be fit. Unlike orthogonal power polynomials, there is no straightforward relationship between the average data and the underlying probability distribution of the individual data (Mirman, Dixon et al. 2008). Ideally, it would be beneficial for researchers to use one estimation procedure to evaluate both individual and group differences. Thus, an ideal modeling framework, and

the goal of this thesis work, is to incorporate the advantages of both orthogonal power polynomials and nonlinear curves.

2.3 Spline Modeling

Nonparametric and semiparametric modeling approaches have become a viable alternative to parametric techniques, whether linear or nonlinear. One of the most attractive features of these methods is the flexibility they allow. In these approaches, the data are used to dictate the shape of the functional relationships between the covariates and dependent variables, rather than the model dictating the shape of the data as in parametric models (Crainiceanu, Ruppert et al. 2005). This implies that a universal model can be used to fit similar types of data, regardless of their particular shape. Some of the most common nonparametric techniques include regression splines, penalized splines, kernel estimation, and smoothing splines. In particular, regression splines are frequently used in smoothing longitudinal curves since they can be thought of as extensions of polynomials (Eubank 1999). A spline can be thought of as pieces of several polynomials that are joined at different points, known as knots. Two of the most popular types of splines are smoothing splines and regression splines, where one of the major differences between the two is the knot placement. In smoothing splines, the data points themselves are potential knot locations, whereas in regression splines, the knots are placed at either equidistant or equiquantile locations (Racine 2014). The number of knots chosen is subjective, thus allowing the user flexibility in model selection. Some of the obstacles that one faces

with arbitrary knot selection include difficulty in choosing from a large pool of candidate models, and difficulty estimating regression splines when there are few data points between knots (Kliethermes 2013). Within the regression spline framework, B-spline and truncated polynomial bases are the most popular basis functions.

2.3.1 P-Splines and Mixed Models

We can think of a general regression model as a smooth function plus an error term, $y_i = m(x_i) + \varepsilon_i$, where the $\varepsilon_i$ are i.i.d. $N(0, \sigma^2_\varepsilon)$. The smooth function, $m(\cdot)$, can be represented by basis splines, truncated polynomials, or cubic splines, to name a few. The choice of basis function is crucial when using Bayesian analysis since it has important implications for the mixing properties of MCMC chains (Ruppert, Wand et al. 2003). Though there are several options for choosing a smooth function, a low rank thin plate spline will be used in this formulation. Low rank thin plate splines were chosen because they tend to have very good numeric properties; in particular, the posterior correlation of the parameters is much smaller than for other bases, which greatly improves mixing (Crainiceanu, Ruppert et al. 2005). Low rank thin plate splines are also ideal since they provide computational efficiency and stability, especially for nonlinear models, and provide a way of incorporating multidimensional smooth terms (Wood 2003). We can represent a low rank thin plate spline, $m(\cdot)$, as follows:

$$m(x, \theta) = \beta_0 + \beta_1 x + \sum_{k=1}^{K} u_k \lvert x - \kappa_k \rvert^3 \quad (2.1)$$

where $\theta = (\beta_0, \beta_1, u_1, \ldots, u_K)^T$ is a vector of regression coefficients and $\kappa_1 < \kappa_2 < \cdots < \kappa_K$ are fixed knots, placed at the sample quantiles of the $x$'s corresponding to the probabilities $k/(K+1)$. In this case, we are looking at cubic splines. We need to minimize the sum of squares along with a penalty term to avoid overfitting:

$$\sum_{i=1}^{n} \left( y_i - m(x_i, \theta) \right)^2 + \lambda\, \theta^T D \theta. \quad (2.2)$$

Here, $\lambda$ represents the smoothing parameter and $D$ is a known positive semidefinite penalty matrix (Crainiceanu, Ruppert et al. 2005). A penalized spline is simply a regression spline with a roughness penalty, as shown in (2.2). We can formulate these penalized splines under a linear mixed modeling framework. Say we have the following linear mixed effects model:

$$Y = X\beta + Zb + \varepsilon, \qquad \operatorname{cov}\begin{pmatrix} b \\ \varepsilon \end{pmatrix} = \begin{bmatrix} \sigma_b^2 I_K & 0 \\ 0 & \sigma_\varepsilon^2 I_n \end{bmatrix}$$

where $X$ is the fixed effects design matrix with $i$th row $X_i = (1, x_i)$, and $Z = Z_K \Omega_K^{-1/2}$, where $Z_K$ has $i$th row $Z_{Ki} = \left( \lvert x_i - \kappa_1 \rvert^3, \ldots, \lvert x_i - \kappa_K \rvert^3 \right)$ and $\Omega_K$ is the penalty matrix.
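As an illustrative sketch (not code from this thesis), the design matrices just defined can be constructed with NumPy. The knot rule, cubic radial basis, and the transformation $Z = Z_K \Omega_K^{-1/2}$ follow the definitions in this section; the function name and the SVD-based matrix inverse square root are implementation choices of this sketch.

```python
import numpy as np

def thin_plate_design(x, K):
    """Build X and Z for a low rank thin plate spline with K knots.

    Knots are sample quantiles of x at probabilities k/(K+1); the
    random-effects design is Z = Z_K @ Omega_K^{-1/2}, where Z_K uses
    the cubic radial basis |x - kappa|^3.
    """
    x = np.asarray(x, dtype=float)
    # Fixed-effects design: intercept and linear term, X_i = (1, x_i)
    X = np.column_stack([np.ones_like(x), x])
    # Knots at the sample quantiles k/(K+1)
    probs = np.arange(1, K + 1) / (K + 1)
    knots = np.quantile(x, probs)
    # Z_K: cubic radial basis evaluated at each knot
    Z_K = np.abs(x[:, None] - knots[None, :]) ** 3
    # Penalty matrix Omega_K built from pairwise knot distances
    Omega = np.abs(knots[:, None] - knots[None, :]) ** 3
    # Matrix inverse square root via SVD (Omega is symmetric but
    # indefinite, so SVD is used rather than an eigen square root)
    U, s, Vt = np.linalg.svd(Omega)
    Omega_inv_sqrt = U @ np.diag(1.0 / np.sqrt(s)) @ Vt
    Z = Z_K @ Omega_inv_sqrt
    return X, Z
```

With this reparameterization the spline coefficients can be treated as ordinary i.i.d. random effects, which is what makes the mixed-model representation below convenient.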

We also let $b = \Omega_K^{1/2} u$, where $u = (u_1, \ldots, u_K)^T$ represents the random parameters, satisfying $E(u) = 0$ and $\operatorname{cov}(u) = \sigma_u^2 \Omega_K^{-1}$, with $u$ and $\varepsilon$ independent. With this reparameterization, the P-spline is equivalent to the BLUP (Best Linear Unbiased Predictor) from the linear mixed model (Crainiceanu, Ruppert et al. 2005). Instead of using a frequentist approach to fitting this model, we will use Bayesian methodology to estimate the necessary parameters by placing priors on them and simulating from the joint posterior distribution. In Chapter 4, we explore an adaptation of the Longitudinal Nonparametric ANOVA model of Crainiceanu and colleagues. Note that the P-spline and BLUP equivalence does not necessarily hold for the P-splines required in our framework, outlined in Chapter 4. However, this modeling framework will use some of the aforementioned concepts and will allow us to have one universal modeling framework for both target and competitor fixations.

2.4 Bayesian Hierarchical Techniques

In this dissertation, we take a Bayesian approach to this problem and propose methods that take into account multiple random effects and the correlated nature of the fixation curves, simultaneously fit multiple groups and curves, and capture the different shapes of the data under one modeling framework. The advantages of taking a Bayesian hierarchical modeling approach are discussed in this section.

The concept of shrinkage, the tendency to partially pool or shift parameter estimates towards each other, is prevalent in hierarchical modeling. In other words, we use the information we already have to shift these estimates to reflect the uncertainty about where the individual observations lie relative to their mean. Thus, the posterior mean reflects the shrinkage of the individual observations towards a common value determined from the different stages of the hierarchical model, including the priors, and all of the data (Cowles 2013). Both the distance of the individual estimate from the common value and how much prior information we can gather from the data determine the amount of shrinkage. It is important to note that shrinkage is attributed to the hierarchical structure of the model, not the estimation technique. Therefore, a hierarchical structure gives us the advantage of borrowing strength across group estimates, thus reducing the chances of obtaining unreasonable estimates. In our modeling framework, hierarchical models are used so that we may take into account the researchers' questions of interest. Hierarchical models allow us to create a single model that can answer all of the research questions, where each level accounts for a different effect of interest by incorporating parameters for each of them (Gelman, Hill et al. 2012). In the Bayesian framework, complex models are fit via simulation, which gives us an estimate of

the posterior distribution. Using Bayesian estimation techniques to obtain the posterior distribution allows us to work with different functions of the various parameters being estimated, so that we may investigate the research questions at hand. One of the most popular methods of Bayesian estimation is the use of Markov chain Monte Carlo (MCMC) methods. MCMC estimates of the posterior distribution are constructed by visiting many different combinations of the parameter values within each chain. Consequently, the posterior distribution is a depiction of the joint distribution of plausible combinations of all the parameters of interest, given the data. The longer the chains, or the larger the MCMC sample, the more fine-grained the representation of the posterior distribution of the parameters (Kruschke 2013). Therefore, using MCMC methods allows us to generate a realistic illustration of the parameter values of the posterior distribution, while giving us the opportunity to work with different functions of these parameter values to help answer the research questions at hand.

2.5 Comparison Techniques

Several statistical comparison techniques exist to help detect whether there is a difference between the trajectories of different groups, yet few exist that can determine when such a difference occurs. In particular, we are interested in making comparisons between groups and between individuals. A previous method uses bootstrapping on the fitted curves of the data to obtain estimates of the mean and variance, thus allowing comparisons, via two-sample t-tests, at each time point, while accounting for the autocorrelation and multiple comparisons by way of the false

discovery rate (Oleson, Cavanaugh et al. 2015). This method serves as a starting point for our comparison technique, given that both methods have the same goals in mind. In a similar fashion, we use the fitted data from the Bayesian hierarchical methods developed in this dissertation in our comparison techniques. Unlike the previously developed methods, we take a completely Bayesian approach in the development of this comparison technique by using the Bayesian alternative to the two-sample t-test, aptly named Bayesian Estimation Supersedes the t Test (BEST), developed by Kruschke in 2013. We will delve into this method in Chapter 5.
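As a minimal sketch of the pointwise-comparison idea (not the implementation used in the cited work, and not the BEST procedure covered in Chapter 5): compare two groups of curves at every time point with a two-sample statistic, then control the false discovery rate with the Benjamini-Hochberg step-up procedure. The function name is hypothetical, and a normal approximation to the null stands in for the bootstrap t-tests of the cited method.

```python
import numpy as np
from math import erf

def _norm_sf(z):
    # Standard normal survival function via the error function
    return 0.5 * (1.0 - erf(z / np.sqrt(2.0)))

def pointwise_bh(group1, group2, q=0.05):
    """Flag time points where two groups of curves differ.

    group1, group2: arrays of shape (n_subjects, n_timepoints).
    Welch t statistics per time point, two-sided p-values from a
    normal approximation (a simplification), then Benjamini-Hochberg
    FDR control at level q. Returns a boolean mask over time points.
    """
    n1, n2 = group1.shape[0], group2.shape[0]
    m1, m2 = group1.mean(axis=0), group2.mean(axis=0)
    v1 = group1.var(axis=0, ddof=1)
    v2 = group2.var(axis=0, ddof=1)
    t = (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)
    p = np.array([2.0 * _norm_sf(abs(ti)) for ti in t])
    # Benjamini-Hochberg step-up procedure
    order = np.argsort(p)
    m = p.size
    thresh = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresh
    k = (np.max(np.nonzero(passed)[0]) + 1) if passed.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask
```

The flagged time points form the kind of "when does the difference occur" answer this section is after; the BEST-based approach of Chapter 5 replaces the pointwise frequentist test with a Bayesian comparison.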

Chapter 3

NONLINEAR PARAMETRIC HIERARCHICAL MODELS

Bayesian hierarchical models can be written as comprising three different components: the data model, the process model, and the parameter model. The multiplication of these three components yields an estimate, not necessarily in closed form, of the joint posterior distribution of all quantities in the model. In this thesis, we focus on both parametric and nonparametric Bayesian nonlinear hierarchical models; parametric models are discussed in this chapter and nonparametric models in Chapter 4. In this chapter, two different nonlinear parametric hierarchical modeling frameworks are presented. In Section 3.1, the data model is presented, followed by the presentation of three different process models, with their corresponding parameter models, in Section 3.2. Model implementation is discussed in Section 3.3, followed by a brief simulation study in Section 3.4. We close Chapter 3 by applying the modeling framework presented here to the Visual World Paradigm dataset introduced in Chapter 2.

3.1 Data Model

The data model is written in the same manner for both the parametric and nonparametric models, in which we assume a normal distribution with mean $\mu_{ij}$ for $i = 1, \ldots, N$ subjects and $j = 1, \ldots, T$ time points. We will assume that the errors are not

independent of each other, thereby allowing for correlated errors. Specifically, the data model is defined as follows:

$$Y_{ij} = f(\delta)_{ij} + \tau_{ij} \quad (3.1)$$
$$\tau_{ij} = \rho\, \tau_{i,j-1} + e_{ij} \quad (3.2)$$
$$e_{ij} \sim N(0, \sigma^2)$$

As previously mentioned, $\tau_{ij}$ follows an autoregressive model of order 1 (AR(1)), where $\rho$ represents the lag-one correlation and $e_{ij}$ can be thought of as white noise or random error. The process model is some function specified for the mean parameter, denoted $f(\delta)_{ij}$. If the function $f(\delta)_{ij}$ is parametric, then the Bayesian nonlinear hierarchical model is parametric; if the function $f(\delta)_{ij}$ is nonparametric, then the Bayesian nonlinear hierarchical model is nonparametric. Since the parameter model depends on the process model, the parameter model will also change depending on the form of the process model. Both the parametric and nonparametric versions of the Bayesian nonlinear hierarchical model have the same data model, (3.1)-(3.2). These will be specified in more detail in Section 3.2. The data set, previously introduced in Chapter 2, consists of fixation probabilities for four different word types (Target, Cohort, Rhyme, and Unrelated) from 29 Cochlear Implant and 26 Normal Hearing patients. This dataset is ideal for our proposed

modeling framework since there is a natural hierarchy to the longitudinal data. Each participant has a fixation probability recorded per time point and each participant is a member of a group (where group membership is mutually exclusive). Therefore, using the data model (3.1), we are modeling $Y_{ij}$, the average probability of the $i$th subject fixating on the word type at the $j$th time point. For this model we assume that the data are normally distributed with mean $\mu_{ij}$ for subject $i$ at time point $j$. Each subject's fixation was measured every 4 ms, even though a typical person's fixation lasts from ms (McMurray, Samelson et al. 2010). We will now discuss three different process models: the four parameter logistic model, the three parameter logistic model, and the double Gaussian model, as well as the different model implementations.

3.2 Process Model

3.2.1 Four Parameter Logistic Model

The four parameter logistic function (4PL) is a monotonically increasing function with random minimum and maximum values. This function is characterized by four distinct parameters: a minimum (min), maximum (max), slope, and inflection point (IP). Accordingly, $\delta$ comprises the min, max, slope, and IP. The 4PL can be written in the following form:

$$f(\delta)_{ij} = \mathrm{min}_i + \frac{\mathrm{max}_i - \mathrm{min}_i}{1 + \left( \mathrm{time}_j / IP_i \right)^{\mathrm{slope}_i}} \quad (3.3)$$
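A minimal sketch of (3.3) as reconstructed here (the function name and example parameter values are illustrative, not estimates from the thesis). Under this log-logistic parameterization, a negative slope gives a curve that rises from the minimum toward the maximum; the thesis's sign convention may differ.

```python
import numpy as np

def four_pl(time, minimum, maximum, slope, ip):
    """Four parameter logistic curve:
    f(t) = min + (max - min) / (1 + (t / IP)^slope).

    With slope < 0 the curve rises monotonically from `minimum` to
    `maximum`; IP is the time point of symmetry, where the curve
    attains the midpoint (min + max) / 2.
    """
    time = np.asarray(time, dtype=float)
    return minimum + (maximum - minimum) / (1.0 + (time / ip) ** slope)
```

For example, `four_pl(t, 0.0, 0.9, -6.0, 150.0)` traces a target-like fixation curve that starts near 0, passes through 0.45 at t = 150 ms, and plateaus near 0.9.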

The four parameter logistic function resembles the shape one would expect from the data for the target fixation curve. That is, there is a low starting point for the curve, characterized by the minimum, followed by a monotonically increasing trend until the curve eventually plateaus at the maximum. The slope determines the rate at which the curve reaches the maximum, and the inflection point specifies the time point at which the curve changes from being concave up to concave down. The curve is symmetric around the inflection point. Each individual's target curve is specified by (3.3). Since individuals start fixating on the target word at different rates and at different time points, a subject specific function is ideally suited to accommodate those nuances in the data. In addition, another goal is to incorporate group level effects into the model. Thus, function (3.3) comprises the first part of the process model that we use to represent fixations on the target word. The second part of the process model focuses on simultaneously considering both the subject and group specific effects.

Subject/Group Specific Effects

Our goal is to model each individual's growth trajectory while simultaneously estimating overall group effects. We approach this by defining subject specific minimum, maximum, slope, and inflection point stochastic nodes. Each of the aforementioned parameters follows a normal distribution with a mean defined by an overall mean plus a group specific deviation, specified as follows:

$$\mathrm{min}_i = a + \alpha_1 G_i + \nu_i^{min}, \qquad \nu_i^{min} \sim N(0, \sigma^2_{\nu a}) \quad (3.4)$$
$$\mathrm{max}_i = b + \alpha_2 G_i + \nu_i^{max}, \qquad \nu_i^{max} \sim N(0, \sigma^2_{\nu b}) \quad (3.5)$$
$$\mathrm{slope}_i = c + \alpha_3 G_i + \nu_i^{s}, \qquad \nu_i^{s} \sim N(0, \sigma^2_{\nu c}) \quad (3.6)$$
$$IP_i = d + \alpha_4 G_i + \nu_i^{IP}, \qquad \nu_i^{IP} \sim N(0, \sigma^2_{\nu d}) \quad (3.7)$$

Equations (3.4) through (3.7) show how the subject specific parameters are constructed. We use $i$ to index each subject's random parameters (minimum, maximum, slope, and inflection point). Specifically, each subject specific parameter, e.g. $\mathrm{min}_i$, is normally distributed with a mean consisting of an overall baseline mean, $a$, plus a group effect, where $G_i$ is an indicator variable for group membership and $\alpha_1$ is the corresponding coefficient. Each of the parameters has its own random intercept for subject, which follows a normal distribution with mean zero and its own variance term. By setting up the process model this way, we allow both subject specific fixation curves and separate group fixation curves to be fit. Additional covariates could easily be incorporated within this formulation. Now that the process model has been established, the parameter model can be specified. This is where we specify the prior distributions for the parameters of interest from the process model. One could incorporate any previous information from the researchers or previous studies in choosing the hyperparameters for each distribution. Otherwise, the data will drive the estimation of the hyperparameters for the prior distributions defined in the parameter model.
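The construction in (3.4) through (3.7) can be sketched as follows. This is an illustrative simulation of the subject-level parameters, not part of the thesis; the overall means, group effects, and random-effect standard deviations are hypothetical placeholder values.

```python
import numpy as np

def draw_subject_params(rng, group,
                        a=0.0, b=0.9, c=-6.0, d=150.0,
                        alphas=(0.0, -0.1, 0.5, 40.0),
                        sds=(0.005, 0.02, 0.3, 10.0)):
    """Draw subject-specific (min, max, slope, IP) per (3.4)-(3.7).

    group: array of 0/1 indicators G_i. Each parameter is an overall
    mean, plus a group deviation alpha * G_i, plus a normal subject-
    level random intercept. All numeric defaults are illustrative.
    """
    group = np.asarray(group, dtype=float)
    n = group.shape[0]
    minimum = a + alphas[0] * group + rng.normal(0.0, sds[0], n)
    maximum = b + alphas[1] * group + rng.normal(0.0, sds[1], n)
    slope   = c + alphas[2] * group + rng.normal(0.0, sds[2], n)
    ip      = d + alphas[3] * group + rng.normal(0.0, sds[3], n)
    return minimum, maximum, slope, ip
```

Feeding these draws through the 4PL mean function yields one fixation curve per subject, with each group's curves centered on its own set of parameters, which is exactly the subject-plus-group structure the process model is after.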

The following parameter model, corresponding to the four parameter logistic process model, is defined as follows:

$$a \sim N(\mu_1, \sigma^2_a) \quad (3.8)$$
$$b \sim N(\mu_2, \sigma^2_b) \quad (3.9)$$
$$c \sim N(\mu_3, \sigma^2_c) \quad (3.10)$$
$$d \sim N(\mu_4, \sigma^2_d) \quad (3.11)$$
$$\alpha_1, \alpha_2, \alpha_3, \alpha_4 \sim N(\mu_6, \sigma^2_\alpha) \quad (3.12)$$
$$\sigma^2_{\nu a}, \sigma^2_{\nu b}, \sigma^2_{\nu c}, \sigma^2_{\nu d}, \tau_0 \sim \text{Inverse Gamma}(\text{shape}, \text{scale}) \quad (3.13)$$
$$\rho \sim U(\alpha, \beta) \quad (3.14)$$

Equations (3.8) to (3.12) specify Normal priors, each with a mean and variance, on the overall means for each of the four parameters (minimum, maximum, slope, and inflection point) and on the group specific deviations for each of those parameters. In (3.14), a uniform prior is placed on $\rho$, the parameter used to describe the correlation in the AR(1) process in (3.2). Finally, in (3.13), Inverse Gamma distributions are placed on $\tau_0$, the initial case of the AR(1) process, and on $\sigma^2_{\nu a}, \sigma^2_{\nu b}, \sigma^2_{\nu c}, \sigma^2_{\nu d}$, the variances associated with the minimum, maximum, slope, and inflection point, respectively. Defining the parameter model in this way allows for a more straightforward definition of the hyperparameters of the prior distributions. The prior densities on the unknown parameters provide enough additional information about these parameters so that their estimation is feasible. We could

have chosen prior distributions more aligned with the parameter specifications. For example, a Beta prior may have been more suitable for the parameters associated with the minimum and maximum, since it is defined on the interval [0, 1]. Nonetheless, our choice of prior distributions for the different parameter components is acceptable. Normal priors adequately represented our parameter estimates and have a minimal effect on the joint posterior distribution once the relevant evidence has been taken into account. Therefore, even though we could have chosen different prior distributions, our choice of normal priors did not appreciably affect the estimated joint posterior distribution.

3.2.2 Three Parameter Logistic

Another process model that can be used in modeling the target data is the three parameter logistic (3PL) function. It is nearly identical to the four parameter logistic, with a similar shape but a fixed minimum (Ritz and Streibig 2005). Hence the name three parameter logistic, since this function is characterized by three parameters: maximum, slope, and inflection point. Accordingly, $\delta$ comprises the max, slope, and IP. The minimum is fixed at zero, so the three parameter logistic function can be written in the following form:

$$f(\delta)_{ij} = \frac{\mathrm{max}_i}{1 + \exp\left( \mathrm{slope}_i \left( \log(\mathrm{time}_j) - \log(IP_i) \right) \right)}$$

The three parameter logistic function is a reasonable process model for data from the visual world paradigm since it is practical to assume that each subject has a fixed

minimum fixation probability of zero. Due to the nature of the data set, a subject begins each trial by staring at the center of the screen, which implies they are not looking at any of the four words from the different fixation categories (Target, Cohort, Rhyme, and Unrelated). Similar to Section 3.2.1, we simultaneously incorporate the subject and group specific effects in the process model, with the exception that we only have three parameters to consider (i.e., we do not include (3.4)). Once again, the only difference between the three parameter logistic function and the four parameter logistic function is that the minimum parameter is fixed at zero, so we are not estimating its associated effects; the remainder of the model follows that of Section 3.2.1. Since the smallest value for a subject's fixation probability is zero, and every subject starts off by not fixating on any of the words, fixing the minimum parameter at zero is a reasonable assumption.

3.2.3 Double Gaussian

In addition to modeling the target word fixations, researchers are also interested in modeling fixations to the competitor words, which include fixations on the cohort, rhyme, and unrelated words. One would expect subjects to briefly look at the different competitor words before finally fixating on the target word. Consequently, we need a function that is able to capture the rise of the fixations to the competitor words as well as the decline of these looks once subjects have fixated on the target word. The double Gaussian curve has been previously used to model this phenomenon (McMurray, Samelson et al. 2010). Therefore, when modeling the competitor curves, the

double Gaussian function serves as the basis of our process model within the hierarchical framework. The double Gaussian function (DG) is a piecewise function characterized by two normal distribution functions (or Gaussian curves) that are joined at the mean, $m_i$. Specifically, the indicator, $wt_{ij}$, for the $i$th subject at the $j$th time point is defined by whether or not the time point is less than or equal to the mean:

$$f(\delta)_{ij} = wt_{ij}\, y_{1ij} + (1 - wt_{ij})\, y_{2ij} \quad (3.15)$$
$$wt_{ij} = \begin{cases} 1 & \text{if } \mathrm{time}_j \le m_i \\ 0 & \text{if } \mathrm{time}_j > m_i \end{cases} \quad (3.16)$$

Here both $y_{1ij}$ and $y_{2ij}$ are similar to normal probability density functions in that each has a unique variance and base, but they share the same mean and height, as shown below. The main deviation from a normal density function is that we do not include the normalizing constants:

$$y_{1ij} = \exp\left( -\frac{1}{2} \frac{(\mathrm{time}_j - m_i)^2}{sig1_i^2} \right) (ht_i - base1_i) + base1_i \quad (3.17)$$
$$y_{2ij} = \exp\left( -\frac{1}{2} \frac{(\mathrm{time}_j - m_i)^2}{sig2_i^2} \right) (ht_i - base2_i) + base2_i$$
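A sketch of (3.15) through (3.17) for one subject (the function name, time grid, and parameter values are illustrative, not from the thesis): two unnormalized Gaussian halves that share a mean and height but have their own spreads and baselines.

```python
import numpy as np

def double_gaussian(time, m, ht, sig1, sig2, base1, base2):
    """Double Gaussian curve of (3.15)-(3.17).

    Left of the mean m the curve follows the (sig1, base1) half;
    right of m it follows the (sig2, base2) half. Both halves equal
    the shared height ht exactly at time = m.
    """
    t = np.asarray(time, dtype=float)
    y1 = np.exp(-0.5 * ((t - m) / sig1) ** 2) * (ht - base1) + base1
    y2 = np.exp(-0.5 * ((t - m) / sig2) ** 2) * (ht - base2) + base2
    return np.where(t <= m, y1, y2)
```

With, say, `double_gaussian(t, 600.0, 0.3, 150.0, 300.0, 0.02, 0.05)`, the curve rises from near `base1`, peaks at the height 0.3 at 600 ms, and decays toward `base2`, which is the rise-then-fall shape expected of competitor fixations.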

The function is characterized by a mean, a height, two bases, and two standard deviation parameters, for which we obtain both subject and group specific sets of parameters.

Subject/Group Specific Effects

Similar to the previous process models for the nonlinear parametric hierarchical models above, each parameter is normally distributed with a subject specific mean and an overall variance term. In order to incorporate both subject and group specific effects, each parameter's mean is broken into two parts: an overall mean and the group specific contribution, as shown below.

$$m_i = am + \alpha_5 G_i + \nu_i^{m}, \qquad \nu_i^{m} \sim N(0, \sigma^2_{\nu m}) \quad (3.18)$$
$$ht_i = bh + \alpha_6 G_i + \nu_i^{ht}, \qquad \nu_i^{ht} \sim N(0, \sigma^2_{\nu ht}) \quad (3.19)$$
$$sig1_i = cs + \alpha_7 G_i + \nu_i^{sig1}, \qquad \nu_i^{sig1} \sim N(0, \sigma^2_{\nu sig1}) \quad (3.20)$$
$$sig2_i = ds + \alpha_8 G_i + \nu_i^{sig2}, \qquad \nu_i^{sig2} \sim N(0, \sigma^2_{\nu sig2}) \quad (3.21)$$
$$base1_i = pb + \alpha_9 G_i + \nu_i^{base1}, \qquad \nu_i^{base1} \sim N(0, \sigma^2_{\nu base1}) \quad (3.22)$$
$$base2_i = rb + \alpha_{10} G_i + \nu_i^{base2}, \qquad \nu_i^{base2} \sim N(0, \sigma^2_{\nu base2}) \quad (3.23)$$

Similar to the four parameter logistic model, defining the parameters in this way allows us to obtain information about each individual's fixation curve along with each group's trajectory. The corresponding parameter model is as follows:

$$am \sim N(\mu_{am}, \sigma^2_{am}) \quad (3.24)$$
$$bh \sim N(\mu_{bh}, \sigma^2_{bh}) \quad (3.25)$$
$$cs \sim N(\mu_{cs}, \sigma^2_{cs}) \quad (3.26)$$
$$ds \sim N(\mu_{ds}, \sigma^2_{ds}) \quad (3.27)$$
$$pb \sim N(\mu_{pb}, \sigma^2_{pb}) \quad (3.28)$$
$$rb \sim N(\mu_{rb}, \sigma^2_{rb}) \quad (3.29)$$
$$\alpha_5, \alpha_6, \alpha_7, \alpha_8, \alpha_9, \alpha_{10} \sim N(\mu_\alpha, \sigma^2_\alpha) \quad (3.30)$$
$$\sigma^2_{am}, \sigma^2_{bh}, \sigma^2_{cs}, \sigma^2_{ds}, \sigma^2_{pb}, \sigma^2_{rb}, \tau_0 \sim \text{Inverse Gamma}(\text{shape}, \text{scale}) \quad (3.31)$$
$$\rho \sim U(\alpha, \beta) \quad (3.32)$$

Similar to the four parameter logistic model, in (3.24) to (3.30) we place Normal priors on the overall means for each of the six parameters that characterize the double Gaussian model and on the group specific deviations. All of the parameters associated with variance terms have conjugate prior distributions, as in (3.31). Finally, we assume that $\rho$, the correlation term, is uniformly distributed between some $\alpha$ and $\beta$. The models developed in Section 3.2 capture group effects and individual effects, plus correlated errors, all under the umbrella of a single unified hierarchical modeling framework. In particular, we developed two modeling frameworks (four parameter logistic and double Gaussian) for the target word and competitor words, respectively. In Section 3.5 we implement the modeling framework developed in Section 3.2 on the visual world paradigm data set presented in Chapter 2.

3.3 Model Implementation

Now that we have described the different pieces of these Bayesian nonlinear hierarchical models, we turn to model implementation. All modeling and analyses were performed using RStudio (Team 2015). The rjags package provides an interface to

JAGS (Just Another Gibbs Sampler), using JAGS's internal MCMC libraries (Plummer 2014). In order to use this interface, we first need to create a separate model specification file (.bug/.txt). Sample model specification files for the models presented in Sections 3.2.1 and 3.2.3 can be found in Appendix A. The jags.model() function then reads the model specification file, along with the initial values, and runs it in JAGS. In particular, each model consisted of 4 parallel chains with 1000 iterations for adaptation. The models were then updated 10,000 times, and trace monitors were set and monitored for an additional 1000 iterations. The coda package was used to assess convergence (Plummer, Best et al. 2006). The fits obtained from the models were stored in an MCMC list, and values from these fits summarize the estimated posterior distribution. Each parameter's estimate is found by calculating the mean of all the iteration values from each chain. These parameter estimates are then used to obtain the fixation probabilities at each time point.

3.4 Simulations

A simulation study was performed to justify the use of our modeling framework. In particular, we wanted to see how well the nonlinear parametric modeling framework presented in this chapter does at estimating the true parameter values. For this simulation study, 50 datasets were randomly generated with 2 groups of 10 subjects each. A fixation proportion was generated every 4 ms for each subject from 0 ms to 400 ms, resulting in 100 time points per subject. The

four parameter logistic process and parameter model from Section 3.2.1 was used. The following priors were used for all of the four parameter logistic models in this section:

$$a \sim N(0, 0.002)$$
$$b \sim N(0.8, 0.012)$$
$$c \sim N(5.4, 0.185)$$
$$d \sim N(150, 10000) \quad (3.33)$$
$$\alpha_1, \alpha_2, \alpha_3, \alpha_4 \sim N(0, 10000)$$
$$\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5, \sigma_6, \tau_0 \sim \text{Inverse Gamma}(1, 100)$$
$$\rho \sim U(0, 1)$$

These prior distribution values were determined from a previous visual world paradigm study (McMurray, Samelson et al. 2010). For the four parameter logistic parameter model, four different situations were considered. We studied two groups whose four parameters (minimum, maximum, slope, and IP) were identical (Table 3.1); two groups with different maximums (Table 3.2), with all other parameters equal; two groups with different slopes (Table 3.3), with all other parameters equal; and two groups with different inflection points (Table 3.4), with all other parameters equal.
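The generating mechanism of this simulation design (a 4 ms grid, 4PL mean curves per group, and AR(1) errors per the data model (3.1)-(3.2)) can be sketched as follows. The function name, the specific parameter values, and the choices of rho and sigma are illustrative placeholders, not the settings used in the thesis.

```python
import numpy as np

def simulate_dataset(rng, n_per_group=10, n_time=100, dt=4.0,
                     rho=0.7, sigma=0.02,
                     params=((0.0, 0.8, -5.4, 150.0),
                             (0.0, 0.9, -5.4, 150.0))):
    """Simulate one dataset: two groups of subjects on a 4 ms grid.

    params holds (min, max, slope, IP) per group. Each curve is a 4PL
    mean, f = min + (max - min) / (1 + (t/IP)^slope), plus AR(1)
    errors tau_j = rho * tau_{j-1} + e_j with e_j ~ N(0, sigma^2).
    Returns (Y, groups): Y has one row per subject, groups the labels.
    """
    time = dt * np.arange(1, n_time + 1)          # 4, 8, ..., 400 ms
    data, groups = [], []
    for g, (mn, mx, sl, ip) in enumerate(params):
        for _ in range(n_per_group):
            mean = mn + (mx - mn) / (1.0 + (time / ip) ** sl)
            # Build the AR(1) error series recursively
            e = rng.normal(0.0, sigma, n_time)
            tau = np.empty(n_time)
            tau[0] = e[0]
            for j in range(1, n_time):
                tau[j] = rho * tau[j - 1] + e[j]
            data.append(mean + tau)
            groups.append(g)
    return np.array(data), np.array(groups)
```

Repeating this 50 times with the scenario-specific true parameter values would reproduce the structure of the simulated datasets summarized in Tables 3.1 through 3.4.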

[Table 3.1 Identical Groups for the Four Parameter Logistic Process Model: True Value, Mean, SD, Bias, and MSE for the Minimum, Maximum, Slope, and IP parameters; numeric values not recovered in this transcription]

As seen in Table 3.1, the means are very similar to the true parameter values. The resulting standard deviation, bias, and MSE for the minimum, maximum, and slope parameters are relatively small. Notice that the inflection point parameter has both a larger standard deviation and a larger MSE. As seen in (3.33), the prior distribution placed on the IP parameter allowed for more uncertainty in the estimates, thus allowing a larger range of values. For Table 3.2, we fit the same four parameter logistic process model as in Table 3.1 but with different simulated data sets. Each data set consisted of two groups whose simulated values were generated from identical minimum, slope, and inflection point values. For Group 1, a maximum value of 0.80 was used in conjunction with the other three parameters to generate fixation proportions. In a similar fashion, the fixation proportions for Group 2 were generated with a larger maximum value. As seen in Table 3.2, the means are similar to the true minimum and maximum parameter values. We can observe that these parameters are all codependent. In Group 1, the means of the

slope and inflection point parameter values across the simulated data sets are similar to the true parameter estimates. On the other hand, there is a bigger discrepancy in Group 2 for these same estimates. This is attributed to the form of the four parameter logistic function. Recall that the IP specifies the time point at which the curve changes from being concave up to concave down. In Group 2, the model needs to account for the larger range between the minimum and maximum values, and it does this with smaller slope and inflection point parameter estimates.

[Table 3.2 Varying Maximums for the Four Parameter Logistic Process Model: True Value, Mean, SD, Bias, and MSE for the Minimum, Maximum, Slope, and IP parameters, reported separately for Group 1 and Group 2; numeric values not recovered in this transcription]

In Table 3.3, a similar approach to Table 3.2 was used to generate data for Group 1 and Group 2. Instead of varying the maximum parameter values, the slope parameter values were varied across groups, while the minimum, maximum, and inflection point values were identical. The mean parameter values, in both Group 1 and Group 2, are similar to the true parameter values used to generate the datasets. In comparison to Table 3.2, varying the slope parameter does not appear to have as large an effect on the other parameters as varying the maximum did.

[Table 3.3 Varying Slopes for the Four Parameter Logistic Process Model: True Value, Mean, SD, Bias, and MSE for the Minimum, Maximum, Slope, and IP parameters, reported separately for Group 1 and Group 2; numeric values not recovered in this transcription]

In Table 3.4, the minimum, maximum, and slope parameters were held constant across groups, with two different inflection point values, for the four parameter logistic process model. The model appears to capture the true parameter values used to simulate the data in both groups. In comparison to Table 3.2, varying the inflection point does not appear to have as large an effect on the minimum, maximum, and slope values.

[Table 3.4 Varying Inflection Points for the Four Parameter Logistic Process Model: True Value, Mean, SD, Bias, and MSE for the Minimum, Maximum, Slope, and IP parameters, reported separately for Group 1 and Group 2; numeric values not recovered in this transcription]

In this section, we saw that varying the maximum values across two groups had the largest impact on the slope and inflection point parameter estimates. The same process

model found in Section 3.2.1 was used for the different scenarios presented, and it appeared to reasonably estimate the true parameter values.

3.5 Data Analysis

As previously mentioned in Chapter 2, the data arise from our motivating example, which uses the Visual World Paradigm (VWP) from the area of psycholinguistics. The data set consists of fixation probabilities for four different word types (Target, Cohort, Rhyme, and Unrelated) from 29 Cochlear Implant and 26 Normal Hearing patients. Subjects' fixation probabilities were recorded for a period of 2000 ms, with measurements taken every 4 ms, which is equivalent to 501 distinct time points. It is worth reiterating that we are interested in modeling $Y_{ij}$, the average probability of the $i$th subject fixating on the word type at the $j$th time point.

3.5.1 Four Parameter Logistic

The first word type that we are interested in modeling is the target word. In other words, we are modeling the average probability of a subject fixating on the target word at a particular time point. We will use the four parameter logistic function, as described in Section 3.2.1, as the process model within our Bayesian hierarchical modeling framework.

Table 3.5 A typical subject's parameter values from the 4PL model (Target Fixation)
(Rows: Minimum, Maximum, Slope, IP; columns: CI Subject 13, CI Subject 26, NH Subject 18, NH Subject 21.)

Table 3.5 was constructed by looking at four randomly selected individuals' parameter fits (2 from the Cochlear Implant group and 2 from the Normal Hearing group). We can see that a typical subject's minimum value is around zero. This modeling framework allows for negative values of the subject's minimum so that there is more flexibility in estimating the other three parameters. The fixation probabilities increase at rates from 4.87 to around 6.5 until the fixation probability eventually plateaus to a maximum ranging from 0.85 to around. The inflection point (IP) parameter estimate represents around which time point the four parameter logistic curve is symmetric, and from this table we can see that it is around ms, with the exception of the first subject, whose IP is around 814 ms.

Figures 3.1(a)-3.1(d) represent the individual model fits for the four subjects in Table 3.5 from the four parameter logistic modeling framework described in Section 3.2.1. In Figures 3.1(a) and 3.1(b), we see two typical cochlear implant subjects' raw target fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 3.1 (c and d) represent two typical normal hearing

individuals' raw target fixation curves (blue) and the corresponding fitted fixation curves (black).

Figure 3.2 displays all 55 participants' individual fits along with the estimated group fitted curves. In particular, the left panel displays all 29 Cochlear Implant participants' individual fitted target curves (blue), as well as the corresponding fitted group curve (black) overlaid. In the right panel we see the 26 Normal Hearing subjects' individual fitted curves (blue) displayed, as well as the fitted group curve (black). One would expect higher fixation probabilities on the target word as time progresses, which can be seen in both Figure 3.1 and Figure 3.2. All of the fitted fixation curves presented were simultaneously fit under the same four parameter logistic modeling framework presented in Section 3.2.1.

Figure 3.1 Four parameter logistic subject-specific fitted curves. Top panels: Cochlear Implant fitted curves; bottom panels: Normal Hearing fitted curves.

Figure 3.2 Four-parameter logistic group vs individual curves for all subjects.

Double Gaussian

The three remaining word types, Cohort, Rhyme, and Unrelated, are left to be modeled using the double Gaussian parametric framework described earlier. These competitor curves can all be modeled using the double Gaussian process model, since they have similar shapes.

Cohort Fixations

We first examine modeling the average probability of a subject fixating on the cohort word at a particular time point. The double Gaussian function will be the process model within the Bayesian hierarchical modeling framework. Table 3.6 is representative of four typical subjects' parameters for a double Gaussian model of fixations to the cohort word type.

Table 3.6 A typical subject's parameter values from the DG model (Cohort Fixation)
(Rows: m, Height, Sig1, Sig2, Base1, Base2; columns: CI Subject 13, CI Subject 26, NH Subject 18, NH Subject 21.)

In a similar fashion to Table 3.5, Table 3.6 was constructed by examining the same four individuals' parameter fits. The six parameters found in Table 3.6 characterize the double Gaussian function. In particular, m gives the location, or time point, where the curve reaches its peak. With the exception of Subject 21 from the Normal Hearing group, the other three subjects attained their peak around ms. The height, or the peak, indicates the highest fixation probability that the cohort word will attain. From Table 3.6 we can see that the heights for these four subjects are around

0.17, 0.22, 0.18, and 0.20, respectively. The sig1 parameter relates to the rate at which the subject views the cohort word. NH Subject 21 has the smallest sig1 parameter of the four subjects (90.65), which indicates that the subject found the cohort word faster than the other three subjects. This is also indicated by the m parameter of NH Subject 21 ( ), which is noticeably smaller than those of the other three subjects included in this table. CI Subject 26 has the highest sig1 parameter value, indicating that this participant took a longer time to study the cohort word, which is also reflected in the m parameter estimate (643.39). On the other hand, the sig2 parameter relates to the rate at which the subject looks away from the cohort word. CI Subject 13 had a large sig2 parameter estimate, implying that this participant looked away from the cohort word at a slower rate than the other three subjects. NH Subject 18 had the smallest sig2 parameter estimate (148.30), indicating that the subject quickly looked away from the cohort word. Finally, the base1 and base2 parameters correspond to the baseline fixation probabilities before a participant views the cohort word and after they have viewed the cohort word. As expected, these numbers hover around zero.

Similar to Figure 3.1, Figures 3.3(a)-(d) represent typical individual model fits from the double Gaussian modeling framework described earlier. Figures 3.3(a) and 3.3(b) show two typical cochlear implant subjects' raw cohort fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 3.3 (c and d) represent two typical normal hearing individuals' raw cohort fixation curves (blue) and the corresponding fitted fixation curves (black).
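The double Gaussian function itself is not reproduced in this excerpt. A sketch of the usual piecewise form, with the six parameters (m, height, sig1, sig2, base1, base2) interpreted as above, is given below; all numeric values are hypothetical illustrations, not the fitted estimates.

```python
import numpy as np

def double_gaussian(t, m, height, sig1, sig2, base1, base2):
    """Piecewise Gaussian: rises from base1 toward `height` at t = m
    (width sig1), then falls toward base2 (width sig2)."""
    t = np.asarray(t, dtype=float)
    rise = base1 + (height - base1) * np.exp(-(t - m) ** 2 / (2 * sig1 ** 2))
    fall = base2 + (height - base2) * np.exp(-(t - m) ** 2 / (2 * sig2 ** 2))
    return np.where(t <= m, rise, fall)

t = np.linspace(0, 2000, 501)
curve = double_gaussian(t, m=600.0, height=0.20, sig1=150.0, sig2=300.0,
                        base1=0.02, base2=0.01)
```

Because sig2 exceeds sig1 in this illustration, the curve falls away from its peak more slowly than it rose, matching the interpretation of sig1 and sig2 as rise and fall rates.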

Figure 3.4 displays the 55 participants' individual fits along with the group fits. In particular, the left panel displays all 29 Cochlear Implant participants' individual fitted cohort curves (blue), as well as the corresponding fitted group curve (black) overlaid. In the right panel we see the 26 Normal Hearing subjects' individual fitted curves (blue) displayed, as well as the fitted group curve (black). One would expect there to be a rise in fixation probabilities until the participant views the cohort word, and then a decrease in the fixation probabilities as they look away, which can be seen in both Figure 3.3 and Figure 3.4. Both groups' fixation probabilities peak around 0.20, as seen in Figure 3.4. On the other hand, the Cochlear Implant group (Figure 3.4) appears to have more spread, signified by larger sig1 and sig2 parameter estimates. The Normal Hearing group appears to have the majority of the curves peak around ms and rapidly decrease to their baseline values. A formal comparison of these two groups will be shown in Chapter 5. All of the fitted fixation curves presented here were simultaneously fit under a Bayesian hierarchical model with a double Gaussian process model as presented earlier.

Figure 3.3 Cohort subject-specific fitted curves. Top panels: Cochlear Implant fitted curves; bottom panels: Normal Hearing fitted curves.

Figure 3.4 Double Gaussian - Cohort Fixation: group vs individual curves for all participants.

Rhyme Fixations

The next fixation type of interest is the rhyme word. Since the rhyme fixations have a similar shape to that of the cohort fixations, the double Gaussian process model described earlier will be used to model fixations of this type. We are modeling the average probability of a subject fixating on the rhyme word at a particular time point. Table 3.7 is representative of four typical subjects' parameters for a double Gaussian model of fixations to the rhyme word type.

Table 3.7 A typical subject's parameter values from the DG model (Rhyme Fixation)
(Rows: m, Height, Sig1, Sig2, Base1, Base2; columns: CI Subject 13, CI Subject 26, NH Subject 18, NH Subject 21.)

As was the case with the previous two tables, Table 3.7 was constructed by looking at the same four individuals' parameter fits. In particular, m gives the location, or time point, where the curve reaches its peak. For the rhyme fixation, with the exception of NH Subject 21, the m parameter estimates are lower than the corresponding estimates for the cohort fixation found in Table 3.6. The mean height parameter values also appear to be lower than the mean height parameter value for the cohort word found in Table 3.6. This could possibly indicate that participants were more likely to fixate on the cohort word than the rhyme word. The sig1 parameter estimates, with the exception of NH Subject 21, are lower than their counterparts in Table 3.6, indicating that the subjects considered words of the rhyme fixation type faster than the cohort word. On the other hand, the sig2 parameter estimates were higher for all four subjects. This implies that subjects looked away from the rhyme word at a slower rate, which could be translated as: once the subjects fixated on the rhyme word, they

considered it for longer than they considered the cohort word. Finally, the base1 and base2 parameters correspond to the baseline fixation probabilities before a participant views the rhyme word and after they have viewed the rhyme word. Once again, these values tend to be around zero.

As was the case for the cohort fixation, Figure 3.5 represents typical individual fixation curves from the double Gaussian process model. Figures 3.5(a) and (b) show two typical cochlear implant subjects' raw rhyme fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 3.5 (c and d) represent two typical normal hearing individuals' raw rhyme fixation curves (blue) and the corresponding fitted fixation curves (black).

Figure 3.6 displays the 55 participants' individual fits along with the group fits. In particular, the left panel displays all 29 Cochlear Implant participants' individual fitted rhyme curves (blue), as well as the corresponding fitted group curve (black) overlaid. In the right panel we see the 26 Normal Hearing subjects' individual fitted curves (blue) displayed, as well as the fitted group curve (black). One would expect there to be a rise in fixation probabilities until the participant views the rhyme word, and then a decrease in the fixation probabilities as they look away, which can be seen in both Figure 3.5 and Figure 3.6. Both groups' fixation probabilities peak around 0.17, as seen in Figure 3.6. As was the case with the cohort word, the Cochlear Implant group (Figure 3.6) appears to have more spread, indicating larger sig1 and sig2 parameter estimates. The Normal Hearing group appears to have the majority of the curves peak around ms and rapidly

decrease to their baseline values. There appears to be more variability within the Cochlear Implant group than in the Normal Hearing group. A formal comparison of these two groups will be shown in Chapter 5. All of the fitted fixation curves presented here were simultaneously fit under a Bayesian hierarchical model with a double Gaussian process model as presented earlier.

Figure 3.5 Rhyme subject-specific fitted curves. Top panels: Cochlear Implant fitted curves; bottom panels: Normal Hearing fitted curves.

Figure 3.6 Double Gaussian Function: Rhyme Fixation; group vs individual curves for all participants.

Unrelated Fixations

The final fixation type of interest is the unrelated word. The unrelated fixations have a similar shape to both the cohort and rhyme words; thus, fixations of this type will be fit using the double Gaussian process model described earlier.

We are modeling the average probability of a subject fixating on the unrelated word at a particular time point. Table 3.8 is representative of four typical subjects' parameters for a double Gaussian model of fixations to the unrelated word type.

Table 3.8 A typical subject's parameter values from the DG model (Unrelated Fixation)
(Rows: m, Height, Sig1, Sig2, Base1, Base2; columns: CI Subject 13, CI Subject 26, NH Subject 18, NH Subject 21.)

Once again, Table 3.8 was constructed in a similar fashion by looking at the same four individuals' parameter fits. For the unrelated fixation, the m parameter estimates are at their lowest in comparison to their corresponding estimates for the cohort and rhyme fixations found in Table 3.6 and Table 3.7. The mean height parameter values also appear to be lower than the mean height parameter value for the cohort word found in Table 3.6, and slightly lower, with the exception of NH Subject 21, than the height parameter values found in Table 3.7. This could possibly indicate that participants were less likely to fixate on the unrelated word than the cohort and rhyme words. The sig1 parameter estimates are lower than their counterparts in Table 3.6 and Table 3.7, indicating that subjects only briefly considered words of the unrelated fixation type. On the other hand,

the sig2 parameter estimates were lower for the Cochlear Implant subjects in relation to Table 3.7, but higher than for the cohort fixation (Table 3.6). This implies that CI subjects looked away from the unrelated word at a slower rate in comparison to the cohort word but faster than the rhyme word. For NH Subject 21, the sig2 parameter estimate is higher than for both the cohort and rhyme fixation types, and NH Subject 18's sig2 parameter estimate falls between the cohort and rhyme fixation types. Finally, the base1 and base2 parameters correspond to the baseline fixation probabilities before a participant views the unrelated word and after they have viewed the unrelated word. Once again, these values tend to be around zero.

Finally, the last two figures, Figures 3.7 and 3.8, represent the expectation of fixations to the unrelated word type. In Figure 3.7, we are looking at four individual fixation curves to the unrelated word type. Figures 3.7(a) and 3.7(b) show two typical cochlear implant subjects' raw unrelated fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 3.7 (c and d) represent two typical normal hearing individuals' raw unrelated fixation curves (blue) and the corresponding fitted fixation curves (black). Lastly, Figure 3.8 displays the 55 participants' individual fits along with the group fits. In particular, the left panel displays all 29 Cochlear Implant participants' individual fitted unrelated curves (blue), as well as the corresponding fitted group curve (black) overlaid. In the right panel we see the 26 Normal Hearing subjects' individual fitted curves (blue) displayed, as well as the fitted group curve (black). Similar to Figures 3.6 and 3.4, there appears to be more variability in the Cochlear Implant

group than the Normal Hearing group. A formal test will be performed in Chapter 5 to compare these groups.

Figure 3.7 Unrelated subject-specific fitted curves. Top panels: Cochlear Implant fitted curves; bottom panels: Normal Hearing fitted curves.

Figure 3.8 Double Gaussian Function: Unrelated Fixation; group vs individual curves for all participants.

In this chapter, we developed two different modeling frameworks (four-parameter logistic and double Gaussian) which can simultaneously estimate individual and group parameters. In addition to simultaneously estimating subject and group parameters, these modeling frameworks are capable of modeling non-independent errors, as well as having a straightforward interpretation for each of the parameters. We will use the group estimates from our modeling framework in the development of our formal comparison procedure in Chapter 5.

Ideally, it would be much simpler not to have to specify a particular parametric process model depending on which category (target or competitor words) the fixations belong to. An approach to unify both shapes of the data, under a Bayesian nonlinear hierarchical model, would be a nonparametric process model. The flexibility that a nonparametric process model allows will be discussed in Chapter 4.

Chapter 4
NONLINEAR NONPARAMETRIC HIERARCHICAL MODELS

In this chapter we will develop nonlinear nonparametric hierarchical models. We will use a similar hierarchical modeling framework, (3.1)-(3.2) from Chapter 3, but instead of using a parametric process model, (3.3) or (3.16), a nonparametric process model will be used. As mentioned in Chapter 3, the previous parametric hierarchical modeling framework is sensitive to the chosen functional form, which was predetermined prior to fitting the model. Bayesian nonlinear nonparametric hierarchical models will let the data determine the shape, allowing one universal model to fit different types of curves. In Section 4.1, we will examine a nonparametric process model, followed by the parameter model in Section 4.2. Model implementation will be discussed in Section 4.3. Finally, in Section 4.4 we will conduct a data analysis using the Visual World Paradigm data set described in Chapter 2.

Nonlinear hierarchical functional mixed effects models have been developed to account for binomially distributed data (Kliethermes and Oleson, 2014). We take a similar approach to this modeling framework, except that we assume our data are normally distributed and that the errors are not independent of each other, thus allowing us to use an autoregressive model of order 1. Unlike previous methods, we now introduce another component which allows us to estimate treatment group deviations along with subject specific deviations simultaneously, as well as allowing us to assume correlated errors.

Recall that a hierarchical model is composed of three different components: a data model, a process model, and a parameter model. Both the parametric and nonparametric modeling frameworks presented in this thesis use data arising from the same generating process, thus allowing us to use the same data model (3.1) as shown below:

Y_ij = f(δ)_ij + τ_ij
τ_ij = ρ τ_{i,j-1} + e_ij
e_ij ~ N(0, σ²)

In the previous chapter we defined two parametric process models (with their corresponding parameter models) for f(δ)_ij, and in this chapter we will define a nonparametric process model. This process model is an adaptation of the longitudinal nonparametric ANOVA model (Crainiceanu, Ruppert et al. 2005). The model can be broken into three different components, which will be explained in the next section.

4.1 Process Model

We are using a longitudinal nonparametric model which consists of three different levels: an overall curve, treatment group deviations, and subject specific deviations:

f(δ)_ij = f(t_ij) + f_g(i)(t_ij) + f_i(t_ij) + ε_ij.   (4.1)
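The AR(1) error structure in the data model can be simulated directly, which makes the roles of ρ and σ concrete. The sketch below uses arbitrary values for ρ, σ, and the mean curve, not the fitted ones.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma = 0.8, 0.05     # arbitrary AR(1) correlation and innovation SD
n_time = 501               # 2000 ms sampled every 4 ms

# tau_j = rho * tau_{j-1} + e_j,  with e_j ~ N(0, sigma^2)
e = rng.normal(0.0, sigma, size=n_time)
tau = np.empty(n_time)
tau[0] = e[0]
for j in range(1, n_time):
    tau[j] = rho * tau[j - 1] + e[j]

# Y_ij = f(delta)_ij + tau_ij; here f is a placeholder mean curve
f = np.linspace(0.0, 0.9, n_time)
y = f + tau
```

With ρ near 1, errors at neighboring 4 ms samples are strongly correlated, which is exactly why an independence assumption would be inappropriate for these curves.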

In model (4.1) above, f(·) represents the overall function of the curve, f_g(i)(·) represents the treatment group deviations from the overall curve, with g(i) being a group index relative to the subject, and f_i(·) denotes the subject specific deviations from the group curve (Crainiceanu, Ruppert et al. 2005). Recall from Chapter 2 the p-spline reparameterization of the linear mixed model:

Y = Xβ + Zb + ε

where X is an n × 2 matrix of fixed effects with i-th row X_i = (1, x_i), and Z = Z_K Ω_K^{-1/2} with penalty Ω_K. In a similar fashion, each particular component will be modeled nonparametrically. For this modeling framework, linear splines with a fixed number of knots are used to fit each level of the model:

f(t_ij) = β_0 + β_1 t_ij + Σ_{k=1}^{K1} b_k (t_ij - κ_1k)_+   (4.2)

f_g(i)(t_ij) = γ_{0g(i)} I(g(i)>1) + γ_{1g(i)} t_ij I(g(i)>1) + Σ_{k=1}^{K2} c_{g(i)k} (t_ij - κ_2k)_+   (4.3)

f_i(t_ij) = δ_0i + δ_1i t_ij + Σ_{k=1}^{K3} d_ik (t_ij - κ_3k)_+.   (4.4)

Here b_k, c_{g(i)k}, and d_ik are vectors of coefficients of the truncated polynomials of the spline functions for their respective components. We are summing over the spline components with a fixed number of knots, K1, K2, K3, which aren't necessarily equal across levels. The knots are selected at the sample quantiles of the time variable, corresponding to the probability k/(K+1) (Crainiceanu, Ruppert 2004). By defining the knots in this manner, only the number of knots needs to be specified. Note that knot selection

is arbitrary, and the results hold for any other choice of knots (Crainiceanu, Ruppert et al. 2005). Now, analogous to reference cell coding, indicators have been included in the treatment group deviations for groups with g(i) greater than 1. These spline components, (t_ij - κ_pk)_+, p = 1, 2, 3, are the components of the Z matrix from above. In summary, our process model is equation (4.1), which is the sum of the three components (4.2)-(4.4). Next we specify the parameter model for this longitudinal nonparametric ANOVA process model.

4.2 Parameter Model

The parameter model is defined as follows:

b_k ~ N(μ, σ_b²)
c_gk ~ N(μ, σ_c²)
d_ik ~ N(μ, σ_d²)
δ_0i ~ N(μ, σ_0²)
δ_1i ~ N(μ, σ_1²)
β_0, β_1, γ_0g, γ_1g ~ N(μ, σ²)
σ_b², σ_c², σ_d², σ_ε², σ_0², σ_1² ~ Inverse Gamma(shape, scale)

As seen above, b_k, c_gk, d_ik, δ_0i, and δ_1i are each normally distributed with the same mean but different variances. Variance terms are given inverse gamma priors, where the shape and scale are chosen to be noninformative. The variance terms control the

shrinkage of the different pieces of the mean function. In the next section, we apply this nonparametric modeling framework to our visual world paradigm dataset.

4.3 Model Implementation

As mentioned in Chapter 3, all modeling and analyses were performed using RStudio (Team 2015). Both the rjags and coda packages were used in these analyses (Plummer, Best et al. 2006; Plummer 2014). Each model consists of 4 parallel chains with 1000 iterations for adaptation. The models were then updated 10,000 times, and the trace monitors were set and monitored for an additional 1000 iterations. Trace plots and density plots from the coda package were used to assess convergence. The fits obtained from the models were then stored in an MCMC list, and estimates from these fits summarize the estimated posterior distribution. The parameters monitored were those found in the process model (4.1): specifically, the overall mean μ_ij for subject i and time point j, the overall function of the curve f(·), the treatment group deviations f_g(i)(·) from the overall curve, and the subject specific deviations f_i(·). Each parameter's estimate is found by calculating the mean of all the iteration values from each chain. The sum of the three components of (4.1), for each individual at each time point, is then used to obtain the fixation probabilities at each time point. Sample model code can be found in the Appendix. More information about this modeling framework can be found in "Bayesian Analysis for Penalized Spline Regression using WinBUGS" by Crainiceanu and colleagues.
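The posterior summarization described above, pooling the draws from all chains and taking their mean, is simple averaging over the stored MCMC output. The actual chains came from rjags in R; the Python sketch below uses simulated draws purely to illustrate the computation.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stored draws for one monitored parameter: 4 chains x 10,000 iterations
chains = rng.normal(loc=0.45, scale=0.02, size=(4, 10_000))

posterior_mean = chains.mean()                  # mean over all chains and iterations
posterior_sd = chains.std(ddof=1)
ci_95 = np.quantile(chains, [0.025, 0.975])     # equal-tailed 95% credible interval
```

The same pooling is applied to every monitored quantity in (4.1), and the component means are then summed per individual and time point to recover the fitted fixation probabilities.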

4.4 Data Analysis

One of the perks of using this modeling framework is that it can be used to model both the cochlear implant and normal hearing groups simultaneously, via f_g(i)(t). This framework can also be used to model multiple fixations, regardless of the shape of the data. In other words, the mean function does not change according to which fixation type (target or competitor) we are focused on. The model was fit with the same number of knots per piece of the mean function.

We are using the same visual world paradigm dataset described in Chapter 2, which consists of 29 Cochlear Implant and 26 Normal Hearing subjects whose fixation probabilities were measured across four different fixations: Target, Cohort, Rhyme, and Unrelated. We used three different versions of the model per fixation type: one model with 3 knots per level, one with 5 knots per level, and one with 7 knots per level. In this case specifically, we are defining levels as the overall curve, treatment group deviations, and subject specific deviations. We define the design matrices X and Z as follows:

X_{27555×2} = [ (1, t_ij) ], a column of 1's alongside the stacked time points   (4.5)

Z_{27555×p} = [ (t_ij - κ_pk)_+ ], the truncated-line spline columns, k = 1, …, p   (4.6)

X is a typical design matrix in which the first column is composed of 1's, and the second column consists of the time points (0 to 2000 ms) repeated 55 times. Z corresponds to the

nonparametric portion of this model; p represents the number of knots used in the model. These correspond to fitting 3 (5 or 7) knots for equations (4.2), (4.3), and (4.4). There are 27,555 observations (501 time points by 55 subjects) per fixation type in this data set. All of the fixation probabilities for the distinct word types (Target, Cohort, Rhyme, and Unrelated) use the same longitudinal nonparametric ANOVA modeling framework presented above.

4.4.1 Target Fixation

The first fixation type under this Bayesian longitudinal nonparametric ANOVA model (4.1) is the fixations to the target word type. We are modeling the average probability of a subject fixating on the target word at a given time point. In particular, we fit three different models for this fixation type: one with 3 knots, one with 5 knots, and one with 7 knots.

3 Knots

The equations below represent the coefficients of the parameter fits of two of the three components of the mean function. The first part of equation (4.7), β and b, corresponds to the parameters of the design matrices X and Z, respectively, from the overall curve, f, equation (4.2) from the nonparametric longitudinal ANOVA model. The second part of equation (4.7), γ and c, corresponds to the parameters of the design matrices X and Z, respectively, from the treatment group deviations, f_g, from the nonparametric longitudinal ANOVA model (4.3). The third component's parameter estimates,

corresponding to the subject specific deviations (4.4), are not presented here due to the magnitude of the matrices (55 × 2 and 55 × 3).

β = [ ]; b = [ ]; γ = [ ]; c = [ ]   (4.7)

Figures 4.1(a)-(d) represent typical individual model fits for fixation probabilities to the target word type, under model (4.1) found in Section 4.2. In Figures 4.1(a) and (b), two example cochlear implant subjects' raw target fixation curves are shown (blue) with the corresponding fitted fixation curves (black) overlaid. Figures 4.1 (c and d) represent two typical normal hearing individuals' raw target fixation curves (blue) and the corresponding fitted fixation curves (black). For consistency, the same subjects' fixations were used as in Chapter 3. In comparison to Figure 3.1, we can see that even though the model does appear to capture each individual's trend, it does not appear to fit as well. Specifically, it appears to underestimate the fixation probabilities from 0 ms to 250 ms and from 1500 ms to 2000 ms, and to overestimate the fixation probabilities from 250 ms to about 600 ms and from 1000 ms to 1500 ms. Figure 4.2 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.
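To make the dimensions in (4.5) and (4.6) concrete, the fixed-effects matrix X and one truncated-line spline block of Z can be assembled as below. This is an illustrative sketch in Python; the actual model fitting was done in R with rjags.

```python
import numpy as np

n_sub, n_time, n_knots = 55, 501, 3     # 55 subjects, 501 time points, K = 3 knots
t = np.tile(np.linspace(0, 2000, n_time), n_sub)   # stacked time column, 27,555 rows

# (4.5): column of ones alongside the time points
X = np.column_stack([np.ones(t.size), t])

# (4.6): knots at the k/(K+1) sample quantiles; columns are (t - kappa_k)_+
knots = np.quantile(t, np.arange(1, n_knots + 1) / (n_knots + 1))
Z = np.maximum(t[:, None] - knots[None, :], 0.0)
```

With time uniform over 0 to 2000 ms, the three quantile knots fall at 500, 1000, and 1500 ms; moving to 5 or 7 knots simply changes `n_knots` and places the knots at finer quantiles.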

Figure 4.1 Bayesian longitudinal nonparametric ANOVA for the target fixation with 3 knots; plots from 4 typical subjects.

Figure 4.2 Cochlear Implant and Normal Hearing group subject specific curves for the target fixation with 3 knots.

5 Knots

In a similar fashion to the longitudinal nonparametric ANOVA model with 3 knots, we will examine a model with 5 knots (4.1). Denoted in equation (4.8) are the coefficients of the parameter fits of two of the three components of the mean function.

β = [ ]; b = [ ]; γ = [ ]; c = [ ]   (4.8)

Figures 4.3(a)-(d) showcase typical individual model fits for fixation probabilities to the target word type, under model (4.1) found in Section 4.2. In Figures 4.3(a) and 4.3(b), two typical cochlear implant subjects' raw target fixation curves are shown (blue) with the corresponding fitted fixation curves (black) overlaid. Figures 4.3 (c and d) represent two typical normal hearing individuals' raw target fixation curves (blue) and the corresponding fitted fixation curves (black). In comparison to Figure 4.1, we can see that the model with 5 knots (Figure 4.3) appears to fit better than the 3 knot model did. Specifically, from about 300 ms to 2000 ms the fit improved. The model still tends to underestimate the starting points of the functions, namely from 0 ms to about 300 ms. The addition of two more knots in each level of (4.1) allowed more flexibility in estimating the overall function. Finally, Figure 4.4 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.3 Bayesian longitudinal nonparametric ANOVA for the target fixation with 5 knots; plots from 4 typical subjects.

Figure 4.4 Cochlear Implant and Normal Hearing group subject specific curves for the target fixation with 5 knots.

7 Knots

A longitudinal nonparametric ANOVA model with 7 knots (4.1) was fit to the target fixation. The equations in (4.9) represent the coefficients of the parameter fits of two of the three components of the mean function.

β = [ ]; b = [ ]; γ = [ ]; c = [ ]   (4.9)

Figures 4.5(a)-(d) represent typical individual model fits for fixation probabilities to the target word type, under model (4.1) found in Section 4.2. Figures 4.5(a) and 4.5(b) showcase two example cochlear implant subjects' raw target fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. Figures 4.5 (c and d) represent two typical normal hearing individuals' raw target fixation curves (blue) and the corresponding fitted fixation curves (black). From Figure 4.5, we can see that adding two extra knots does appear to remedy some of the misspecification in the tails seen in Figures 4.1 and 4.3. Figure 4.6 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.5 Bayesian longitudinal nonparametric ANOVA for the target fixation with 7 knots; plots from 4 typical subjects.

Figure 4.6 Cochlear Implant and Normal Hearing group subject specific curves for the target fixation with 7 knots.

Cohort Fixation

The second fixation type that we examine in this Bayesian longitudinal nonparametric ANOVA model (4.1) is the fixations to the cohort word type. We are modeling the

average probability of a subject fixating on the cohort word at a particular time point. As with the target fixation, we fit three different models for this fixation type: one with 3 knots, one with 5 knots, and one with 7 knots. We are using the exact same modeling framework (4.1), only applied to a different fixation type.

3 Knots

The equations below represent the coefficients of the parameter fits of two of the three components of the mean function. The first part of equation (4.10), β and b, corresponds to the parameters of the design matrices X and Z, respectively, from the overall curve, f, from the nonparametric longitudinal ANOVA model (4.2). The second part of equation (4.10), γ and c, corresponds to the parameters of the design matrices X and Z, respectively, from the treatment group deviations, f_g, from the nonparametric longitudinal ANOVA model (4.3). The third component's parameter estimates, corresponding to the subject specific deviations (4.4), are not presented here due to the magnitudes of the matrices (55 × 2 and 55 × 3).

β = [ ]; b = [ ]; γ = [ ]; c = [ ]   (4.10)

Figures 4.7(a)-(d) represent typical individual model fits for fixation probabilities to the cohort word type, under model (4.1) found in Section 4.2. Figures 4.7(a) and 4.7(b) showcase two example cochlear implant subjects' raw cohort fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. Figures 4.7 (c and d) represent

two typical normal hearing individuals' raw cohort fixation curves (blue) and the corresponding fitted fixation curves (black). From Figure 4.7, we can see that the model does not appear to capture each individual's trend as well as its parametric counterpart in Figure 3.3. Clearly, 3 knots per level are too few to capture the cohort fixation's trend; the model with 3 knots underestimates the majority of the fixation probabilities. Figure 4.8 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.7 Bayesian longitudinal nonparametric ANOVA for the cohort fixation with 3 knots; plots from 4 typical subjects.

Figure 4.8 Cochlear Implant and Normal Hearing group subject-specific curves for the cohort fixation with 3 knots.

5 Knots

Similar to the longitudinal nonparametric ANOVA model with 5 knots in Section 4.4.1, we use the same modeling framework for the cohort fixations. Equation (4.11) denotes the coefficients of two of the three components of the mean function.

β = [ ]; b = [ ]; γ = [ ]; c = [ ] (4.11)

Figures 4.9(a)-(d) represent typical individual model fits for fixation probabilities to the cohort word type under model (4.1) found in Section 4.2. Figures 4.9(a) and 4.9(b) showcase two example cochlear implant subjects' raw cohort fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. Figures 4.9(c) and (d) represent two typical normal hearing individuals' raw cohort fixation curves (blue) and the corresponding fitted fixation curves (black). From Figure 4.9, we can see that the model appears to fit much better; 5 knots per level capture most of the cohort fixation trend. With the exception of NH Subject 18, where there is a steeper increase in fixations and a more gradual descent, the other three subjects' trends are captured nicely. The addition of a couple more knots per level may help capture the trends better. Figure 4.10 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.9 Bayesian Longitudinal Nonparametric ANOVA for the Cohort fixation with 5 knots, plots from 4 typical subjects.

Figure 4.10 Cochlear Implant and Normal Hearing group subject-specific curves for the cohort fixation with 5 knots.

7 Knots

Since the model with 5 knots did not fit all of the subjects' fixations to the cohort word equally well, a longitudinal nonparametric ANOVA model with 7 knots (4.1) was fit. Denoted in

equation (4.12) are the coefficients of two of the three components of the mean function.

β = [ ]; b = [ ]; γ = [ ]; c = [ ] (4.12)

Figures 4.11(a)-(d) represent typical individual model fits for fixation probabilities to the cohort word type under model (4.1) found in Section 4.2. Figures 4.11(a) and 4.11(b) showcase two example cochlear implant subjects' raw cohort fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. Figures 4.11(c) and (d) represent two typical normal hearing individuals' raw cohort fixation curves (blue) and the corresponding fitted fixation curves (black). From Figure 4.11, we can see that adding two extra knots does not appear to add much in terms of model fit, with the exception of NH Subject 18, where the model with 7 knots appears to fit better than in Figure 4.4 and Figure 4.5. Figure 4.12 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.11 Bayesian Longitudinal Nonparametric ANOVA for the Cohort fixation with 7 knots, plots from 4 typical subjects.

Figure 4.12 Cochlear Implant and Normal Hearing group subject-specific curves for the cohort fixation with 7 knots.

4.4.3 Rhyme Fixation

As was the case in the previous two subsections, 4.4.1 and 4.4.2, the rhyme fixation type is fit under the same modeling framework. We are modeling the average probability of a subject fixating on the rhyme word at a particular time point.

3 Knots

Equation (4.13) represents the coefficients of two of the three components of the mean function. Once again, the first part of equation (4.13), β and b, corresponds to the parameters of the design matrices X and Z, respectively, from the overall curve, f, equation (4.2) of the nonparametric longitudinal ANOVA model. The second part of equation (4.13), γ and c, corresponds to the parameters of the design matrices X and Z, respectively, from the treatment group deviations, fg, from the nonparametric longitudinal ANOVA model (4.3). The third component's parameter estimates, corresponding to the subject-specific deviations (4.4), are not presented here due to the dimensions of the matrices (55 x 2 and 55 x 3).

β = [ ]; b = [ ]; γ = [ ]; c = [ ] (4.13)

Figures 4.13(a)-4.13(d) represent typical individual model fits for fixation probabilities to the rhyme word type under model (4.1) found in Section 4.2. Figures 4.13(a) and 4.13(b) designate two typical cochlear implant subjects' raw rhyme fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 4.13 (c and d) represent two typical normal hearing individuals' raw rhyme fixation curves (blue) and the corresponding fitted fixation curves (black). As was the case in 4.4.2, the model does not appear to capture each individual's trend as well as Figure 3.7. Once again, it appears that 3 knots are too few. Figure 4.14 displays all 55

participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.13 Bayesian Longitudinal Nonparametric ANOVA for the Rhyme fixation with 3 knots, plots from 4 typical subjects.

Figure 4.14 Cochlear Implant and Normal Hearing group subject-specific curves for the rhyme fixation with 3 knots.

5 Knots

We use the same modeling framework with 5 knots for the rhyme fixations. Equation (4.14) represents the coefficients of two of the three components of the mean function.

β = [ ]; b = [ ]; γ = [ ]; c = [ ] (4.14)

Figures 4.15(a)-4.15(d) represent typical individual model fits for fixation probabilities to the rhyme word type under model (4.1) found in Section 4.2. Figures 4.15(a) and 4.15(b) show two typical cochlear implant subjects' raw rhyme fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 4.15 (c and d) represent two typical normal hearing individuals' raw rhyme fixation curves (blue) and the corresponding fitted fixation curves (black). In comparison to Figure 4.13, we can see that the model with 5 knots (Figure 4.15) appears to fit much better. Finally, Figure 4.16 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.15 Bayesian Longitudinal Nonparametric ANOVA for the Rhyme fixation with 5 knots, plots from 4 typical subjects.

Figure 4.16 Cochlear Implant and Normal Hearing group subject-specific curves for the rhyme fixation with 5 knots.

7 Knots

Since the model with 5 knots did not fit all of the subjects' fixations to the rhyme word equally well, a longitudinal nonparametric ANOVA model with 7 knots (4.1) was fit.

Denoted in equation (4.15) are the coefficients of two of the three components of the mean function.

β = [ ]; b = [ ]; γ = [ ]; c = [ ] (4.15)

Figures 4.17(a)-(d) represent typical individual model fits for fixation probabilities to the rhyme word type under model (4.1) found in Section 4.2. Figures 4.17(a) and 4.17(b) showcase two example cochlear implant subjects' raw rhyme fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 4.17 (c and d) represent two typical normal hearing individuals' raw rhyme fixation curves (blue) and the corresponding fitted fixation curves (black). From Figure 4.17, we can see that adding two extra knots does not appear to add much to the model fit. Both the model with 5 knots and the model with 7 knots appear to capture the rhyme fixation trend well. Figure 4.18 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.17 Bayesian Longitudinal Nonparametric ANOVA for the Rhyme fixation with 7 knots, plots from 4 typical subjects.

Figure 4.18 Cochlear Implant and Normal Hearing group subject-specific curves for the rhyme fixation with 7 knots.

4.4.4 Unrelated Fixation

Finally, we are modeling the average probability of a subject fixating on the unrelated word at a particular time point.

3 Knots

Equation (4.16) below represents the coefficients of two of the three components of the mean function. Once more, the first part of equation (4.16), β and b, corresponds to the parameters of the design matrices X and Z, respectively, from the overall curve, f, equation (4.2) of the nonparametric longitudinal ANOVA model. The second part of equation (4.16), γ and c, corresponds to the parameters of the design matrices X and Z, respectively, from the treatment group deviations, fg, from the nonparametric longitudinal ANOVA model (4.3). The third component's parameter estimates, corresponding to the subject-specific deviations (4.4), are not presented here due to the dimensions of the matrices (55 x 2 and 55 x 3).

β = [ ]; b = [ ]; γ = [ ]; c = [ ] (4.16)

Figures 4.19(a)-(d) represent typical individual model fits for fixation probabilities to the unrelated word type under model (4.1) found in Section 4.2. Figures 4.19(a) and 4.19(b) designate two typical cochlear implant subjects' raw unrelated fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 4.19 (c and d) represent two typical normal hearing individuals' raw unrelated fixation curves (blue) and the corresponding fitted fixation curves (black). As was the case in 4.4.2 and 4.4.3, the model does not appear to capture each individual's trend

as well as Figure 3.9. Once again, it appears that 3 knots are too few. Figure 4.20 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.19 Bayesian Longitudinal Nonparametric ANOVA for the Unrelated fixation with 3 knots, plots from 4 typical subjects.

Figure 4.20 Cochlear Implant and Normal Hearing group subject-specific curves for the Unrelated fixation with 3 knots.

5 Knots

We model the unrelated fixations using the same modeling framework with 5 knots. Equation (4.17) below represents the coefficients of two of the three components of the mean function.

β = [ ]; b = [ ]; γ = [ ]; c = [ ] (4.17)

Figures 4.21(a)-(d) represent typical individual model fits for fixation probabilities to the unrelated word type under model (4.1) found in Section 4.2. Figures 4.21(a) and 4.21(b) show two typical cochlear implant subjects' raw unrelated fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 4.21 (c and d) represent two typical normal hearing individuals' raw unrelated fixation curves (blue) and the corresponding fitted fixation curves (black). In comparison to Figure 4.19, we can see that the model with 5 knots (Figure 4.21) appears to fit much better, though it is apparent that we could do better. In particular, for CI Subject 26 and NH Subject 18, the model tends to underestimate the fixation probabilities around 500 ms. Thus an additional model with 7 knots was fit. Figure 4.22 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in blue.

Figure 4.21 Bayesian Longitudinal Nonparametric ANOVA for the Unrelated fixation with 5 knots, plots from 4 typical subjects.

Figure 4.22 Cochlear Implant and Normal Hearing group subject-specific curves for the Unrelated fixation with 5 knots.

7 Knots

Since the model with 5 knots did not fit all of the subjects' fixations to the unrelated word equally well, a longitudinal nonparametric ANOVA model with 7 knots (4.1) was fit. Equation

(4.18) represents the coefficients of two of the three components of the mean function.

β = [ ]; b = [ ]; γ = [ ]; c = [ ] (4.18)

Figures 4.23(a)-(d) represent typical individual model fits for fixation probabilities to the unrelated word type under model (4.1) found in Section 4.2. Figures 4.23(a) and 4.23(b) showcase two example cochlear implant subjects' raw unrelated fixation curves (blue) with the corresponding fitted fixation curves (black) overlaid. The bottom two panels of Figure 4.23 (c and d) represent two typical normal hearing individuals' raw unrelated fixation curves (blue) and the corresponding fitted fixation curves (black). From Figure 4.23, we can see that adding two extra knots does not appear to add much to the model fit. Both the model with 5 knots and the model with 7 knots still appear to underestimate the fixations to the unrelated words for CI Subject 26 and NH Subject 18 around 500 ms. Figure 4.24 displays all 55 participants' individual fitted curves, where the Cochlear Implant group is in black and the Normal Hearing group is in

blue.

Figure 4.23 Bayesian Longitudinal Nonparametric ANOVA for the Unrelated fixation with 7 knots, plots from 4 typical subjects.

Figure 4.24 Cochlear Implant and Normal Hearing group subject-specific curves for the Unrelated fixation with 7 knots.

4.5 Selecting the Number of Knots

In Section 4.4 we fit multiple models within each fixation type and judged the model fit by appearance. We use the root mean squared error (root MSE) to quantify how much

information we gain with additional knots. Note that using more knots in a model will generally result in a lower root MSE; thus, the lowest root MSE does not necessarily imply that the model is best suited for this analysis. The additional computational time and complexity of the model are other factors to consider when choosing the most appropriate model. For each fixation type, we fit three models: 3 knots, 5 knots, and 7 knots. The root MSE values are presented in Table 4.1.

Table 4.1 Root MSE values for the different models within fixation type.

Number of Knots | Target | Cohort | Rhyme | Unrelated

Within the Target fixation type, there is a notable difference between all three knot choices. In particular, the difference between 5 knots and 7 knots leads us to believe that the model with 7 knots is the most appropriate for this data set. Conversely, within the cohort fixation there was a considerable difference between the models with 3 knots and 5 knots, but a negligible difference between the 5-knot and 7-knot models. In this case, the 5-knot model is best suited for the cohort fixation. For the rhyme fixation, we see a similar trend: a sizeable difference between 3 knots and 5 knots, and a minor difference between 5 knots and 7 knots. A model with 5 knots is suitable for the rhyme fixation type. Likewise, there is a notable difference between the 3-knot model and the 5-knot model

for the unrelated fixation type. There is only a slight difference in the root MSE values between the 5-knot and 7-knot models, while the addition of two more knots added approximately 15 minutes of computational time and 116 more parameters to estimate. All things considered, the model with 5 knots is the most appropriate for the unrelated fixation type.

A common tool in Bayesian model comparison is the Deviance Information Criterion (DIC). DIC was developed as the Bayesian analog to the AIC; it is calculated by adding the expected deviance to the number of effective parameters, pD (Spiegelhalter, Best et al. 2002). The DIC protects against overfitting by penalizing models with a larger number of effective parameters. A smaller DIC suggests better predictive ability, where a difference of at least 2 is considered meaningful (Spiegelhalter, Best et al. 2002). The DIC values for the three models (3 knots, 5 knots, and 7 knots) per fixation type are presented in Table 4.2.

Table 4.2 DIC values for the different models within fixation type.

Number of Knots | Target | Cohort | Rhyme | Unrelated

There is a considerable difference amongst the DIC values within each of the fixation types (Target, Cohort, Rhyme, and Unrelated). Regardless of the fixation type, the model with 7 knots had the lowest DIC values. Thus, based on DIC values alone, the model with

7 knots for the target, cohort, rhyme, and unrelated fixations would be the most appropriate.

In this chapter we presented an adaptation of Crainiceanu and colleagues' Bayesian nonparametric ANOVA as a nonparametric alternative model for our visual world paradigm data. This approach provided a single modeling framework for all of the fixation types that allows for non-independent errors and includes parameters for both individuals and groups. In comparison to the procedures presented in Chapter 3, the models presented in this chapter do not appear to fit as well and do not provide as straightforward an interpretation of the parameters. In Chapter 5, we switch gears and examine a completely Bayesian approach to detecting differences between two groups.
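Before moving on, the two quantitative criteria used in Section 4.5, root MSE and DIC, are straightforward to compute from model output. The sketch below is illustrative only (in Python rather than the R used for the analyses); `y`, `fitted`, `deviance_samples`, and `deviance_at_posterior_mean` are hypothetical stand-ins for the observed fixation probabilities, a model's fitted values, the per-iteration deviance chain, and the deviance evaluated at the posterior means.

```python
import numpy as np

def root_mse(y, fitted):
    """Root mean squared error between observed and fitted values."""
    y, fitted = np.asarray(y, float), np.asarray(fitted, float)
    return float(np.sqrt(np.mean((y - fitted) ** 2)))

def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = Dbar + pD, where pD = Dbar - D(theta_bar) (Spiegelhalter et al. 2002).
    Returns (DIC, pD)."""
    d_bar = float(np.mean(deviance_samples))
    p_d = d_bar - float(deviance_at_posterior_mean)
    return d_bar + p_d, p_d
```

Because extra knots can only help the in-sample root MSE, the DIC's effective-parameter penalty pD serves as the complementary check against overfitting.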

Chapter 5 COMPARISON TECHNIQUES

In this chapter, we present a Bayesian approach to detect time-specific differences between groups. First, we introduce a Bayesian alternative to the two-sample t-test, aptly named Bayesian Estimation Supersedes the t Test (BEST), and explain how we incorporated the estimation procedure into our work (Kruschke 2013). Next, a decision rule is defined that determines whether or not two groups differ at a specific time point. Once we have defined our decision rule, we examine how we use this decision to detect time-specific group differences. After our comparison technique has been presented, we demonstrate its utility with a brief simulation study. Finally, we apply this comparison method to our data set from the Visual World Paradigm setting.

The BEST procedure was developed as a Bayesian alternative to the two-sample t-test. The BEST approach uses the t-distribution to describe the data from both groups; each group is characterized by its own mean and standard deviation, along with a shared parameter η (commonly referred to as the degrees of freedom). Through MCMC methods via JAGS, one can obtain an approximation of the joint posterior distribution of the means, (μ1, μ2), across groups. Within this process, the overall mean of the two groups is estimated, as well as the between-group variability. Using Kruschke's BEST procedures, we obtain complete distributional information about the means and standard deviations of each group, as well as the parameter η. Output from the BEST procedure yields information from the posterior distribution of the data. This

information includes the mean, median, and mode of each of the five parameters being estimated: the means of the two groups of interest (μ1, μ2), the standard deviations of both groups (σ1, σ2), and the shared parameter η. The difference of the means of the two groups, μ1 - μ2, is also monitored. Due to the nature of Bayesian estimation, it is important to mention that the BEST estimation procedure allows us to better understand the distribution of the parameters by giving us a reasonable scope of their credible values. It is also worth noting that the longer the chains, the narrower the scope of the credible values of these parameters.

Within the frequentist paradigm, the t-test is based upon a decision of whether we reject or fail to reject the null hypothesis using a p-value as the decision rule. Kruschke argues that Bayesian estimation techniques allow us to assess the credibility of the null value by specifying a region of practical equivalence (ROPE). Once the researcher defines the ROPE around the difference of means, it is used to decide whether or not to accept the null hypothesis. An advantage of Kruschke's method is that we have the ability to accept, rather than merely fail to reject, the null hypothesis, which is not possible within the frequentist paradigm. We are able to accept the null hypothesis by using both the user-defined ROPE and the 95% Highest Posterior Density Interval (HDI). The HDI is used rather than the usual Bayesian credible interval since the HDI contains more information about the posterior distribution when the posterior is not unimodal and symmetric. By definition, the HDI has the

characteristic that the density within the region is never lower than the density outside of the region. Therefore, using the HDI guarantees that, regardless of the shape of the posterior distribution, the HDI contains 95% of the posterior probability. Thus, one can accept the null hypothesis if the 95% HDI of the difference of means is completely contained within the ROPE, implying that 95% of the credible values fall within the region of practical equivalence.

The ultimate goal of this method is to determine not only where two fixation curves differ, but when the curves start to differ. Similar to Oleson, Cavanaugh et al. (2015), we use the estimates from each time point to make the comparisons, the difference being that we are now using a Bayesian approach both to fit the curves and to make the comparisons. Also, Oleson et al. fit individuals separately rather than fitting a single model. The modeling framework developed in this dissertation allows a group parameter to be estimated as well as the subject-specific parameters. Therefore, we can obtain group-specific curves, along with the corresponding chains. The chains are an instrumental part of this comparison procedure because they represent many different combinations of credible parameters (equal to the number of iterations previously specified). In other words, we are studying the fixation probabilities for each time point, at each iteration, per group, from the model that was already fit.

5.1 Sequential BEST Test

The sequential BEST test comparison procedure uses BEST to detect where significant differences occur. In other words, we are using the BEST estimation

procedure at each time point to compare the 95% HDI of the difference of means of the fixation probabilities to the ROPE. For example, say a researcher deems a difference of 10 units to be clinically meaningful; this translates to a ROPE of (-10, 10). Thus, if the 95% HDI intersects our ROPE of (-10, 10), then we say that the two groups are not significantly different. Otherwise, if the 95% HDI does not overlap with the ROPE of (-10, 10), then we say that there is a significant difference. The tests are done sequentially, at each time point. If a significant difference is found, its location is recorded along with the corresponding 95% HDI of the difference of means and the ROPE. Hence, a significant difference between the mean fixation probabilities of two groups implies that the ROPE and the 95% HDI do not overlap.

In this context we do not need to worry about multiple comparisons. A typical reason voiced for performing multiple comparison adjustments is that we may find a statistically significant effect when in reality there is none (i.e., a Type I error). Instead of concerning ourselves with the multiple testing problem, we aim to create a model, within the Bayesian paradigm, that can answer all of the relevant research questions (Gelman, Hill et al. 2012). The nonlinear hierarchical parametric models presented in Chapter 3 are formulated to do just that. The use of a hierarchical structure allows us to borrow strength from the different group estimates, thus reducing the chances of obtaining unreasonable estimates. In classical procedures, point estimates are typically kept fixed, and we adjust for multiple comparisons by adjusting our p-values so that our tests are more conservative (Gelman, Hill et al. 2012). By using the

hierarchical modeling structure within the Bayesian paradigm, we attempt to answer all of the research questions of interest and avoid the need for a multiple comparisons adjustment.

The second step of the sequential BEST test is to determine a confidence interval around the time points where significant differences were found. This step takes the vector of time points where significant differences were found in the first step of the sequential BEST test and determines the credible interval around those time points. Ultimately, we want to obtain a credible interval of time points in which a significant difference in fixation probabilities between groups can occur. The first step detects at what point in time the two groups differ from each other; the second step determines the confidence interval around that time point. We can think of this as two distinct intervals: the first represents the significant differences of the mean fixation probabilities between groups, an interval on the y-axis; the second is an interval of onset detection, an interval on the x-axis. In order to construct this interval of time points, we need to obtain a distribution of time points where significant differences can occur. We focus on the onset, or the first time point of detectable differences from the intervals obtained in the first step, but we could do the same with the offset. Other methods for analyzing onset detection, such as jackknifing (Miller, Patterson et al. 1998; McMurray, Clayards et al. 2008; Toscano and McMurray 2012; Reinisch and Sjerps 2013; Toscano and McMurray 2015), could be considered; however, they are not discussed here. For example, if this sequential comparison procedure produced this vector of time

points where a significant difference between groups was found, (5, 6, 7, 8, 9), the second step will focus on time point 5. We use the ROPE and all of the iteration values from the group curves used in the BEST procedure to construct this distribution of significant time points. First, we take the same MCMC iteration values of both groups from the first step of the sequential BEST test and calculate the differences between the groups. Next, we record the time points at which the two groups deviate outside of the ROPE. Using our ROPE from the example in the first step, (-10, 10), we record the time points where the difference between the groups is greater than 10 in absolute value. Then, we keep the first time point at which the difference deviated from the ROPE at that particular iteration. We repeat this process for the remaining MCMC iterations, which yields a distribution of time points. This joint distribution of significant time points, t, is as follows:

p(t | μ1, μ2) = ∫θ f(t | μ1, μ2, θ) p(θ | t) dθ

In this formulation, f(t | μ1, μ2, θ) represents the joint posterior distribution of the onset detection of time points where the two groups differ, conditional on μ1 and μ2 from the BEST test procedure. Specifically, θ represents the parameters that characterize the function of the word type of interest. For example, for fixations to the target word we use the group parameters from the four-parameter logistic model, so θ = (max, min, slope, IP). Finally, p(θ | t) corresponds to the probability distribution of θ given time.
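The two steps just described can be sketched programmatically. The following is a simplified illustration (in Python rather than the R/JAGS implementation used in this dissertation, and working directly from the group chains rather than the full BEST t-model): `chain_g1` and `chain_g2` are hypothetical arrays of posterior draws of the group mean fixation probabilities, with one row per MCMC iteration and one column per time point, and the ROPE is the symmetric (-10, 10) running example.

```python
import numpy as np

def hdi(samples, cred=0.95):
    """Approximate highest density interval: the shortest interval containing
    `cred` of the posterior samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    k = int(np.floor(cred * len(s)))
    widths = s[k:] - s[: len(s) - k]
    i = int(np.argmin(widths))
    return s[i], s[i + k]

def sequential_best(chain_g1, chain_g2, rope=(-10.0, 10.0), cred=0.95):
    """Step 1: flag time points whose 95% HDI of the mean difference does not
    overlap the (symmetric) ROPE.  Step 2: for each MCMC iteration, record the
    first time point at which the difference leaves the ROPE, yielding a
    distribution of onsets and its 2.5%-97.5% interval."""
    diffs = np.asarray(chain_g1) - np.asarray(chain_g2)   # iterations x time points
    sig_times = []
    for t in range(diffs.shape[1]):
        lo, hi = hdi(diffs[:, t], cred)
        if hi < rope[0] or lo > rope[1]:                  # HDI entirely outside ROPE
            sig_times.append(t)
    outside = np.abs(diffs) > rope[1]                     # |difference| beyond ROPE bound
    onsets = np.array([np.argmax(row) for row in outside if row.any()])
    onset_ci = (np.percentile(onsets, 2.5), np.percentile(onsets, 97.5)) if onsets.size else None
    return sig_times, onsets, onset_ci
```

Note that step 1 uses the HDI of the pointwise difference, while step 2 keeps only the first ROPE departure per iteration, so the onset interval reflects uncertainty across the whole MCMC chain rather than a single time point.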

The size of this distribution is determined by the number of chains and iterations from the fitted data. In other words, we have a vector of time points that is the length of the MCMC chain, whose elements are the times where the difference first deviated from the ROPE. From this distribution of time points, a credible interval is constructed; specifically, the 2.5 percentile to the 97.5 percentile of the time points, on the x-axis, is used. The remainder of Chapter 5 uses the sequential BEST test to detect time-specific differences between groups.

5.2 Sequential BEST Test Implementation

In order to conduct the sequential BEST test, we need the group parameter fits of both groups, along with their corresponding MCMC chain values. All of the analyses were done in RStudio (RStudio Team 2015). The first component of the procedure used the BEST estimation procedures (Kruschke 2015), which also involve the rjags and coda packages (Plummer, Best et al. 2006; Plummer 2014). The model is updated 1000 times, and then the parameters of interest are monitored for an additional 1000 iterations. The intervals package was also used to compare the HDI intervals (Bourgon 2014). As with the other procedures in this dissertation, a very large amount of computing time is necessary to conduct this procedure. Comparing two groups with fitted values from 4 chains, 1000 iterations per chain, and 100 time points takes approximately 3 hours and 10 minutes on the Linux network of machines; specifically, we used an Intel x86 64-bit system under the Ubuntu operating system. For 501 time

points (as in our data set), running this procedure takes approximately 15 and a half hours on the same machine. Note that these time frames do not include the time needed to fit the models to their respective data sets.

Simulated Data

In Chapter 3, Section 3.4, we presented simulated results for different situations for fixations to the target word via the four-parameter logistic process model. Specifically, we randomly generated 50 datasets consisting of 10 subjects per group, with fixation probabilities generated for 100 time points. Due to the amount of computing time needed both to fit the datasets and to apply the comparison procedure, only 25 replications were studied. The sequential BEST test was applied to all of the situations presented in Section 3.4 of Chapter 3.

Table 5.1 Sequential BEST test (part 1) simulation results (n = 25 data sets) for 2 groups of 10 subjects with 100 time points each. A ROPE of (-0.015, 0.015) was used to compare both groups.

| First Significant Difference | Last Significant Difference
| Mean | Median | SD | Mean | Median | SD
Varying Max
Varying Slope
Varying IP

A difference of 1.5%, which translates to a ROPE of (-0.015, 0.015), was deemed a significant difference. The sequential BEST test was conducted on two identical groups, and no difference was found at any of the time points, thus resulting in a Type I error

rate of 0. Table 5.1 displays the first and last time points where a significant difference was found in the first portion of the sequential BEST test. For varying maximum values between the two groups, on average the first significant difference is found at the 17th time point (16.68). In the varying maximum scenario, all of the intervals of significant differences extended to the 100th time point. This was not the case for the varying slope scenario: on average, the first significant difference was found at the 30th (29.70) time point, and the last significant difference, on average, was found at the 57th time point. For the varying inflection point (IP) scenario, the first significant difference, on average, was found at the 22nd (21.68) time point, and the last significant difference, on average, was found at the 59th time point.

Table 5.2 Confidence intervals for the first significant time points found in Table 5.1.

| 2.5% Quantile for First Sig. Difference | 97.5% Quantile for First Sig. Difference
| Mean | Median | SD | Mean | Median | SD
Varying Max
Varying Slope
Varying IP

Table 5.2 displays the 2.5% and 97.5% quantile values for the first significant differences found in Table 5.1. On average, the 95% confidence intervals for the varying max scenario typically begin around the 1st time point and end at the 20th time point. For the varying slope scenario, a typical 95% confidence interval (on average) starts at the 1st time point and ends at the 40th time point. Finally, a typical 95% confidence

126 interval for the varying inflection point scenario corresponds to a start point at the 1 st time point and ends at the 38 th time point. The type II error rate was calculate at each time point for the three different scenarios: varying max, varying slope, and varying IP. For the varying max scenario, the 2 nd to 11 th time points the Type II error rate was 92%, and 100% for the 1 st time point (no difference was found in any of the 25 data sets). Next, the type II error rate was 88% for the 12 th and 13 th time points, followed by 72% on the 14 th time point. Subsequently, the type II error rate was 60% for the 15 th time point, 56% for the 16 th time point, 36% for the 17 th time point, and 20% for the 18 th time point. For the 19 th through 33 rd time point there was a substantial difference in the type II error rate from the previous time points, with a rate of 8% and 4% for the 34 th and 35 th time points. Finally, the type II error rate was 0 for the 36 th to 100 th time point, for the varying max scenario. All in all, in the varying max scenario, after the first time point, the type II error rate decreases though out the time course. Within the varying slopes scenario, the type II error rates start at 96% for the 2 nd to 19 th time points, and 100% for the 1 st time point. The error rates drop slightly to 92% for the 20 th time point, 88% for the 21 st time point, and 72% for the 22 nd time point. There is a greater drop in type II error rates with 60% for the 23 rd time point, 56% for the 24 th to 29 th time points, 48% for the 30 th time point, and 40% for the 31 st time point. The lowest type II error rates were 36% for the 32 nd and 33 rd time point, as well as 24% for the 34 th to 37 th time points. After the 37 th time points the type II error rates gradually increase, 111

127 starting with rates of 28% (31 st time point), 32% (39 th and 46 th time point),36% (40 th, 42 nd to 45 th, 47 th to 52 nd time points), and 40% (40 th and 53 rd time point). For the 54 th through the 65 th time points the type II error rates range from 44% to 68%. The remainder of the time course the type II error rates range from 72% to 96%. Finally, for the varying IP scenario, the type II error rates start at 92% for the 2 nd to the 14 th time points. For the 15 th to 21 st time point they decrease from 88% to 72%. After the 22 nd (Type II error rate of 60%), the type II error rates drop from 44% to 20% at the 26 th time point. There is a substantial drop from the 27 th and 28 th time point (Type II error rate of 12%), to the 29 th to 35 th time points (Type II error rate of 8%). The lowest Type II error rate of 4% is obtained at the 36 th through 50 th time points. After the 51 st time point the type II error rates start to increase from 8% to 16% at the 53 rd time point. The type II error rates range from 32% at the 54 th time point to 44% at the 57 th time point, and 52% to 68% at the 64 th time point. Finally the type II error rates range from 80% (65 th to 69 th time point) to 88% (71 st to 72 nd time point). Finally, the type II error rates for the varying IP time points vary from 92% at the 73 rd time point to 96% at the end of the time course (100 th time point). From this simulation study, we ascertain that varying the maximum values has the largest span of significant time points. Varying the slope parameter and varying the inflection point scenarios yielded a shorter span of significant time points. This implies that the effect of varying these parameters can be seen in the middle of our time course, and it does not seem to impact the beginning and end of our time course. 112
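To make the last observation concrete, the following sketch builds two four-parameter logistic curves that differ only in the slope parameter and flags the time points where they differ by more than the 1.5% criterion. The parameter values here are illustrative assumptions, not the simulation's actual settings; the point is only that a slope change produces exceedances in the middle of the time course while the endpoints agree.

```python
import numpy as np

def four_pl(t, base, peak, slope, crossover):
    """Four-parameter logistic: rises from `base` to `peak`, with steepness
    `slope` and inflection at time `crossover`."""
    return base + (peak - base) / (1.0 + np.exp(slope * (crossover - t)))

t = np.arange(1, 101)                                   # 100 time points, as in the simulation
ref   = four_pl(t, 0.0, 0.9, slope=0.15, crossover=40)  # reference group curve
steep = four_pl(t, 0.0, 0.9, slope=0.30, crossover=40)  # same curve, steeper slope

diff = np.abs(ref - steep)
sig = np.where(diff > 0.015)[0] + 1                     # time points exceeding the 1.5% criterion
# The exceedances sit in the middle of the time course (with a momentary dip
# where the two curves cross at t = 40); both ends of the time course agree
# to well within 1.5% because the curves share the same base and peak.
print(sig[0], sig[-1])
```

Because the groups share base and peak values, only the transition region can separate them, which mirrors the simulation finding above.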

5.2.2 Data Analysis

The sequential BEST test procedure was applied to our motivating example, the eye-tracking data set from the Visual World Paradigm. Recall from Chapter 2 that this data set consists of eye-tracking data for four fixation word types from 29 Cochlear Implant and 26 Normal Hearing participants. We will be using the fitted values from the nonlinear parametric hierarchical modeling frameworks presented in Chapter 3. For each of the word types (Target, Cohort, Rhyme, and Unrelated), we used the MCMC iteration values obtained for the parameters used to construct the group curves. Specifically, at each time point we examine 4000 fixation probabilities per group, computed from these parameter values. These 4000 fixation probabilities correspond to the 4000 MCMC iteration values (4 chains with 1000 iterations each) from the models fit in Chapter 3. For each fixation type we examined the population curves for each group, the time points where the BEST test procedure deems a significant difference, and finally a confidence interval of time points where this significant difference can possibly occur. Little research in the psycholinguistics field has established what constitutes a meaningful absolute difference in fixation probabilities. For the purposes of this analysis, we set our ROPE to (−0.015, 0.015), an effect size of 1.5% for fixation probabilities between groups at each time point.
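The first stage of the procedure can be sketched as follows. Here fabricated posterior draws stand in for the Chapter 3 MCMC output, and the group means, spread, and onset point are illustrative assumptions; one plausible decision rule, flagging a time point when the central 95% interval of the 4000 between-group differences falls entirely outside the ROPE, is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
ROPE = (-0.015, 0.015)
T, M = 100, 4000          # time points; MCMC draws per group (4 chains x 1000)

# Hypothetical posterior draws of group fixation probabilities (one column per
# time point): the groups coincide early and separate from the 61st point on.
shift = np.where(np.arange(T) >= 60, 0.05, 0.0)
g1 = rng.normal(0.40, 0.01, size=(M, T))
g2 = rng.normal(0.40 + shift, 0.01, size=(M, T))

diff = g1 - g2                                   # posterior draws of the difference
lo, hi = np.percentile(diff, [2.5, 97.5], axis=0)
significant = (hi < ROPE[0]) | (lo > ROPE[1])    # 95% interval wholly outside the ROPE
print(np.flatnonzero(significant)[0])            # index of the first flagged time point
```

Applied column by column across the time course, this yields the runs of significant time points reported for each fixation type below.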

Target Fixation

For the target word fixation type, we used the model fits obtained from Chapter 3. The sequential BEST test procedure found significant differences from 512 ms to 2000 ms. Thus we can conclude that we found a difference of at least 1.5% from 512 ms to 2000 ms between the Cochlear Implant group and Normal Hearing group target fixation probabilities. The second part of the sequential BEST test yields a 95% credible interval of 0 ms to 632 ms. Therefore, we can conclude that significant differences between the Cochlear Implant and Normal Hearing groups can begin anywhere from 0 ms to 632 ms. Figures 5.1 and 5.2 show the population curves and the mean population curves associated with the model fits to the target fixation used in this procedure. Figure 5.3 is a histogram of the first time point where the difference of iteration values is greater than 1.5%.
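The second stage can be sketched the same way: for each posterior draw, record the first time point at which the absolute difference exceeds 1.5%, then summarize those onset times with 2.5% and 97.5% quantiles; the histogram in Figure 5.3 is exactly this collection of first-crossing times. The difference trajectory below is fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M, T = 4000, 100   # posterior draws (4 chains x 1000) and time points

# Hypothetical posterior draws of the between-group difference: centred on
# zero early in the trial, then growing steadily after the 50th time point.
gap = np.clip((np.arange(T) - 50) * 0.002, 0.0, None)
diffs = gap + rng.normal(0.0, 0.004, size=(M, T))

exceeds = np.abs(diffs) > 0.015                 # 1.5% criterion, per draw and time point
# First crossing per draw; draws that never cross are recorded as 0, which is
# one way spurious "immediate" onsets show up as zeroes in the histogram.
first = np.where(exceeds.any(axis=1), exceeds.argmax(axis=1), 0)

onset_ci = np.percentile(first, [2.5, 97.5])    # 95% interval of onset times
print(onset_ci)
```

The spread of `first` across draws is what the credible intervals of starting times reported in this section summarize.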

Figure 5.1 Target fixation population curves via the four-parameter logistic. Vertical line at first significant difference: 512 ms.

Figure 5.2 Mean population group curves for target fixation via the four-parameter logistic. Significant differences shaded from 512 ms to the end of the time course.

Figure 5.3 Histogram of the first time points where the difference is greater than 1.5% for the Target group.

The large number of zeroes within the second portion of the sequential BEST test may indicate that the ROPE selected was too narrow. For example, if a wider ROPE of 3% (−0.03, 0.03) were chosen, the resulting 95% credible interval would be (383.2 ms, 720 ms), as demonstrated in the corresponding histogram, Figure 5.4. Note that with the reduction of zeroes, the credible interval has shifted, resulting in a larger upper endpoint (from 632 ms to 720 ms).
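The effect of widening the ROPE can be seen by rerunning the onset calculation under both widths on the same draws. All quantities below are fabricated; only the qualitative pattern matters, namely fewer zeroes and a later onset interval under the wider ROPE, as observed above for the target fixations.

```python
import numpy as np

rng = np.random.default_rng(2)
M, T = 4000, 100
gap = np.clip((np.arange(T) - 40) * 0.002, 0.0, None)   # real separation after t = 40
diffs = gap + rng.normal(0.0, 0.006, size=(M, T))       # noisy posterior draws

def onset_summary(half_width):
    """95% interval of first-crossing times, and the share of zero onsets,
    for a ROPE of (-half_width, half_width)."""
    exceeds = np.abs(diffs) > half_width
    first = np.where(exceeds.any(axis=1), exceeds.argmax(axis=1), 0)
    return np.percentile(first, [2.5, 97.5]), (first == 0).mean()

narrow_ci, narrow_zeros = onset_summary(0.015)   # 1.5% ROPE
wide_ci, wide_zeros = onset_summary(0.030)       # 3% ROPE
print(narrow_ci, wide_ci, narrow_zeros, wide_zeros)
```

Under the narrow ROPE, noise alone crosses the threshold early for some draws, piling mass at zero and stretching the interval downward; the wider ROPE suppresses those spurious crossings and shifts both endpoints later.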

Figure 5.4 Histogram of the first time points where the difference is greater than 3% for the Target group.

Cohort Fixation

For the cohort word fixation type, we used the model fits obtained from Chapter 3, Section 3.5.2, Cohort Fixation. The sequential BEST test procedure found significant differences from 508 ms to 568 ms and from 712 ms to 2000 ms. Thus we can conclude that we found a difference of at least 1.5% between the cohort fixation probabilities of the Cochlear Implant group and the Normal Hearing group from 508 ms to 568 ms and from 712 ms to 2000 ms. The second part of the sequential BEST test yields a 95% credible interval of 344 ms to 760 ms. Therefore, we can conclude that significant differences between the Cochlear Implant and Normal Hearing groups' cohort fixations can begin from 344 ms to 760 ms. Figures 5.5 and 5.6 show the population curves and the mean population curves associated with the model fits to the cohort fixation used in this procedure. Figure 5.7 is a histogram of the first time point where the difference of iteration values is greater than 1.5%. The distribution of these time points appears to be bimodal, which could correspond to the two distinct intervals found in the first stage of the sequential BEST test.

Figure 5.5 Cohort fixation population curves via the double Gaussian process model. Vertical lines at 508 ms, 568 ms, and 712 ms indicate time points where significant differences are found.

Figure 5.6 Cohort mean population curves via the double Gaussian model. Vertical lines at 508 ms, 568 ms, and 712 ms indicate time points where significant differences are found.

Figure 5.7 Histogram of the first time points where the difference is greater than 1.5% for the Cohort group.

Rhyme Fixation

For the rhyme word fixation type, we used the model fits obtained from Chapter 3, Section 3.5.2, Rhyme Fixation. The sequential BEST test procedure found significant differences from 620 ms to 2000 ms. Thus we can conclude that we found a difference of at least 1.5% between the rhyme fixation probabilities of the Cochlear Implant group and the Normal Hearing group from 620 ms to 2000 ms. The second part of the sequential BEST test yields a 95% credible interval of 320 ms to 732 ms. Therefore, we conclude that significant differences between the Cochlear Implant and Normal Hearing groups' rhyme fixations can begin from 320 ms to 732 ms. Figures 5.8 and 5.9 show the population curves and the mean population curves associated with the model fits to the rhyme fixation used in this procedure. Figure 5.10 is a histogram of the first time point where the difference of iteration values is greater than 1.5%.

Figure 5.8 Rhyme fixation population curves via the double Gaussian model. Vertical line at 620 ms indicates the first significant difference.

Figure 5.9 Rhyme mean population curves via the double Gaussian model. Vertical line at 620 ms indicates the first significant difference.

Figure 5.10 Histogram of the first time points where the difference is greater than 1.5% for the Rhyme group.

Unrelated Fixation

For the unrelated word fixation type, we used the model fits obtained from Chapter 3, Section 3.5.2, Unrelated Fixation. The sequential BEST test procedure found significant differences from 584 ms to 2000 ms. Thus we can conclude that we found a difference of at least 1.5% between the unrelated fixation probabilities of the Cochlear Implant group and the Normal Hearing group from 584 ms to 2000 ms. The second part of the sequential BEST test yields a 95% credible interval of 340 ms to 744 ms. Therefore, we conclude that significant differences between the Cochlear Implant and Normal Hearing groups' unrelated fixations can begin from 340 ms to 744 ms. Figures 5.11 and 5.12 show the population curves and the mean population curves associated with the model fits to the unrelated fixation used in this procedure. Figure 5.13 is a histogram of the first time point where the difference of iteration values is greater than 1.5%.

Figure 5.11 Unrelated fixation population curves via the double Gaussian function. Vertical line at first significant difference: 584 ms.

Figure 5.12 Unrelated mean population curves via the double Gaussian. Vertical line at first significant difference: 584 ms.

Figure 5.13 Histogram of the first time points where the difference is greater than 1.5% for the Unrelated group.

In summary, we used the sequential BEST test to detect time-specific differences, as well as a credible interval of time points where significant differences can start to occur. An essential piece of this procedure is the set of model fits and iteration values of the group parameters from the nonlinear hierarchical models that were fit in Chapter 3. Once these fits were obtained, the BEST procedure was applied sequentially to the fixation probabilities at each time point.

Paired Tests

The previous data analysis focused on comparisons between the CI and NH groups within each fixation type. Another comparison of interest is between different competitor fixation types. Thus a paired version of the sequential BEST test will be used for these within-subject comparisons. The sequential paired BEST test will use the MCMC iteration values from the group parameter fits obtained earlier. Both the Cochlear Implant and Normal Hearing group fits per fixation type were used for this paired version of the sequential BEST test. This translates to 8000 iteration values per fixation type for conducting the paired sequential BEST test.

First, we compare the cohort and rhyme fixation types. Figure 5.14 displays the population curves for both fixation types, with the cohort fixation in black and the rhyme fixation in blue. The sequential BEST test procedure found significant differences from 480 ms to 880 ms. Thus we can conclude that we found a difference of at least 1.5% between the fixation probabilities of the cohort fixation and the rhyme fixation from 480 ms to 880 ms. The second part of the sequential BEST test yields a 95% credible interval of 328 ms to 612 ms. Therefore, we conclude that significant differences between the cohort and rhyme fixations can begin from 328 ms to 612 ms. Figure 5.15 is a histogram of the first time points where the difference of iteration values is greater than 1.5%.
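The paired version can be sketched by stacking the two groups' draws per fixation type (4000 + 4000, giving the 8000 iteration values described above) and then differencing the two fixation types within draw index. All curves, noise levels, and peak locations below are invented for illustration; the real procedure pools the actual CI and NH posterior fits.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100

def pooled_draws(peak_time, height):
    """Fabricated posterior draws (4000 per group) of a Gaussian-shaped
    fixation curve; stacking CI and NH fits gives 8000 draws per type."""
    t = np.arange(T)
    curve = height * np.exp(-0.5 * ((t - peak_time) / 12.0) ** 2)
    ci_fit = curve + rng.normal(0, 0.005, size=(4000, T))
    nh_fit = curve + rng.normal(0, 0.005, size=(4000, T))
    return np.vstack([ci_fit, nh_fit])             # 8000 x T

cohort = pooled_draws(peak_time=45, height=0.20)   # earlier, taller competitor curve
rhyme  = pooled_draws(peak_time=60, height=0.12)   # later, shorter competitor curve

paired = cohort - rhyme                            # difference within each draw index
lo, hi = np.percentile(paired, [2.5, 97.5], axis=0)
sig = (hi < -0.015) | (lo > 0.015)                 # 95% interval outside the 1.5% ROPE
print(np.flatnonzero(sig).min(), np.flatnonzero(sig).max())
```

Because the cohort curve peaks earlier and the rhyme curve later, the paired difference changes sign across the trial, and only the interior of the time course is flagged, which is the same qualitative pattern as the 480 ms to 880 ms interval found for the real data.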

Figure 5.14 Population curves for the cohort and rhyme fixation types, where the vertical lines indicate the interval of significant differences from 480 ms to 880 ms.


Generalized additive models I I Patrick Breheny October 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/18 Introduction Thus far, we have discussed nonparametric regression involving a single covariate In practice, we often

More information

SOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie

SOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie SOLOMON: Parentage Analysis 1 Corresponding author: Mark Christie christim@science.oregonstate.edu SOLOMON: Parentage Analysis 2 Table of Contents: Installing SOLOMON on Windows/Linux Pg. 3 Installing

More information

08 An Introduction to Dense Continuous Robotic Mapping

08 An Introduction to Dense Continuous Robotic Mapping NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy

More information

CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS

CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS This chapter presents a computational model for perceptual organization. A figure-ground segregation network is proposed based on a novel boundary

More information

Package bdots. March 12, 2018

Package bdots. March 12, 2018 Type Package Title Bootstrapped Differences of Time Series Version 0.1.19 Date 2018-03-05 Package bdots March 12, 2018 Author Michael Seedorff, Jacob Oleson, Grant Brown, Joseph Cavanaugh, and Bob McMurray

More information

Monte Carlo for Spatial Models

Monte Carlo for Spatial Models Monte Carlo for Spatial Models Murali Haran Department of Statistics Penn State University Penn State Computational Science Lectures April 2007 Spatial Models Lots of scientific questions involve analyzing

More information

Recent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will

Recent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will Lectures Recent advances in Metamodel of Optimal Prognosis Thomas Most & Johannes Will presented at the Weimar Optimization and Stochastic Days 2010 Source: www.dynardo.de/en/library Recent advances in

More information

STATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1

STATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) 200 Level Courses STAT 250: Introductory Statistics I. 3 credits. Elementary introduction to statistics. Topics include descriptive statistics, probability, and estimation

More information

Louis Fourrier Fabien Gaie Thomas Rolf

Louis Fourrier Fabien Gaie Thomas Rolf CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

More information

Optimal designs for comparing curves

Optimal designs for comparing curves Optimal designs for comparing curves Holger Dette, Ruhr-Universität Bochum Maria Konstantinou, Ruhr-Universität Bochum Kirsten Schorning, Ruhr-Universität Bochum FP7 HEALTH 2013-602552 Outline 1 Motivation

More information

Modeling with Uncertainty Interval Computations Using Fuzzy Sets

Modeling with Uncertainty Interval Computations Using Fuzzy Sets Modeling with Uncertainty Interval Computations Using Fuzzy Sets J. Honda, R. Tankelevich Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, CO, U.S.A. Abstract A new method

More information

Hierarchical Modelling for Large Spatial Datasets

Hierarchical Modelling for Large Spatial Datasets Hierarchical Modelling for Large Spatial Datasets Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics,

More information

Four equations are necessary to evaluate these coefficients. Eqn

Four equations are necessary to evaluate these coefficients. Eqn 1.2 Splines 11 A spline function is a piecewise defined function with certain smoothness conditions [Cheney]. A wide variety of functions is potentially possible; polynomial functions are almost exclusively

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Bayesian Model Averaging over Directed Acyclic Graphs With Implications for Prediction in Structural Equation Modeling

Bayesian Model Averaging over Directed Acyclic Graphs With Implications for Prediction in Structural Equation Modeling ing over Directed Acyclic Graphs With Implications for Prediction in ing David Kaplan Department of Educational Psychology Case April 13th, 2015 University of Nebraska-Lincoln 1 / 41 ing Case This work

More information

Additive hedonic regression models for the Austrian housing market ERES Conference, Edinburgh, June

Additive hedonic regression models for the Austrian housing market ERES Conference, Edinburgh, June for the Austrian housing market, June 14 2012 Ao. Univ. Prof. Dr. Fachbereich Stadt- und Regionalforschung Technische Universität Wien Dr. Strategic Risk Management Bank Austria UniCredit, Wien Inhalt

More information

Economics Nonparametric Econometrics

Economics Nonparametric Econometrics Economics 217 - Nonparametric Econometrics Topics covered in this lecture Introduction to the nonparametric model The role of bandwidth Choice of smoothing function R commands for nonparametric models

More information

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES UNIVERSITY OF GLASGOW MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES by KHUNESWARI GOPAL PILLAY A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 24 Overview 1 Background to the book 2 Crack growth example 3 Contents

More information

STATISTICS (STAT) 200 Level Courses Registration Restrictions: STAT 250: Required Prerequisites: not Schedule Type: Mason Core: STAT 346:

STATISTICS (STAT) 200 Level Courses Registration Restrictions: STAT 250: Required Prerequisites: not Schedule Type: Mason Core: STAT 346: Statistics (STAT) 1 STATISTICS (STAT) 200 Level Courses STAT 250: Introductory Statistics I. 3 credits. Elementary introduction to statistics. Topics include descriptive statistics, probability, and estimation

More information

Chapter 6: Examples 6.A Introduction

Chapter 6: Examples 6.A Introduction Chapter 6: Examples 6.A Introduction In Chapter 4, several approaches to the dual model regression problem were described and Chapter 5 provided expressions enabling one to compute the MSE of the mean

More information

Learning Objectives. Continuous Random Variables & The Normal Probability Distribution. Continuous Random Variable

Learning Objectives. Continuous Random Variables & The Normal Probability Distribution. Continuous Random Variable Learning Objectives Continuous Random Variables & The Normal Probability Distribution 1. Understand characteristics about continuous random variables and probability distributions 2. Understand the uniform

More information

Simulation: Solving Dynamic Models ABE 5646 Week 12, Spring 2009

Simulation: Solving Dynamic Models ABE 5646 Week 12, Spring 2009 Simulation: Solving Dynamic Models ABE 5646 Week 12, Spring 2009 Week Description Reading Material 12 Mar 23- Mar 27 Uncertainty and Sensitivity Analysis Two forms of crop models Random sampling for stochastic

More information

Edge and local feature detection - 2. Importance of edge detection in computer vision

Edge and local feature detection - 2. Importance of edge detection in computer vision Edge and local feature detection Gradient based edge detection Edge detection by function fitting Second derivative edge detectors Edge linking and the construction of the chain graph Edge and local feature

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

EVALUATION OF THE NORMAL APPROXIMATION FOR THE PAIRED TWO SAMPLE PROBLEM WITH MISSING DATA. Shang-Lin Yang. B.S., National Taiwan University, 1996

EVALUATION OF THE NORMAL APPROXIMATION FOR THE PAIRED TWO SAMPLE PROBLEM WITH MISSING DATA. Shang-Lin Yang. B.S., National Taiwan University, 1996 EVALUATION OF THE NORMAL APPROXIMATION FOR THE PAIRED TWO SAMPLE PROBLEM WITH MISSING DATA By Shang-Lin Yang B.S., National Taiwan University, 1996 M.S., University of Pittsburgh, 2005 Submitted to the

More information

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework IEEE SIGNAL PROCESSING LETTERS, VOL. XX, NO. XX, XXX 23 An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework Ji Won Yoon arxiv:37.99v [cs.lg] 3 Jul 23 Abstract In order to cluster

More information

INLA: Integrated Nested Laplace Approximations

INLA: Integrated Nested Laplace Approximations INLA: Integrated Nested Laplace Approximations John Paige Statistics Department University of Washington October 10, 2017 1 The problem Markov Chain Monte Carlo (MCMC) takes too long in many settings.

More information

Scalable Bayes Clustering for Outlier Detection Under Informative Sampling

Scalable Bayes Clustering for Outlier Detection Under Informative Sampling Scalable Bayes Clustering for Outlier Detection Under Informative Sampling Based on JMLR paper of T. D. Savitsky Terrance D. Savitsky Office of Survey Methods Research FCSM - 2018 March 7-9, 2018 1 / 21

More information

Enhanced Web Log Based Recommendation by Personalized Retrieval

Enhanced Web Log Based Recommendation by Personalized Retrieval Enhanced Web Log Based Recommendation by Personalized Retrieval Xueping Peng FACULTY OF ENGINEERING AND INFORMATION TECHNOLOGY UNIVERSITY OF TECHNOLOGY, SYDNEY A thesis submitted for the degree of Doctor

More information

Lab 5 - Risk Analysis, Robustness, and Power

Lab 5 - Risk Analysis, Robustness, and Power Type equation here.biology 458 Biometry Lab 5 - Risk Analysis, Robustness, and Power I. Risk Analysis The process of statistical hypothesis testing involves estimating the probability of making errors

More information

GENREG DID THAT? Clay Barker Research Statistician Developer JMP Division, SAS Institute

GENREG DID THAT? Clay Barker Research Statistician Developer JMP Division, SAS Institute GENREG DID THAT? Clay Barker Research Statistician Developer JMP Division, SAS Institute GENREG WHAT IS IT? The Generalized Regression platform was introduced in JMP Pro 11 and got much better in version

More information

Analysis of Panel Data. Third Edition. Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS

Analysis of Panel Data. Third Edition. Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS Analysis of Panel Data Third Edition Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS Contents Preface to the ThirdEdition Preface to the Second Edition Preface to the First Edition

More information

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes. Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department

More information

Nonparametric Approaches to Regression

Nonparametric Approaches to Regression Nonparametric Approaches to Regression In traditional nonparametric regression, we assume very little about the functional form of the mean response function. In particular, we assume the model where m(xi)

More information

Lecture 1: Statistical Reasoning 2. Lecture 1. Simple Regression, An Overview, and Simple Linear Regression

Lecture 1: Statistical Reasoning 2. Lecture 1. Simple Regression, An Overview, and Simple Linear Regression Lecture Simple Regression, An Overview, and Simple Linear Regression Learning Objectives In this set of lectures we will develop a framework for simple linear, logistic, and Cox Proportional Hazards Regression

More information

Pair-Wise Multiple Comparisons (Simulation)

Pair-Wise Multiple Comparisons (Simulation) Chapter 580 Pair-Wise Multiple Comparisons (Simulation) Introduction This procedure uses simulation analyze the power and significance level of three pair-wise multiple-comparison procedures: Tukey-Kramer,

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

Markov chain Monte Carlo methods

Markov chain Monte Carlo methods Markov chain Monte Carlo methods (supplementary material) see also the applet http://www.lbreyer.com/classic.html February 9 6 Independent Hastings Metropolis Sampler Outline Independent Hastings Metropolis

More information

Probabilistic multi-resolution scanning for cross-sample differences

Probabilistic multi-resolution scanning for cross-sample differences Probabilistic multi-resolution scanning for cross-sample differences Li Ma (joint work with Jacopo Soriano) Duke University 10th Conference on Bayesian Nonparametrics Li Ma (Duke) Multi-resolution scanning

More information