Toward a Method of Selecting Among Computational Models of Cognition


Psychological Review, 2002, Vol. 109, No. 3. Copyright 2002 by the American Psychological Association, Inc.

Toward a Method of Selecting Among Computational Models of Cognition

Mark A. Pitt, In Jae Myung, and Shaobo Zhang, The Ohio State University

The question of how one should decide among competing explanations of data is at the heart of the scientific enterprise. Computational models of cognition are increasingly being advanced as explanations of behavior. The success of this line of inquiry depends on the development of robust methods to guide the evaluation and selection of these models. This article introduces a method of selecting among mathematical models of cognition known as minimum description length, which provides an intuitive and theoretically well-grounded understanding of why one model should be chosen. A central but elusive concept in model selection, complexity, can also be derived with the method. The adequacy of the method is demonstrated in 3 areas of cognitive modeling: psychophysics, information integration, and categorization.

How should one choose among competing theoretical explanations of data? This question is at the heart of the scientific enterprise, regardless of whether verbal models are being tested in an experimental setting or computational models are being evaluated in simulations. A number of criteria have been proposed to assist in this endeavor, summarized nicely by Jacobs and Grainger (1994). They include (a) plausibility (are the assumptions of the model biologically and psychologically plausible?); (b) explanatory adequacy (is the theoretical explanation reasonable and consistent with what is known?); (c) interpretability (do the model and its parts, e.g., parameters, make sense? are they understandable?); (d) descriptive adequacy (does the model provide a good description of the observed data?); (e) generalizability (does the model predict well the characteristics of data that will be observed in the future?); and (f) complexity (does the model capture the phenomenon in the least complex, i.e., simplest, possible manner?). The relative importance of these criteria may vary with the types of models being compared. For example, verbal models are likely to be scrutinized on the first three criteria just as much as the last three to thoroughly evaluate the soundness of the models and their assumptions.

Author note: Mark A. Pitt, In Jae Myung, and Shaobo Zhang, Department of Psychology, The Ohio State University. Portions of this work were presented at the 40th annual meeting of the Psychonomic Society, Los Angeles, California, November 18-22, 1999, and at the 31st and 32nd annual meetings of the Society for Mathematical Psychology, Nashville, Tennessee (August 6-9, 1998) and Santa Cruz, California (July 29-August 1, 1999), respectively. We thank D. Bamber, R. Golden, and Andrew Hanson for their valuable comments and attention to detail in reading earlier versions of this article. Mark A. Pitt and In Jae Myung contributed equally to the article, so order of authorship should be viewed as arbitrary. The section Three Application Examples is based on Shaobo Zhang's doctoral dissertation submitted to the Department of Psychology at The Ohio State University. In Jae Myung and Mark A. Pitt were supported by National Institute of Mental Health Grant MH. Correspondence concerning this article should be addressed to Mark A. Pitt or In Jae Myung, Department of Psychology, The Ohio State University, 1885 Neil Avenue Mall, Columbus, Ohio. E-mail: pitt.2@osu.edu or myung.1@osu.edu
Computational models, on the other hand, may have already satisfied the first three criteria to a certain level of acceptability earlier in their evolution, leaving the last three criteria to be the primary ones on which they are evaluated. This emphasis on the latter three can be seen in the development of quantitative methods designed to compare models on these criteria. These methods are the topic of this article.

In the last two decades, interest in mathematical models of cognition and other psychological processes has increased tremendously. We view this as a positive sign for the discipline, for it suggests that this method of inquiry holds considerable promise. Among other things, a mathematical instantiation of a theory provides a test bed in which researchers can examine the detailed interactions of a model's parts with a level of precision that is not possible with verbal models. Furthermore, through systematic evaluation of its behavior, an accurate assessment of a model's viability can be obtained. The goal of modeling is to infer the structural and functional properties of a cognitive process from behavioral data that were thought to have been generated by that process. At its most basic level, then, a mathematical model is a set of assumptions about the structure and functioning of the process.

The adequacy of a model is first assessed by measuring its ability to reproduce human data. If it does so reasonably well, then the next step is to compare its performance with competing models. It is imperative that the model selection method that is used to select among competing models accurately measures how well each model approximates the mental process. Above all else, the method must be valid. Otherwise, the purpose of modeling is undermined. One runs the risk of choosing a model that in actuality is a poor approximation of the underlying process of interest, leading researchers astray. The potential severity of this problem should make it clear that sound methodology is not only integral to but also necessary for theoretical advancement. In short, model selection methods must be as sophisticated and robust as the models themselves.

In this article, we introduce a new quantitative method of model selection. It is theoretically well grounded and provides a clear

understanding of why one model should be chosen over another. The purpose of the article is to provide a good conceptual understanding of the problem of model selection and the solution being advocated. Consequently, only the most important (and new) technical advances are discussed. A more thorough treatment of the mathematics can be found in other sources (Myung, Balasubramanian, & Pitt, 2000; Myung, Kim, & Pitt, 2000; Myung & Pitt, 1997, 1998). After introducing the problem of model selection and identifying model complexity as a key property of a model that must be considered by any selection method, we introduce an intuitive statistical tool that assists in understanding and measuring complexity. Next, we develop a quantitative measure of complexity within the mathematics of differential geometry and show how it is incorporated into a powerful model selection method known as minimum description length (MDL). Finally, application examples of MDL and the complexity measure are provided by comparing models in three areas of cognitive psychology: psychophysics, information integration, and categorization.

Generalizability Instead of Goodness of Fit

Model selection in psychology has largely been limited to a single criterion to measure the accuracy with which a set of models describes a mental process: goodness of fit (GOF). The model that fits a particular set of observed data the best (i.e., accounts for the most variance) is considered superior because it is presumed to approximate most closely the mental process that generated the data. Typical measures of GOF include the root mean squared error (RMSE), which is the square root of the sum of squared deviations between observed and predicted data divided by the number of data points fitted, and the maximum likelihood, which is the probability of obtaining the observed data maximized with respect to the model's parameter values. GOF as a selection criterion is attractive because it appears to measure exactly what one wants to know: How well does the model mimic human behavior? In addition, the GOF measure is easy to calculate.

GOF is a necessary and important component of model selection: Data are the only link to the underlying cognitive process, so a model's ability to describe the output from this process must be considered in model selection. However, model selection based solely on GOF can lead to erroneous results and the choice of an inferior model. Just because a model fits data well does not necessarily imply that the regularity one seeks to capture in the data is well approximated by the model (Roberts & Pashler, 2000). Properties of the model itself can enable it to provide a good fit to the data for reasons that have nothing to do with the model's approximation to the cognitive process (Myung, 2000). Two of these properties are the number of parameters in the model and its functional form (i.e., the way in which the model's parameters and data are combined in the model equation). Together they contribute to a model's complexity, which refers to the flexibility inherent in a model that enables it to fit diverse patterns of data.¹ The following simulation example demonstrates the independent contribution of these two properties to GOF.

Three models were compared on their ability to fit data. Model M1 (defined in Table 1) generated the data to fit, and is therefore considered the true model.
Table 1
Goodness of Fit and Generalizability Measures of Three Models Differing in Complexity

                      M1 (true model)   M2           M3
Goodness of fit       2.68 (0%)         2.49 (31%)   2.41 (69%)
Generalizability      2.99 (52%)        3.08 (28%)   3.14 (20%)

Note. Each cell contains the average root mean squared error of the fit of each model to the data and, in parentheses, the percentage of samples (out of 1,000) in which that particular model fitted the data best. The three models were as follows: M1: y = ln(x + a) + error; M2: y = b ln(x + a) + error; and M3: y = b x^a + error. The error was normally distributed with M = 0 and SD = 3. Samples were generated from model M1 using a = 1 on the same 6 points for x, which ranged from 1 to 6 in increments of 1.

Model M2 differed from M1 only in having one additional parameter, two instead of one; note that their functional forms are the same. Model M3 had the same number of parameters as M2, but a different functional form (a is an exponent of x rather than an additive component). Parameters were chosen for each of the three models to give the best fit to 1,000 randomly generated samples of data from the model M1. Each model's mean fit to the samples is shown in the first row of Table 1 along with the percentage of time that particular model provided a better fit than its two competitors.

As can be seen, M2 and M3, with one more parameter than M1, always provided a better fit to the data than M1. Because the data were generated by M1, M2 and M3 must have overfitted the data beyond what is necessary to capture the underlying regularity. Otherwise, one would have expected M1 to fit its own data best at least some of the time. After all, M1 generated the data! The improved fit of M2 and M3 occurred because the extra parameter, b, in these two models enabled them to absorb random error (i.e., nonsystematic variation) in the data. Absorption of these random fluctuations is the only means by which M2 and M3 could have fitted the data better than the true model, M1. Note also that M3 provided a better fit than M2. This improvement in fit must be due to functional form rather than the number of parameters, because these two models differ only in how the data (x) and the two parameters (a and b) are combined in the model equation.

This example demonstrates clearly that GOF alone is inadequate as a model selection criterion. Because a model's complexity is not evaluated by the method, the model capable of absorbing the most variation in the data, regardless of its source, will be chosen. Frequently this will be the most complex model. The simulation also highlights the point that model selection is particularly difficult in psychology, and in other social sciences, precisely because random error is present in the data. Although this noise can be minimized (in the experimental design), it cannot be eliminated, so in any given data set, variation due to the cognitive process and variation due to random error are entangled, posing a significant obstacle to identifying the best model. To get around this problem, model selection must be based, instead, on a different criterion, that of generalizability.

¹ Cutting, Bruno, Brady, and Moore (1992) used the term scope, which is similar to our definition of complexity. They proposed to measure the scope by assessing "a model's ability to account for all possible data functions, where those functions are generated by a reasonably large sample of random data sets" (p. 364).
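To make the logic of the simulation concrete, the sketch below (ours, not the authors' code) regenerates the design described in the note to Table 1: samples are drawn from M1, each of the three models is fitted by least squares, and each model is then scored both on the sample it was fitted to (goodness of fit) and, with its parameters held fixed, on a fresh sample from M1 (the generalizability measure discussed in the next section). Exact values will differ from Table 1, but the ordering of the models should be similar.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.arange(1.0, 7.0)                                   # the 6 design points, x = 1..6

def m1(x, a):    return np.log(np.clip(x + a, 1e-12, None))      # true model: y = ln(x + a)
def m2(x, a, b): return b * np.log(np.clip(x + a, 1e-12, None))  # adds one parameter
def m3(x, a, b): return b * x ** a                               # same k as M2, different form

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

models = {"M1": (m1, [1.0]), "M2": (m2, [1.0, 1.0]), "M3": (m3, [1.0, 1.0])}
fit_err = {m: [] for m in models}
gen_err = {m: [] for m in models}
fit_wins = {m: 0 for m in models}
gen_wins = {m: 0 for m in models}

for _ in range(1000):
    y_cal = m1(x, 1.0) + rng.normal(0, 3, x.size)     # sample used for fitting (a = 1, SD = 3)
    y_new = m1(x, 1.0) + rng.normal(0, 3, x.size)     # fresh sample from the same process
    fit_s, gen_s = {}, {}
    try:
        for name, (f, p0) in models.items():
            theta, _ = curve_fit(f, x, y_cal, p0=p0, maxfev=20000)
            fit_s[name] = rmse(y_cal, f(x, *theta))   # goodness of fit (Table 1, row 1)
            gen_s[name] = rmse(y_new, f(x, *theta))   # generalizability (row 2), parameters fixed
    except RuntimeError:
        continue                                      # skip the rare sample where a fit fails
    for name in models:
        fit_err[name].append(fit_s[name])
        gen_err[name].append(gen_s[name])
    fit_wins[min(fit_s, key=fit_s.get)] += 1
    gen_wins[min(gen_s, key=gen_s.get)] += 1

for name in models:
    print(name, round(float(np.mean(fit_err[name])), 2), fit_wins[name],
          round(float(np.mean(gen_err[name])), 2), gen_wins[name])
```

The point of the sketch is only to show the two scoring steps side by side; any least-squares routine could be substituted for curve_fit.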

The goal of generalizability is to predict the statistics of new, as yet unseen, samples generated by the mental process being studied. The rationale underlying the criterion is that the model should be chosen that fits all samples best, not the model that provides the best fit to one particular sample. Only when this condition is met can one be sure a model is accurately capturing the underlying process, not also the idiosyncrasies (i.e., random error) of a particular data sample. More formally, generalizability can be defined in terms of a discrepancy function that measures the expected error in predicting future data given the model of interest (Linhart & Zucchini, 1986; also see their work for a discussion of the theoretical underpinnings of generalizability).

The results of a second simulation illustrate the superiority of generalizability as a model selection criterion. After each of the data samples was fitted in the first simulation, the parameters of the three models were fixed, and generalizability was assessed by fitting the models to another 1,000 samples of data generated from M1. The average fits are shown in the second row of Table 1. As can be seen, poor generalizability is the cost of overfitting a specific sample of data. Not only are average fits now worse for M2 and M3 than for M1, but these two models provided the best fit to the second sample much less often than M1.

Generalizability should be preferred over GOF because it does a better job of capturing the general trend in the data and ignoring random variation. The difference between these two selection criteria is shown in Figure 1. Dots in the panel represent observed data points. Lines are the functions generated by two models varying in complexity. The simpler model (thick line) captures the general trend in the data. If new data points (+) are added to the sample, fit will remain similar. The more complex model (thin line) not only captures the general trend in the data, but also captures many of the idiosyncrasies of each observation in the data set, which will cause fit to drop when additional observations are added to the sample. Generalizability would favor the simple model, which fits with our intuitions. GOF, on the other hand, would favor the complex model.

Figure 1. Illustration of the trade-off between goodness of fit and generalizability. An observed data set (dots) was fitted to a simple model (thick line) and a complex model (thin line). New observations are shown by the plus symbol.

The goal of model selection, then, should be to maximize generalizability. This turns out to be quite difficult in practice, because the relationship between complexity and generalizability is not as straightforward as that between complexity and GOF. These differences are illustrated in Figure 2. Model complexity is represented along the horizontal axis and any fit index on the vertical axis, where larger values indicate a better fit (e.g., percent variance accounted for), with the two functions representing the two selection criteria. As was demonstrated in the first simulation, as complexity increases, so does GOF. Generalizability will also increase positively with complexity, but only up to the point where the model is sufficiently complex to capture the regularities in the data caused by the cognitive process being modeled. Any additional complexity will cause a drop in generalizability, because after that point the model will begin to capture random error, not just the underlying process. The difference between the GOF and generalizability curves represents the amount of overfitting that can occur.
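The qualitative shape of the two curves in Figure 2 is easy to reproduce with a toy illustration of our own (not taken from the article): polynomials of increasing degree are fitted to a noisy sample from a fixed curve, and while fit to that sample improves monotonically with degree, error on a fresh sample typically reaches its minimum at a moderate degree and then worsens as the extra flexibility starts absorbing noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
true_mean = np.sin(2 * np.pi * x)                         # the underlying regularity
y_obs = true_mean + rng.normal(0, 0.3, x.size)            # sample used for fitting
y_new = true_mean + rng.normal(0, 0.3, x.size)            # future sample from the same process

for degree in range(10):                                  # complexity here = polynomial degree
    coeffs = np.polyfit(x, y_obs, degree)
    pred = np.polyval(coeffs, x)
    gof = np.sqrt(np.mean((y_obs - pred) ** 2))           # keeps improving with degree
    gen = np.sqrt(np.mean((y_new - pred) ** 2))           # typically best at a moderate degree
    print(degree, round(float(gof), 3), round(float(gen), 3))
```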
Only by taking complexity into account can a selection method accurately measure a model's generalizability. The task before the modeling community has been to develop an accurate and complete measure of model complexity, being sensitive not only to the number of parameters in the model but also to its functional form. Another way to interpret the preceding discussion is that the trademark of a good model is its ability to satisfy the two opposing selection pressures of GOF and complexity, with the end result being good generalizability. These two pressures can be thought of as the two edges of Occam's razor: A model must be complex enough to capture the underlying regularity yet simple enough to avoid overfitting the data sample and thus losing generalizability. In this regard, model selection methods should be evaluated on their success in implementing Occam's razor. The selection method that we introduce in this article, MDL, achieves this goal. Before we describe this method, we review prior approaches to model selection.

Prior Approaches to Model Selection

We begin this section with a formal definition of a model. From a statistical standpoint, data are a sample generated from a true but unknown probability distribution, which is the regularity underlying the data. A statistical model is defined as a collection of probability distributions defined on experimental data and indexed by the model's parameter vector, whose values range over the parameter space of the model. If the model contains as a special case the probability distribution that generated the data (i.e., the true model), then the model is said to be correctly specified; otherwise it is misspecified. Formally, define y = (y1, ..., yN) as a vector of values of the dependent variable, θ = (θ1, ..., θk) as the parameter vector of the model, and f(y|θ) as the likelihood function as a function of the parameter. N is the number of observations and k is the number of parameters. Often it is possible to write y as a sum of a deterministic component plus random error:

y = g(θ, x) + e.   (1)

In the equation, x = (x1, ..., xN) is a vector of an independent variable x, and e = (e1, ..., eN) is the random error vector from a probability distribution with a mean of zero.

Figure 2. Illustration of the relationship between goodness of fit and generalizability as a function of model complexity (Myung & Pitt, 2001). From Stevens' Handbook of Experimental Psychology (p. 449, Figure 11.4), by J. Wixted (Editor), 2001, New York: Wiley. Copyright 2001 by Wiley. Adapted with permission.

Quite often the mean function g(θ, x) itself is taken to define a mathematical model. However, the specification of the error distribution must be included in the definition of a model. Additional parameters may be introduced in the model to specify the shape of the error distribution (e.g., normal). Often its shape is determined by the experimental task or design. For example, consider a recognition memory experiment in which the participant is required to respond old or new to a set of pictures presented across a series of independent trials, with the number of correct responses recorded as the dependent variable. Suppose that a two-parameter model assumes that the probability of a correct response follows a logistic function of the time lag (xi), for condition i (i = 1, ..., N), between initial exposure and recognition test, in the form of g(θ1, θ2, xi) = [1 + θ1 exp(−θ2 xi)]⁻¹. In this case, the dependent variable yi will be binomially distributed with probability g(θ1, θ2, xi) and n the number of binomial trials, so the shape of the error function is completely specified by the experimental task.

Six representative selection methods currently in use are shown in Table 2. They are the Akaike information criterion (AIC; Akaike, 1973), the Bayesian information criterion (BIC; Schwarz, 1978), the root mean squared deviation (RMSD), the information-theoretic measure of complexity (ICOMP; Bozdogan, 1990), cross-validation (CV; Stone, 1974), and Bayesian model selection (BMS; Kass & Raftery, 1995; Myung & Pitt, 1997). Each of these methods assesses a model's generalizability by combining a measure of GOF with a measure of complexity. Each prescribes that the model that minimizes the given criterion be chosen. That is, the smaller the criterion value of a model, the better the model generalizes.² A fuller discussion of these methods can be found in Myung, Forster, and Browne (2000; see also Linhart & Zucchini, 1986).

AIC and BIC are the two most commonly used selection methods. The first term, −2 ln f(y|θ̂), is a maximum likelihood measure of GOF and the second term, involving k, is a measure of complexity that is sensitive to the number of parameters in the model. As the number of parameters increases, so does the criterion. In BIC, the rate of increase is modified by the log of the sample size, n.³ RMSD uses RMSE as the measure of GOF and also takes into account the number of parameters through k.⁴ These three measures, AIC, BIC, and RMSD, are all sensitive only to one aspect of complexity, number of parameters, but insensitive to functional form. This is clearly inadequate because, as demonstrated in Table 1, the functional form of a model influences generalizability. ICOMP is an improvement on this shortcoming. Its second and third terms together represent a complexity measure that takes into account the effects of parameter sensitivity through trace(Σ) and parameter interdependence through det(Σ), which, according to Li, Lewandowsky, and DeBrunner (1996), are two principal components of the functional form that contribute to model complexity. However, ICOMP is also problematic because it is not invariant under reparameterization of the model, in particular under nonlinear forms of reparameterization.⁵
² The model selection methods discussed in the present article do not require the assumption that the models being compared are correct or nested. (A model is said to be correct if there is a parameter value of the model that yields the probability distribution that has generated the observed data sample. A model is said to be nested within another model if the former can be reduced to a special case of the latter by setting one or more of its parameters to fixed values.) On the other hand, the generalized likelihood ratio test based on the G² or chi-square statistics (e.g., Bishop, Fienberg, & Holland, 1975), which are often used to compare two models, assumes that the models are nested and, further, that the reduced model is correct. When these assumptions are met, both types of selection methods should perform similarly. However, the methods should not be viewed as interchangeable because their goals differ. The selection methods presented in this article were designed to identify the model that generalizes best in some defined sense. The generalized likelihood ratio test, in contrast, is a null hypothesis significance test in which the hypothesis that the reduced model is correct is tested given a prescribed level of the Type 1 error rate (i.e., α). Accordingly, the model chosen under this test may not necessarily be the one that generalizes best.

³ Sample size n refers to the number of independent data samples (more accurately, errors, i.e., the ei's) drawn from the same probability distribution. Data size is the number of observed data points that are being fitted to evaluate a model and that may come from different probability distributions, although from the same probability family. Often, the sample size is equal to the data size. A case in point is a linear regression model, yi = θxi + ei (i = 1, ..., N), where ei ~ N(0, σ²). Note that the errors, the ei's, are independent and identically distributed according to the normal probability distribution with mean zero and variance σ². On the other hand, if it is assumed that each ei is normally distributed with zero mean but with a different value of the variance, that is, ei ~ N(0, σi²) (i = 1, ..., N), then the sample size, n, will now be equal to 1 whereas the data size, N, remains unchanged.

⁴ The RMSD defined in Table 2 differs from the RMSD that has often been used in the psychological literature (e.g., Friedman, Massaro, Kitzis, & Cohen, 1995), where it is defined as RMSD = √(SSE/N), in which (N − k) is replaced by N, and therefore does not take into account the number of parameters. This form of RMSD is nothing more than RMSE. As such, it is not appropriate to use as a method of model selection, especially when comparing models that differ in the number of parameters.

⁵ Reparameterization refers to transforming the parameters of a model so that it becomes another, behaviorally equivalent, model. For example, a one-parameter exponential model with normal error, y = e^(−ηx) + N(0, σ²), is a reparameterization of another model: y = θ^x + N(0, σ²). The latter is obtained from the former by defining a new parameter, θ, as θ = e^(−η). Whenever two models are related to each other through reparameterization, they become equivalent in the sense that both will fit any given data set identically, albeit with different parameter values. Statistically speaking, they are indistinguishable from one another.
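For concreteness, the helper below (our illustration, not code from the article) computes the three count-based criteria listed in Table 2 for a least-squares fit, under one common convention that the errors are i.i.d. normal with variance estimated from the residuals; whether that error variance is itself counted in k is a modeling choice left to the user.

```python
import numpy as np

def count_based_criteria(y, yhat, k):
    """AIC, BIC, and RMSD (Table 2) for a model with k parameters fitted by
    least squares, assuming i.i.d. normal error with sigma^2 = SSE / n."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    n = y.size
    sse = float(np.sum((y - yhat) ** 2))
    loglik = -0.5 * n * (np.log(2 * np.pi * sse / n) + 1)   # maximized log-likelihood
    return {"AIC": -2 * loglik + 2 * k,
            "BIC": -2 * loglik + k * np.log(n),
            "RMSD": float(np.sqrt(sse / (n - k)))}
```

Because AIC and BIC differ only in their penalty terms (2k versus k ln n), they always rank two models with the same number of parameters identically, a point the article returns to when choosing which criteria to carry into the application examples.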

Table 2
Six Prior Model Selection Methods

Akaike information criterion (AIC): AIC = −2 ln f(y|θ̂) + 2k
Bayesian information criterion (BIC): BIC = −2 ln f(y|θ̂) + k ln n
Root mean square deviation (RMSD): RMSD = √[SSE/(N − k)]
Information-theoretic measure of complexity (ICOMP): ICOMP = −ln f(y|θ̂) + (k/2) ln[trace(Σ(θ̂))/k] − (1/2) ln det(Σ(θ̂))
Cross-validation (CV): CV = −ln f(y_Val|θ̂_Cal)
Bayesian model selection (BMS): BMS = −ln ∫ f(y|θ) π(θ) dθ

Note. y = data sample of size n; θ̂ = parameter value that maximizes the likelihood function f(y|θ); k = number of parameters; SSE = sum of the squared deviations between observed and predicted data; N = the number of data points fitted; Σ = covariance matrix of the parameter estimates; y_Val = validation sample of observed data; θ̂_Cal = maximum likelihood parameter estimate for a calibration sample; ln = the natural logarithm of base e; π(θ) = the prior probability density function of the parameter.

In CV, the observed data are divided into two subsamples of equal sizes, calibration and validation. The former is used to estimate the best-fitting parameter values of a model. The parameters are then fixed to these values and used by the model to fit the validation sample, yielding a model's CV index. CV is an easy-to-use, heuristic method of estimating a model's generalizability (for a brief tutorial, see Myung & Pitt, 2001). The emphasis on generalizability makes it reasonable to suppose that CV somehow takes into account the effects of functional form. If, how, and how well it does this is not clear, however.

BMS is a model selection method motivated from Bayesian inference. As such, the method chooses models based on the posterior probability of a model given the data. Calculation of the posterior probability requires the specification of the parameter prior density, π(θ), creating the possibility that model selection will depend on the choice of the prior density. As with CV, complexity in BMS is elusive. The integral form of the measure indicates that BMS takes into account functional form and the number of parameters, but how this is achieved is not entirely clear. It can be shown that BIC performs equivalently to BMS as a large sample approximation.

It is important to note that these selection criteria are themselves sample estimates of a true but unknown population parameter (i.e., generalizability in the population), and thus their values can change from sample to sample. Under the model selection procedure described above, however, one is forced to choose one model no matter how small the difference is among models, even when the models are virtually equivalent in their approximation of the underlying regularity. One solution to this dilemma is to conduct a statistical test, before applying the model selection procedure, to decide if two given models provide equally good descriptions of the underlying process. Golden (2000) proposed such a methodology, in which one can determine whether a subset of models are equally good approximations of the cognitive process.⁶ If the number of comparisons is not small, however, it can be difficult to control experiment-wise error. The preceding selection methods represent important progress in tackling the model selection problem. All have shortcomings that limit their usefulness to various degrees.
The complexity measure in AIC, BIC, and RMSD is incomplete, and the other three are either not invariant under reparameterization (ICOMP) or lack a clear complexity measure (CV, BMS). The remainder of this article is devoted to the development and testing of a model selection approach that overcomes these limitations. We begin by showing that differential geometry provides a theoretically well-justified and intuitive framework for understanding complexity and model selection in general.

Model Complexity: A Distributional Approach

We begin the discussion of complexity with a graphical definition of the term, intended to clarify what it means for a model to be complex. Depicted in the top panel in Figure 3 is the set of all data patterns that are possible given a particular experimental design. Every point in this multidimensional data space represents a particular data pattern in terms of a probability distribution, such as the shape of a frequency distribution of response times. All models occupy a section, or multiple sections, of data space, being able to fit a subset of the possible data patterns that could be observed. It is equally appropriate to think of data space as the universe of all models under consideration, because every model will occupy a region of this space, large or small.

⁶ This is a null hypothesis significance test, which, as an extension of Wilks's generalized likelihood ratio test, tests the null hypothesis that all models under consideration fit the data equally well. This test, unlike the generalized likelihood ratio test, is applicable to comparing non-nested and misspecified models for a wide range of discrepancy functions, including the ones with penalty terms, such as AIC, BIC, and MDL. In the standard model selection procedure using these criteria, one is forced to decide between two models under comparison. This test allows for a third decision: that both models are equally good or there is not enough evidence yet for choosing one model over the other.

The amount of space occupied by a model is positively related to its complexity. A simple model (Ma) will occupy a small region of data space because it assumes a specific structure in the data, which will manifest itself as a relatively narrow range of similar data patterns. This idea is illustrated in the lower left panel. When one of these few patterns occurs, the model will fit the data well; otherwise, it will fit poorly. Simple models are easily falsifiable, requiring a small minimum number of data points outside of their region of data space to disprove the model. In contrast, a complex model (Mb) will occupy a larger portion of data space. Complex models do not assume a single structure in the data. Rather, the structure changes as a function of the parameter values of the model. A slight change in a parameter's value can produce a dramatic change in the model's structure. Such chameleon-like behavior enables complex models to be finely tuned to fit a wide range of data patterns. This is illustrated in the lower right panel. Overly complex models are of questionable worth because their ability to fit such a diverse set of data patterns can make them difficult to falsify. In general, a complex model is one with many parameters and a (powerful) nonlinear equation for combining parameters. Complexity is dichotomized in this example for illustrative purposes only. It is more accurate to think of it as a continuum, as depicted in Figure 2.

Figure 3. The top panel depicts regions in data space occupied by two models, Ma (simple model) and Mb (complex model), with the range of data patterns that can be generated by each model in the lower panels.

Although the examples in Figure 3 are hypothetical, the graphical depiction of mathematical models in this way is not merely illustrative. Response surface analysis (RSA) is a statistical tool that, as in Figure 3, yields graphical representations of models for comparing their relative complexities. In addition, it serves as an informative starting point for the derivation of an elegant quantitative measure of complexity.

Response Surface Analysis

RSA is a method for studying geometric relations among responses generated by a mathematical model, often used in nonlinear regression (Bates & Watts, 1988). For a model with k parameters and N observations, the response surface is defined as a k-dimensional surface, formed by all possible response vectors that the model can describe. The response surface is embedded in an N-dimensional data space, which is the set of all possible response vectors that could be generated independently of a model. The response surface is a hyperplane for a linear model but may be curved when the model is nonlinear. The effects of model complexity on model fit are easily visible when models are compared in the space of response surfaces. This is shown in the following example. See Myung, Kim, and Pitt (2000) for a more detailed discussion.

Consider the following one-parameter power model:

y = t^(−θ) (power model),   (2)

where y is the response probability (e.g., proportion correct), t is a presentation or retention interval greater than 1, and θ (> 0) is a parameter. Suppose that y is measured at two different time intervals, t1 and t2. Given two fixed values of t1 and t2, the response surface is a line or a curve in a two-dimensional data space composed of (y_t1, y_t2), created by plotting the y values at t1 against the corresponding y values at t2 for the full range of the parameter, similar to phase plots in dynamical systems research (Kelso, 1995). In essence, a model is represented graphically as a plot of y_t1 versus y_t2 in data space.
For example, for the parameter θ = 1, the y value at t1 = 2 is obtained as y_t1 = (t1)^(−θ) = (2)^(−1) = 0.500. Similarly, the y value at t2 = 8 is obtained as y_t2 = (t2)^(−θ) = (8)^(−1) = 0.125. These two values are then represented as a single point (0.500, 0.125) on the (y_t1, y_t2) plane. Additional points are obtained by varying the parameter over its full range (i.e., 0 < θ < ∞) to form a continuous curve, which is called the response curve of a model, shown in the middle panel of Figure 4. The equation that describes this relationship can be derived analytically as follows:

y_t2 = y_t1^(ln t2 / ln t1).   (3)

Note that the parameter θ has been removed from the equation. The model is now parameter free, having been redefined as the relationship between two y values instead of a parameter and a y value. Each point on the response curve describes the relationship between two y values that are themselves described perfectly by a power function. Similarly, the response curves for the following one-parameter models can be obtained and are graphed in the adjacent panels of Figure 4:

y = 1 − θt (linear model)
y = [1.102 sin(5θt/12) + 1]/2 (black-hole model).   (4)

RSA provides two valuable insights into model complexity. First, RSA makes the meaning of complexity tangible. The response curve of a model represents a complete visual description of the model (i.e., all of the data patterns it can describe). The curve is the model. Any point that falls on the curve can be perfectly fit by the model. Thus, RSA clearly reveals what patterns of data a model can describe and what patterns it cannot.
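A quick numerical check (illustrative, not from the article) of Equations 2 and 3: every value of θ maps to a point (y_t1, y_t2) = (2^−θ, 8^−θ) on the power model's response curve, θ = 1 gives the point (0.500, 0.125) mentioned above, and every point on the curve satisfies the parameter-free relation of Equation 3.

```python
import numpy as np

t1, t2 = 2.0, 8.0
theta = np.linspace(0.01, 10.0, 2000)                      # sweep of the parameter
y1, y2 = t1 ** -theta, t2 ** -theta                        # response curve of y = t**(-theta)

print(t1 ** -1.0, t2 ** -1.0)                              # theta = 1 gives (0.5, 0.125)
assert np.allclose(y2, y1 ** (np.log(t2) / np.log(t1)))    # Equation 3 holds at every point
```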

For example, the response curve of the linear model reveals that the model can describe only those (y_t1, y_t2) points satisfying the equation y_t2 = 4y_t1 − 3 (0.75 ≤ y_t1 ≤ 1), and no others.

Figure 4. Response curves of three one-parameter models that have the same number of parameters but differ in functional form, each obtained for t1 = 2 and t2 = 8.

Second, the contributions of functional form to model complexity become evident when models are compared in RSA space. All three models in Figure 4 have one parameter, but their response curves differ greatly, indicating that their functional forms must also differ. This observation leads to an intuitive measure of model complexity: Given that the response surface of a model represents the collection of all possible data patterns that the model can describe, one could define a natural complexity measure as the total length of the model's response curve. For example, for the three response curves in Figure 4, one can conclude that the black-hole model is most complex with its line length of 25.74, followed by the power model (length = 1.50), and the linear model (length = 1.03). Dunn (2000) presents another RSA-based complexity measure.
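The curve lengths quoted above are easy to approximate numerically. The sketch below (ours; the black-hole model is omitted because its reported length depends on the parameter range used in the original figure) discretizes each response curve for t1 = 2 and t2 = 8 over a parameter range that keeps y between 0 and 1, and sums the segment lengths; the results fall near the reported values of 1.50 and 1.03, with small differences attributable to the exact parameter range assumed.

```python
import numpy as np

t = np.array([2.0, 8.0])                                    # t1 and t2

def curve_length(g, thetas):
    """Polyline length of the response curve theta -> (g(theta, t1), g(theta, t2))."""
    pts = np.array([g(th, t) for th in thetas])             # points along the curve
    return float(np.sum(np.sqrt(np.sum(np.diff(pts, axis=0) ** 2, axis=1))))

power = lambda th, t: t ** -th                              # y = t**(-theta)
linear = lambda th, t: 1.0 - th * t                         # y = 1 - theta*t

print(curve_length(power, np.linspace(0.0, 50.0, 100001)))  # power model, roughly 1.5
print(curve_length(linear, np.linspace(0.0, 0.125, 1001)))  # linear model, roughly 1.03
```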
Despite the possible ways of quantifying complexity within RSA, any such measure would be incomplete because it would not take into account the stochastic nature of the process underlying the data. That is, RSA ignores random variation in the data. The response curves in Figure 4 depict the three models without an error term. Recall that data represent a sample from an unknown probability distribution, the shape of which must be specified by the model. A complete measure of complexity must take into account the distributional characteristics of a model (e), not only that of the mean function, that is, g(θ, x) in Equation 1. Only the latter is considered in RSA. Thus, any RSA metric would yield only an approximate measure of complexity. To incorporate random error into a complexity measure requires that RSA be extended into a space of probability distributions, to which we now turn.

Differential Geometric Approach to Model Complexity

In this section we show that differential geometry, a branch of mathematics, provides a theoretically well-justified and intuitive measure of model complexity. A more technically rigorous presentation of the topic can be found in Myung, Balasubramanian, and Pitt (2000). Within differential geometry, a model forms a geometric object known as a Riemannian manifold that is embedded in the space of all probability distributions (Amari, 1983, 1985; Rao, 1945). As in the data space depicted in Figure 3, every distribution is a point in this space, and the collection of points created by varying the parameters of the model gives rise to a hypervolume in which similar distributions are mapped to nearby points, as illustrated in Figure 5.

Figure 5. The space of probability distributions forms a manifold on which similar distributions are mapped to nearby points.

Earlier, we defined complexity as that characteristic of a model that enables it to fit a wide range of data patterns. In a geometric context, this translates into an inherent characteristic of a model that enables it to describe a wide range of probability distributions. Models that are able to describe more distributions should be more complex. Model complexity would therefore seem to be related to the number of probability distributions that a model can generate. This intuition immediately runs into trouble: The number of all such distributions is uncountably infinite, making the value indeterminable. Or is it? Given that not all distributions are equally similar to one another, one solution is to count only distinguishable distributions. That is, if two or more probability distributions on a model's manifold are sufficiently similar to one another to be statistically indistinguishable, they are counted as one distribution, with a cluster of such distributions occupying a local neighborhood on the manifold. This procedure yields a countably infinite set of distinguishable distributions, the size of which is a natural measure of complexity. More precisely, two probability distributions should be considered indistinguishable if one is mistaken for the other

even in the presence of an infinite amount of data. A measure of volume that counts only distinguishable distributions must be devised to achieve this goal. The following mental exercise shows how this can be done. Draw data from one distribution, which is indexed by a specific parameter, say θp, in the model, and ask how well one can guess whether the data came from θp rather than from a nearby θq. The ability to distinguish between these distributions increases with the amount of available data. However, it can be shown that for any fixed amount of data there is a little ellipsoid around θp where the probability of error in the guessing game is large. In other words, within this ellipsoid, distributions are not very distinguishable in the statistical sense. To count distinguishable distributions, one should then tile the model manifold with such ellipsoids, counting one distribution for each ellipsoid. This procedure turns the manifold into an ellipsoid-covered lattice with a distinguishable distribution at each lattice point. Then the limit of infinite sample size should be taken so that the ellipsoids of indistinguishability shrink and the associated lattice becomes finer, forming a continuum in the limit. Taking this limit recovers a continuum measure that counts only distinguishable distributions. When this computation is carried out, the number of distinguishable distributions turns out to be equal to dθ {det[I(θ)]}^(1/2), where I(θ) is the Fisher information matrix of a sample of size 1, det(I) is the determinant of the matrix I, and dθ the infinitesimal parameter volume (see Footnote 7 for a definition of the Fisher information matrix; see also Schervish, 1995). The number of all distinguishable probability distributions that a model can generate or describe is obtained by integrating dθ {det[I(θ)]}^(1/2) over the entire parameter manifold as follows:

V_M = ∫ dθ √det I(θ),   (5)

where the subscript M denotes a particular model under consideration. This measure is known as the Riemannian volume in differential geometry. A highly desirable property of the volume measure is that it is invariant under reparameterization. This property is an outgrowth of models being represented as manifolds in the space of all probability distributions. In this context, the parameters of a model simply index the collection of distributions a model describes. The choice of the parameters themselves is irrelevant. The manifold is the model, which will never change, regardless of how the model is specified in an equation (see Equation 10 and accompanying text). The Riemannian volume makes good sense as a complexity measure. Because complexity is related to the volume of a model in the space of probability distributions, the measure of volume should count only different, or distinguishable, distributions, and not the coordinate volume (∫ dθ) of the manifold. The Riemannian volume, therefore, is a direct function of the number of distinguishable distributions that a model can generate, with a complex model generating more distributions than a simple model.
Relation to RSA

The differential geometric approach to model complexity is similar to RSA in that a mathematical model is viewed as a geometric shape embedded in a hyperdimensional space, albeit different spaces (probability distributions vs. response vectors). This correspondence is not accidental, because the differential geometric approach is a logical extension of RSA. To understand the connection between the two, think of model selection as an inference game: The goal is to determine, out of a set of probability distributions that index data patterns, which model is most likely to have generated a data sample drawn from an unknown probability distribution. Referring back to Equation 1, the main yardstick used in this selection process is the likelihood function, f(y|θ). The value of the likelihood function depends upon not only the mean function, g(θ, x), but also the distributional characteristics of the error term (e). Any justifiable measure of complexity should take into account these two factors. RSA considers only the first term, whereas the differential geometric approach considers both.

To see how the two approaches are related quantitatively, consider the response curve of a one-parameter model, such as the power model in Figure 4 (middle panel). The RSA measure of complexity in this model is the total length of the response curve, which in essence measures the number of data points along the curve. In the differential geometric approach, counting is carried out with the additional knowledge of the local distinguishability of data points along the curve. This difference is illustrated in Figure 6 for the one-parameter power model. The response curve is split into segments of different lengths, with the points within each segment being statistically indistinguishable. Note that distinguishability is not uniform along the curve. Points in the middle region are less distinguishable than those at either end.

Figure 6. The power model's response curve from Figure 4 divided into local regions of indistinguishability (i.e., the points within each region are statistically indistinguishable).

In fact, for any one-parameter model of observed data that follows a binomial probability distribution, one can derive formal expressions for these two measures of complexity as follows (see Appendix A for a complete derivation):

RSA: length L_M = ∫ dθ √{ Σ_{q=1}^{N} [dg(θ, x_q)/dθ]² };
Differential geometry: volume V_M = ∫ dθ √{ Σ_{q=1}^{N} (1 / {g(θ, x_q)[1 − g(θ, x_q)]}) [dg(θ, x_q)/dθ]² }.   (6)

In the equations, it is assumed that the observed data, y_q, are distributed binomially, Bin[n, g(θ, x_q)], with sample size n and probability g(θ, x_q). Note that the two measures are identical except for the additional term, 1/{g(θ, x_q)[1 − g(θ, x_q)]}, in the differential geometric complexity measure. This extra term takes into account local distinguishability and is equal to det[I(θ)] in Equation 5.

MDL Method of Model Selection

Thus far in the article we have introduced a measure of model complexity. Although it is useful for comparing the relative complexities of models, as will be shown below, by itself the measure is insufficient as a model selection method. What is missing is a measure of how well the model fits the data (i.e., a measure of GOF). MDL, a model selection method from algorithmic coding theory in computer science (Grünwald, 2000; Rissanen, 1983, 1996), combines both of these measures. The MDL approach to model selection was developed within the domain of information theory, where the goal of model selection is to choose the model that permits the greatest compression of data in its description. The assumption underlying the approach is that regularities or patterns in data imply redundancy. The more the data can be compressed by extracting this redundancy, the more we learn about the underlying regularities governing the cognitive process of interest. The full form of the measure is shown below. The first term is the GOF measure, and the second and third together form the intrinsic complexity of the model (Rissanen, 1996):

MDL = −ln f(y|θ̂) + (k/2) ln(n/2π) + ln ∫ dθ √det I(θ),   (7)

where y = (y1, ..., yn) is a data sample of size n, θ̂ is the maximum likelihood parameter estimate, ln is the natural logarithm of base e, and I(θ) is the Fisher information matrix defined earlier.⁷ The integration of the third term is taken over the parameter space defined by the model. As with prior selection methods, MDL prescribes that the model that minimizes the criterion should be chosen, the assumption being that such a model has extracted the most redundancy (i.e., regularity) in the data and thus should generalize best. In practice, the criterion represents the shortest length of computer code (measured in bits of information) necessary to describe the data given the model. The shorter the code, the greater the amount of regularity in the data that the model uncovered. The soundness of MDL as a model selection criterion has been well documented by Li and Vitányi (1997), who showed that there is a close relationship between minimizing MDL and achieving good generalizability. From a decision-theoretic perspective, MDL selects the one model, among a set of competing models, that minimizes the expected error in predicting future data, in which the prediction error is measured using a logarithmic discrepancy function (Rissanen, 1999; Yamanishi, 1998).

It turns out that minimization of MDL corresponds to maximization of the posterior probability within the Bayesian statistics framework (i.e., BMS). Balasubramanian (1997) showed that the MDL criterion can be derived as a finite series of terms in an asymptotic expansion of the Bayesian posterior probability of a model given the data for a special form of the parameter prior density. This connection between the two suggests that choosing the model that gives the shortest description of the observed data is essentially equivalent to choosing the model that is most likely true in the sense of probability theory (see Theorem 1 of Vitányi & Li, 2000).
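As an illustration of how Equations 6 and 7 can be put to work, the sketch below (ours, not the authors' code) computes the MDL criterion for the one-parameter power model y = t^(−θ) when each observation is binomial with probability g(θ, t_q). The retention intervals, the number of trials, and the counts are hypothetical, and the number of Bernoulli trials per condition is treated as the sample size n, in line with Footnote 3.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import binom

t = np.array([2.0, 4.0, 8.0])        # hypothetical retention intervals
n = 50                               # hypothetical number of Bernoulli trials per interval
y = np.array([27, 14, 6])            # hypothetical counts of correct responses

g = lambda theta, t: t ** -theta                     # mean function of the power model
dg = lambda theta, t: -np.log(t) * t ** -theta       # its derivative with respect to theta

def neg_loglik(theta):                               # -ln f(y | theta), first term of Eq. 7
    return -np.sum(binom.logpmf(y, n, g(theta, t)))

def sqrt_det_I(theta):                               # sqrt(det I(theta)) from Equation 6
    p = g(theta, t)
    return np.sqrt(np.sum(dg(theta, t) ** 2 / (p * (1.0 - p))))

fit = minimize_scalar(neg_loglik, bounds=(1e-6, 20.0), method="bounded")
V_M, _ = quad(sqrt_det_I, 1e-8, 50.0, limit=200)     # Riemannian volume (Equations 5 and 6)
k = 1
mdl = fit.fun + 0.5 * k * np.log(n / (2 * np.pi)) + np.log(V_M)
print(round(fit.x, 3), round(fit.fun, 2), round(V_M, 2), round(mdl, 2))
```

Computing the same quantity for a competing one-parameter model and choosing the smaller value implements the selection rule described in the text.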
The theoretical link between BMS and MDL also suggests that they may perform similarly in practice. Barron and Cover (1991) showed that BMS and MDL are asymptotically equivalent given large sample sizes; that is, both will converge to the true model if the true model is correctly specified. On the other hand, if models are misspecified and sample size is relatively small, they can yield disparate results, especially depending on the form of the parameter prior density used in the calculation of BMS. Despite these similarities, MDL has at least one advantage over BMS: The complexity measure is well understood. As mentioned above, complexity and GOF are not easily disentangled in the integral form of BMS (Table 2). In contrast, a clear understanding of the complexity term in MDL is provided by its counterpart in differential geometry, the geometric complexity measure. This is described in detail in the following section.

The latter two terms of the MDL criterion (Equation 7) readily lend themselves to a differential geometric interpretation, which is related to the Riemannian volume measure presented earlier. Conceptually, model selection using MDL proceeds by choosing the model that best approximates the true model by counting the number of distinguishable distributions that come close to the true model. Proximity to the true model is assessed by f(y|θ). Within the differential geometric approach, this corresponds to a volume measure in the space of probability distributions. The following volume, under the assumption of large sample size, is shown to be a valid measure of proximity (Balasubramanian, 1997; Myung, Balasubramanian, & Pitt, 2000): C_M = (2π/n)^(k/2) h(θ̂), where k is the number of parameters in the model and h(θ̂) is a data-dependent factor that goes to 1 as n grows large (some additional conditions are required; see Balasubramanian, 1997). Essentially, C_M represents the Riemannian volume of a small ellipsoid around θ̂, within which the probability of the data, f(y|θ), is appreciable.

⁷ The Fisher information matrix I(θ) of the MDL criterion is defined as I_ij(θ) = −(1/n) E(∂² ln f(y|θ)/∂θ_i ∂θ_j) (i, j = 1, ..., k) for the data vector y = (y1, ..., yn), where the y_q's are sample values of random variables Y_q (q = 1, ..., n; see, e.g., Rissanen, 1996, Equation 7). Further, if the Y_q's are independently and identically distributed, then the above I(θ) reduces to the Fisher information matrix of sample size 1, that is, I_ij(θ) = −E(∂² ln f(y_q|θ)/∂θ_i ∂θ_j) (i, j = 1, ..., k) for any q.

As such, it measures the number of distinguishable distributions that come close to the truth, as measured by predicting the data y with relatively high probability. However, C_M alone is not an adequate measure of proximity because the total number of distinguishable distributions of a model (V_M, the Riemannian volume, Equation 5) must also be considered. Inclusion of this additional measure leads to a volume ratio, V_M/C_M, which penalizes models for having an unnecessarily large number of distinguishable distributions (V_M) or having relatively few distinguishable distributions close to the truth (C_M). Taking the log of this ratio gives

ln(V_M/C_M) = (k/2) ln(n/2π) + ln ∫ dθ √det I(θ) − ln h(θ̂).   (8)

The first and second terms are independent of the true distribution as well as the data, and therefore represent an intrinsic property of the model. Together they will be called the geometric complexity of the model, and are invariant under reparameterization of the model. As sample size increases, the third term, which is data dependent, becomes negligible. When this occurs, the geometric complexity is equal to the complexity penalty in the MDL criterion in Equation 7. It is also worth noting that the first term of the geometric complexity measure increases logarithmically with sample size n, whereas the second term is independent of n. An implication of this is that as n grows large, the effects of complexity due to functional form, reflected through I(θ), will gradually diminish compared to those due to the number of parameters (k). Thus, functional form effects will have their greatest impact on model selection when sample size is small. Because small samples are the norm in experiments in much of cognitive psychology, it is imperative that the selection method be sensitive to this property of a model.

Differential geometry provides many valuable insights into model complexity and model selection. One is a new explication of MDL. The MDL selection criterion can be rewritten as follows:

MDL = −ln [f(y|θ̂) / (V_M/C_M)] = −ln[normalized f(y|θ̂)].   (9)

This reinterpretation provides a clearer picture of what MDL does in model selection. It selects the model that gives the highest value of the maximum likelihood per the relative ratio of distinguishable distributions (V_M/C_M). We might call this the normalized maximum likelihood. From this perspective, the better model is the one with many distinguishable distributions close to the truth but few distinguishable distributions overall.

Perhaps the most important insight provided by differential geometry is an intuitive understanding of the meaning of complexity in MDL: It measures the minus log of the volume of the distinguishable distributions in a model relative to those close to the truth. In this regard, the size of a model manifold in the space of distributions is what matters when measuring complexity. A model's functional form and its number of parameters can be misleading indicators of complexity because they are simply the apparatus by which a collection of distributions defined by the model is indexed. The geometric approach to complexity presented here makes it clear that neither the parameterization nor the specific functional form used in indexing is relevant so long as the same collection of distributions is catalogued on the model manifold. For example, the following two models, although assuming different functional forms, are equivalent and equally complex in the geometric sense:

Model A: y = exp(θ1 x1 + θ2 x2) + error,
Model B: y = η1^x1 η2^x2 + error,   (10)

where the error has zero mean and follows the same distribution for both models. Here, the parameters of Model A are related to the parameters of Model B through ηi = exp(θi), i = 1, 2.
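A small numerical check (illustrative only) of the equivalence claimed for Equation 10: with ηi = exp(θi), the two mean functions produce exactly the same predictions for any inputs, so the two parameterizations index the same collection of distributions.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.array([0.3, -0.7])                    # parameters of Model A
eta = np.exp(theta)                              # eta_i = exp(theta_i), parameters of Model B
x1, x2 = rng.uniform(0, 2, 10), rng.uniform(0, 2, 10)

pred_a = np.exp(theta[0] * x1 + theta[1] * x2)   # Model A mean function
pred_b = eta[0] ** x1 * eta[1] ** x2             # Model B mean function
assert np.allclose(pred_a, pred_b)               # identical predictions for every input
```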
Three Application Examples

Geometric complexity and MDL constitute a powerful pair of model evaluation tools. When used together in model testing, a deeper understanding of the relationship between models can be gained. The first measure enables one to assess the relative complexities of the set of models under consideration. The second builds on the first by suggesting which model is preferable given the data in hand. The following simulations demonstrate the application of these methods in three areas of cognitive modeling: psychophysics, information integration, and categorization.

In each example, two competing models with the same number of parameters but different functional forms were fitted to data sets generated by each of these models (human data were not used). Of interest is the ability of each selection method to recover the model that generated the data. A good selection method should be able to discriminate between data generated by one model and data generated by the other. That is, it should be able to see through the random variation in the data sample and accurately infer whether the model being tested generated the data it is being fit to. Errors are a sign of overgeneralization and reveal a bias in the selection method, which could be toward either the more complex or the simpler model. The ideal pattern of data is one in which each model generalizes best only to data generated by itself, not to data generated by the competing model. In the 2 × 2 sections of Tables 3-5, this corresponds to a mean selection criterion measure that is lowest in the upper left and lower right quadrants, with perfect recovery rates (100%) in these cells as well.

Four selection methods were compared: AIC, ICOMP, CV, and MDL. Given the close relationship between MDL and BMS, the latter was not included in the comparison. BIC and RMSD were also not included because of their equivalence to AIC in the present testing conditions. AIC can be expressed with BIC as a term in the equation: AIC = BIC + k(2 − ln n). Consequently, both methods will yield the same results when models with the same number of parameters (i.e., equal k) are compared. RMSD will generally yield the same outcome as well.⁸ A fuller discussion of the three simulations can be found in Zhang (1999).

⁸ When comparing among models with the same number of parameters, model selection under RMSD will be the same as that under AIC and BIC when errors are normally distributed and have equal variances. This is because in such cases the sum of squares error (SSE) in RMSD is monotonically related to −ln f(y|θ) in AIC (and BIC), and hence minimization of SSE is equivalent to maximization of the likelihood function.


More information

Numerical Methods Lecture 6 - Curve Fitting Techniques

Numerical Methods Lecture 6 - Curve Fitting Techniques Numerical Methods Lecture 6 - Curve Fittig Techiques Topics motivatio iterpolatio liear regressio higher order polyomial form expoetial form Curve fittig - motivatio For root fidig, we used a give fuctio

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Sprig 2017 A secod course i data miig http://www.it.uu.se/edu/course/homepage/ifoutv2/vt17/ Kjell Orsbor Uppsala Database Laboratory Departmet of Iformatio Techology, Uppsala Uiversity,

More information

MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fitting)

MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fitting) MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fittig) I this chapter, we will eamie some methods of aalysis ad data processig; data obtaied as a result of a give

More information

The isoperimetric problem on the hypercube

The isoperimetric problem on the hypercube The isoperimetric problem o the hypercube Prepared by: Steve Butler November 2, 2005 1 The isoperimetric problem We will cosider the -dimesioal hypercube Q Recall that the hypercube Q is a graph whose

More information

Designing a learning system

Designing a learning system CS 75 Machie Learig Lecture Desigig a learig system Milos Hauskrecht milos@cs.pitt.edu 539 Seott Square, x-5 people.cs.pitt.edu/~milos/courses/cs75/ Admiistrivia No homework assigmet this week Please try

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

Data Analysis. Concepts and Techniques. Chapter 2. Chapter 2: Getting to Know Your Data. Data Objects and Attribute Types

Data Analysis. Concepts and Techniques. Chapter 2. Chapter 2: Getting to Know Your Data. Data Objects and Attribute Types Data Aalysis Cocepts ad Techiques Chapter 2 1 Chapter 2: Gettig to Kow Your Data Data Objects ad Attribute Types Basic Statistical Descriptios of Data Data Visualizatio Measurig Data Similarity ad Dissimilarity

More information

Counting Regions in the Plane and More 1

Counting Regions in the Plane and More 1 Coutig Regios i the Plae ad More 1 by Zvezdelia Stakova Berkeley Math Circle Itermediate I Group September 016 1. Overarchig Problem Problem 1 Regios i a Circle. The vertices of a polygos are arraged o

More information

Performance Plus Software Parameter Definitions

Performance Plus Software Parameter Definitions Performace Plus+ Software Parameter Defiitios/ Performace Plus Software Parameter Defiitios Chapma Techical Note-TG-5 paramete.doc ev-0-03 Performace Plus+ Software Parameter Defiitios/2 Backgroud ad Defiitios

More information

Learning to Shoot a Goal Lecture 8: Learning Models and Skills

Learning to Shoot a Goal Lecture 8: Learning Models and Skills Learig to Shoot a Goal Lecture 8: Learig Models ad Skills How do we acquire skill at shootig goals? CS 344R/393R: Robotics Bejami Kuipers Learig to Shoot a Goal The robot eeds to shoot the ball i the goal.

More information

3D Model Retrieval Method Based on Sample Prediction

3D Model Retrieval Method Based on Sample Prediction 20 Iteratioal Coferece o Computer Commuicatio ad Maagemet Proc.of CSIT vol.5 (20) (20) IACSIT Press, Sigapore 3D Model Retrieval Method Based o Sample Predictio Qigche Zhag, Ya Tag* School of Computer

More information

1.2 Binomial Coefficients and Subsets

1.2 Binomial Coefficients and Subsets 1.2. BINOMIAL COEFFICIENTS AND SUBSETS 13 1.2 Biomial Coefficiets ad Subsets 1.2-1 The loop below is part of a program to determie the umber of triagles formed by poits i the plae. for i =1 to for j =

More information

. Written in factored form it is easy to see that the roots are 2, 2, i,

. Written in factored form it is easy to see that the roots are 2, 2, i, CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or

More information

Protected points in ordered trees

Protected points in ordered trees Applied Mathematics Letters 008 56 50 www.elsevier.com/locate/aml Protected poits i ordered trees Gi-Sag Cheo a, Louis W. Shapiro b, a Departmet of Mathematics, Sugkyukwa Uiversity, Suwo 440-746, Republic

More information

The Closest Line to a Data Set in the Plane. David Gurney Southeastern Louisiana University Hammond, Louisiana

The Closest Line to a Data Set in the Plane. David Gurney Southeastern Louisiana University Hammond, Louisiana The Closest Lie to a Data Set i the Plae David Gurey Southeaster Louisiaa Uiversity Hammod, Louisiaa ABSTRACT This paper looks at three differet measures of distace betwee a lie ad a data set i the plae:

More information

IMP: Superposer Integrated Morphometrics Package Superposition Tool

IMP: Superposer Integrated Morphometrics Package Superposition Tool IMP: Superposer Itegrated Morphometrics Package Superpositio Tool Programmig by: David Lieber ( 03) Caisius College 200 Mai St. Buffalo, NY 4208 Cocept by: H. David Sheets, Dept. of Physics, Caisius College

More information

Normal Distributions

Normal Distributions Normal Distributios Stacey Hacock Look at these three differet data sets Each histogram is overlaid with a curve : A B C A) Weights (g) of ewly bor lab rat pups B) Mea aual temperatures ( F ) i A Arbor,

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

Octahedral Graph Scaling

Octahedral Graph Scaling Octahedral Graph Scalig Peter Russell Jauary 1, 2015 Abstract There is presetly o strog iterpretatio for the otio of -vertex graph scalig. This paper presets a ew defiitio for the term i the cotext of

More information

Consider the following population data for the state of California. Year Population

Consider the following population data for the state of California. Year Population Assigmets for Bradie Fall 2016 for Chapter 5 Assigmet sheet for Sectios 5.1, 5.3, 5.5, 5.6, 5.7, 5.8 Read Pages 341-349 Exercises for Sectio 5.1 Lagrage Iterpolatio #1, #4, #7, #13, #14 For #1 use MATLAB

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

Big-O Analysis. Asymptotics

Big-O Analysis. Asymptotics Big-O Aalysis 1 Defiitio: Suppose that f() ad g() are oegative fuctios of. The we say that f() is O(g()) provided that there are costats C > 0 ad N > 0 such that for all > N, f() Cg(). Big-O expresses

More information

Lecture 18. Optimization in n dimensions

Lecture 18. Optimization in n dimensions Lecture 8 Optimizatio i dimesios Itroductio We ow cosider the problem of miimizig a sigle scalar fuctio of variables, f x, where x=[ x, x,, x ]T. The D case ca be visualized as fidig the lowest poit of

More information

Data Structures and Algorithms. Analysis of Algorithms

Data Structures and Algorithms. Analysis of Algorithms Data Structures ad Algorithms Aalysis of Algorithms Outlie Ruig time Pseudo-code Big-oh otatio Big-theta otatio Big-omega otatio Asymptotic algorithm aalysis Aalysis of Algorithms Iput Algorithm Output

More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

Bezier curves. Figure 2 shows cubic Bezier curves for various control points. In a Bezier curve, only

Bezier curves. Figure 2 shows cubic Bezier curves for various control points. In a Bezier curve, only Edited: Yeh-Liag Hsu (998--; recommeded: Yeh-Liag Hsu (--9; last updated: Yeh-Liag Hsu (9--7. Note: This is the course material for ME55 Geometric modelig ad computer graphics, Yua Ze Uiversity. art of

More information

Lecture 2: Spectra of Graphs

Lecture 2: Spectra of Graphs Spectral Graph Theory ad Applicatios WS 20/202 Lecture 2: Spectra of Graphs Lecturer: Thomas Sauerwald & He Su Our goal is to use the properties of the adjacecy/laplacia matrix of graphs to first uderstad

More information

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA Creatig Exact Bezier Represetatios of CST Shapes David D. Marshall Califoria Polytechic State Uiversity, Sa Luis Obispo, CA 93407-035, USA The paper presets a method of expressig CST shapes pioeered by

More information

Stone Images Retrieval Based on Color Histogram

Stone Images Retrieval Based on Color Histogram Stoe Images Retrieval Based o Color Histogram Qiag Zhao, Jie Yag, Jigyi Yag, Hogxig Liu School of Iformatio Egieerig, Wuha Uiversity of Techology Wuha, Chia Abstract Stoe images color features are chose

More information

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov Sortig i Liear Time Data Structures ad Algorithms Adrei Bulatov Algorithms Sortig i Liear Time 7-2 Compariso Sorts The oly test that all the algorithms we have cosidered so far is compariso The oly iformatio

More information

Convex hull ( 凸殻 ) property

Convex hull ( 凸殻 ) property Covex hull ( 凸殻 ) property The covex hull of a set of poits S i dimesios is the itersectio of all covex sets cotaiig S. For N poits P,..., P N, the covex hull C is the give by the expressio The covex hull

More information

Revisiting the performance of mixtures of software reliability growth models

Revisiting the performance of mixtures of software reliability growth models Revisitig the performace of mixtures of software reliability growth models Peter A. Keiller 1, Charles J. Kim 1, Joh Trimble 1, ad Marlo Mejias 2 1 Departmet of Systems ad Computer Sciece 2 Departmet of

More information

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The

More information

Recursive Procedures. How can you model the relationship between consecutive terms of a sequence?

Recursive Procedures. How can you model the relationship between consecutive terms of a sequence? 6. Recursive Procedures I Sectio 6.1, you used fuctio otatio to write a explicit formula to determie the value of ay term i a Sometimes it is easier to calculate oe term i a sequece usig the previous terms.

More information

Alpha Individual Solutions MAΘ National Convention 2013

Alpha Individual Solutions MAΘ National Convention 2013 Alpha Idividual Solutios MAΘ Natioal Covetio 0 Aswers:. D. A. C 4. D 5. C 6. B 7. A 8. C 9. D 0. B. B. A. D 4. C 5. A 6. C 7. B 8. A 9. A 0. C. E. B. D 4. C 5. A 6. D 7. B 8. C 9. D 0. B TB. 570 TB. 5

More information

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

ECE4050 Data Structures and Algorithms. Lecture 6: Searching

ECE4050 Data Structures and Algorithms. Lecture 6: Searching ECE4050 Data Structures ad Algorithms Lecture 6: Searchig 1 Search Give: Distict keys k 1, k 2,, k ad collectio L of records of the form (k 1, I 1 ), (k 2, I 2 ),, (k, I ) where I j is the iformatio associated

More information

SAMPLE VERSUS POPULATION. Population - consists of all possible measurements that can be made on a particular item or procedure.

SAMPLE VERSUS POPULATION. Population - consists of all possible measurements that can be made on a particular item or procedure. SAMPLE VERSUS POPULATION Populatio - cosists of all possible measuremets that ca be made o a particular item or procedure. Ofte a populatio has a ifiite umber of data elemets Geerally expese to determie

More information

Image Segmentation EEE 508

Image Segmentation EEE 508 Image Segmetatio Objective: to determie (etract) object boudaries. It is a process of partitioig a image ito distict regios by groupig together eighborig piels based o some predefied similarity criterio.

More information

Appendix A. Use of Operators in ARPS

Appendix A. Use of Operators in ARPS A Appedix A. Use of Operators i ARPS The methodology for solvig the equatios of hydrodyamics i either differetial or itegral form usig grid-poit techiques (fiite differece, fiite volume, fiite elemet)

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time ( 3.1) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step- by- step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Ruig Time Most algorithms trasform iput objects ito output objects. The

More information

OCR Statistics 1. Working with data. Section 3: Measures of spread

OCR Statistics 1. Working with data. Section 3: Measures of spread Notes ad Eamples OCR Statistics 1 Workig with data Sectio 3: Measures of spread Just as there are several differet measures of cetral tedec (averages), there are a variet of statistical measures of spread.

More information

How do we evaluate algorithms?

How do we evaluate algorithms? F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:

More information

15 UNSUPERVISED LEARNING

15 UNSUPERVISED LEARNING 15 UNSUPERVISED LEARNING [My father] advised me to sit every few moths i my readig chair for a etire eveig, close my eyes ad try to thik of ew problems to solve. I took his advice very seriously ad have

More information

Fast Fourier Transform (FFT) Algorithms

Fast Fourier Transform (FFT) Algorithms Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform

More information

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence _9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to

More information

Designing a learning system

Designing a learning system CS 75 Itro to Machie Learig Lecture Desigig a learig system Milos Hauskrecht milos@pitt.edu 539 Seott Square, -5 people.cs.pitt.edu/~milos/courses/cs75/ Admiistrivia No homework assigmet this week Please

More information

A Novel Feature Extraction Algorithm for Haar Local Binary Pattern Texture Based on Human Vision System

A Novel Feature Extraction Algorithm for Haar Local Binary Pattern Texture Based on Human Vision System A Novel Feature Extractio Algorithm for Haar Local Biary Patter Texture Based o Huma Visio System Liu Tao 1,* 1 Departmet of Electroic Egieerig Shaaxi Eergy Istitute Xiayag, Shaaxi, Chia Abstract The locality

More information

1 Enterprise Modeler

1 Enterprise Modeler 1 Eterprise Modeler Itroductio I BaaERP, a Busiess Cotrol Model ad a Eterprise Structure Model for multi-site cofiguratios are itroduced. Eterprise Structure Model Busiess Cotrol Models Busiess Fuctio

More information

CS 683: Advanced Design and Analysis of Algorithms

CS 683: Advanced Design and Analysis of Algorithms CS 683: Advaced Desig ad Aalysis of Algorithms Lecture 6, February 1, 2008 Lecturer: Joh Hopcroft Scribes: Shaomei Wu, Etha Feldma February 7, 2008 1 Threshold for k CNF Satisfiability I the previous lecture,

More information

Our second algorithm. Comp 135 Machine Learning Computer Science Tufts University. Decision Trees. Decision Trees. Decision Trees.

Our second algorithm. Comp 135 Machine Learning Computer Science Tufts University. Decision Trees. Decision Trees. Decision Trees. Comp 135 Machie Learig Computer Sciece Tufts Uiversity Fall 2017 Roi Khardo Some of these slides were adapted from previous slides by Carla Brodley Our secod algorithm Let s look at a simple dataset for

More information

BAYESIAN WITH FULL CONDITIONAL POSTERIOR DISTRIBUTION APPROACH FOR SOLUTION OF COMPLEX MODELS. Pudji Ismartini

BAYESIAN WITH FULL CONDITIONAL POSTERIOR DISTRIBUTION APPROACH FOR SOLUTION OF COMPLEX MODELS. Pudji Ismartini Proceedig of Iteratioal Coferece O Research, Implemetatio Ad Educatio Of Mathematics Ad Scieces 014, Yogyakarta State Uiversity, 18-0 May 014 BAYESIAN WIH FULL CONDIIONAL POSERIOR DISRIBUION APPROACH FOR

More information

Case Studies in the use of ROC Curve Analysis for Sensor-Based Estimates in Human Computer Interaction

Case Studies in the use of ROC Curve Analysis for Sensor-Based Estimates in Human Computer Interaction Case Studies i the use of ROC Curve Aalysis for Sesor-Based Estimates i Huma Computer Iteractio James Fogarty Rya S. Baker Scott E. Hudso Huma Computer Iteractio Istitute Caregie Mello Uiversity Abstract

More information

Markov Chain Model of HomePlug CSMA MAC for Determining Optimal Fixed Contention Window Size

Markov Chain Model of HomePlug CSMA MAC for Determining Optimal Fixed Contention Window Size Markov Chai Model of HomePlug CSMA MAC for Determiig Optimal Fixed Cotetio Widow Size Eva Krimiger * ad Haiph Latchma Dept. of Electrical ad Computer Egieerig, Uiversity of Florida, Gaiesville, FL, USA

More information

One advantage that SONAR has over any other music-sequencing product I ve worked

One advantage that SONAR has over any other music-sequencing product I ve worked *gajedra* D:/Thomso_Learig_Projects/Garrigus_163132/z_productio/z_3B2_3D_files/Garrigus_163132_ch17.3d, 14/11/08/16:26:39, 16:26, page: 647 17 CAL 101 Oe advatage that SONAR has over ay other music-sequecig

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists

More information

New HSL Distance Based Colour Clustering Algorithm

New HSL Distance Based Colour Clustering Algorithm The 4th Midwest Artificial Itelligece ad Cogitive Scieces Coferece (MAICS 03 pp 85-9 New Albay Idiaa USA April 3-4 03 New HSL Distace Based Colour Clusterig Algorithm Vasile Patrascu Departemet of Iformatics

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

35 YEARS OF ADVANCEMENTS WITH THE COMPLEX VARIABLE BOUNDARY ELEMENT METHOD

35 YEARS OF ADVANCEMENTS WITH THE COMPLEX VARIABLE BOUNDARY ELEMENT METHOD N. J. DeMoes et al., It. J. Comp. Meth. ad Exp. Meas., Vol. 0, No. 0 (08) 3 35 YEARS OF ADVANCEMENTS WITH THE COMPLEX VARIABLE BOUNDARY ELEMENT METHOD Noah J. DeMoes, Gabriel T. Ba, Bryce D. Wilkis, Theodore

More information

Data Warehousing. Paper

Data Warehousing. Paper Data Warehousig Paper 28-25 Implemetig a fiacial balace scorecard o top of SAP R/3, usig CFO Visio as iterface. Ida Carapelle & Sophie De Baets, SOLID Parters, Brussels, Belgium (EUROPE) ABSTRACT Fiacial

More information

Parabolic Path to a Best Best-Fit Line:

Parabolic Path to a Best Best-Fit Line: Studet Activity : Fidig the Least Squares Regressio Lie By Explorig the Relatioship betwee Slope ad Residuals Objective: How does oe determie a best best-fit lie for a set of data? Eyeballig it may be

More information

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015. Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Hash Tables xkcd. http://xkcd.com/221/. Radom Number. Used with permissio uder Creative

More information

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem A Improved Shuffled Frog-Leapig Algorithm for Kapsack Problem Zhoufag Li, Ya Zhou, ad Peg Cheg School of Iformatio Sciece ad Egieerig Hea Uiversity of Techology ZhegZhou, Chia lzhf1978@126.com Abstract.

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

Descriptive Statistics Summary Lists

Descriptive Statistics Summary Lists Chapter 209 Descriptive Statistics Summary Lists Itroductio This procedure is used to summarize cotiuous data. Large volumes of such data may be easily summarized i statistical lists of meas, couts, stadard

More information

On (K t e)-saturated Graphs

On (K t e)-saturated Graphs Noame mauscript No. (will be iserted by the editor O (K t e-saturated Graphs Jessica Fuller Roald J. Gould the date of receipt ad acceptace should be iserted later Abstract Give a graph H, we say a graph

More information

Administrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today

Administrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today Admiistrative Fial project No office hours today UNSUPERVISED LEARNING David Kauchak CS 451 Fall 2013 Supervised learig Usupervised learig label label 1 label 3 model/ predictor label 4 label 5 Supervised

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

BASED ON ITERATIVE ERROR-CORRECTION

BASED ON ITERATIVE ERROR-CORRECTION A COHPARISO OF CRYPTAALYTIC PRICIPLES BASED O ITERATIVE ERROR-CORRECTIO Miodrag J. MihaljeviC ad Jova Dj. GoliC Istitute of Applied Mathematics ad Electroics. Belgrade School of Electrical Egieerig. Uiversity

More information

Empirical Validate C&K Suite for Predict Fault-Proneness of Object-Oriented Classes Developed Using Fuzzy Logic.

Empirical Validate C&K Suite for Predict Fault-Proneness of Object-Oriented Classes Developed Using Fuzzy Logic. Empirical Validate C&K Suite for Predict Fault-Proeess of Object-Orieted Classes Developed Usig Fuzzy Logic. Mohammad Amro 1, Moataz Ahmed 1, Kaaa Faisal 2 1 Iformatio ad Computer Sciece Departmet, Kig

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

Kernel Smoothing Function and Choosing Bandwidth for Non-Parametric Regression Methods 1

Kernel Smoothing Function and Choosing Bandwidth for Non-Parametric Regression Methods 1 Ozea Joural of Applied Scieces (), 009 Ozea Joural of Applied Scieces (), 009 ISSN 943-49 009 Ozea Publicatio Kerel Smoothig Fuctio ad Choosig Badwidth for No-Parametric Regressio Methods Murat Kayri ad

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

27 Refraction, Dispersion, Internal Reflection

27 Refraction, Dispersion, Internal Reflection Chapter 7 Refractio, Dispersio, Iteral Reflectio 7 Refractio, Dispersio, Iteral Reflectio Whe we talked about thi film iterferece, we said that whe light ecouters a smooth iterface betwee two trasparet

More information

1. Introduction o Microscopic property responsible for MRI Show and discuss graphics that go from macro to H nucleus with N-S pole

1. Introduction o Microscopic property responsible for MRI Show and discuss graphics that go from macro to H nucleus with N-S pole Page 1 Very Quick Itroductio to MRI The poit of this itroductio is to give the studet a sufficietly accurate metal picture of MRI to help uderstad its impact o image registratio. The two major aspects

More information

n n B. How many subsets of C are there of cardinality n. We are selecting elements for such a

n n B. How many subsets of C are there of cardinality n. We are selecting elements for such a 4. [10] Usig a combiatorial argumet, prove that for 1: = 0 = Let A ad B be disjoit sets of cardiality each ad C = A B. How may subsets of C are there of cardiality. We are selectig elemets for such a subset

More information

Package popkorn. R topics documented: February 20, Type Package

Package popkorn. R topics documented: February 20, Type Package Type Pacage Pacage popkor February 20, 2015 Title For iterval estimatio of mea of selected populatios Versio 0.3-0 Date 2014-07-04 Author Vi Gopal, Claudio Fuetes Maitaier Vi Gopal Depeds

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

Lecture 1: Introduction and Strassen s Algorithm

Lecture 1: Introduction and Strassen s Algorithm 5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access

More information

Task scenarios Outline. Scenarios in Knowledge Extraction. Proposed Framework for Scenario to Design Diagram Transformation

Task scenarios Outline. Scenarios in Knowledge Extraction. Proposed Framework for Scenario to Design Diagram Transformation 6-0-0 Kowledge Trasformatio from Task Scearios to View-based Desig Diagrams Nima Dezhkam Kamra Sartipi {dezhka, sartipi}@mcmaster.ca Departmet of Computig ad Software McMaster Uiversity CANADA SEKE 08

More information

ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Descriptive Statistics

ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Descriptive Statistics ENGI 44 Probability ad Statistics Faculty of Egieerig ad Applied Sciece Problem Set Descriptive Statistics. If, i the set of values {,, 3, 4, 5, 6, 7 } a error causes the value 5 to be replaced by 50,

More information

Ch 9.3 Geometric Sequences and Series Lessons

Ch 9.3 Geometric Sequences and Series Lessons Ch 9.3 Geometric Sequeces ad Series Lessos SKILLS OBJECTIVES Recogize a geometric sequece. Fid the geeral, th term of a geometric sequece. Evaluate a fiite geometric series. Evaluate a ifiite geometric

More information