Evaluating Alignment Methods in Dynamic Microsimulation Models [1]

Jinjing Li [2], Maastricht University
Cathal O'Donoghue [3], Rural Economy and Development Programme, Teagasc


Abstract: Alignment is a widely adopted technique in the field of microsimulation for social and economic policy research. However, limited research has been devoted to understanding its simulation properties. This paper discusses and evaluates six common alignment algorithms used in dynamic microsimulation through a set of theoretical and statistical criteria proposed in the earlier literature (e.g. Morrison 2006; O'Donoghue 2010). It presents and compares the alignment processes, probability transformations, and the statistical properties of alignment outputs in transparent and controlled setups with both synthetic and real-life datasets (LII). The results suggest that there is no single best method for all simulation scenarios. Instead, the choice of alignment method might need to be adapted to the assumptions and requirements of a specific project.

Key words: alignment, microsimulation, algorithm evaluation

[1] The authors are grateful to Rick Morrison, Howard Redway and Steven Caldwell for helpful discussions over time in relation to alignment in microsimulation models. We are grateful to the Luxembourg AFR for supporting this research.
[2] Email address: Jinjing.Li@maastrichtuniversity.nl
[3] Email address: Cathal.odonoghue@teagasc.ie

I. INTRODUCTION

Microsimulation is a technique used to model complex real-life events by simulating the actions of individual micro units and the impact of policy change on them (Harding, 2007). Microsimulation models are usually categorised as static or dynamic. Static models, e.g. EUROMOD (Mantovani et al., 2007), are often arithmetic models that evaluate the immediate distributional impact upon individuals/households of possible policy changes. Dynamic models, e.g. DESTINIE, PENSIM, SESIM (Bardaji et al., 2003; Curry, 1996; Flood, 2007), extend the static model by allowing individuals to change their characteristics as a result of endogenous factors within the model (O'Donoghue, 2001). Using this method, it is possible to generate new simulated populations that can be used for policy and scenario analysis.

Dynamic microsimulation models typically simulate behavioural processes such as demographic (e.g. marriage), labour market (e.g. unemployment) and income characteristics (e.g. wage). The method uses statistical estimates of these systems of equations and then applies Monte Carlo simulation techniques to generate the new populations, typically over time, both into the future and, when creating histories with partial data, into the past.

As statistical models are typically estimated on historical datasets with specific characteristics and period effects, projections of the future may contain error or may not correspond to exogenous expectations of future events. In addition, the complexity of micro behaviour may mean that simulation models over- or under-predict the occurrence of a certain event, even in a well-specified model (Duncan and Weeks, 1998). Because of these issues, methods of calibration known as alignment have been developed within the microsimulation literature to correct for issues related to the adequacy of micro projections. Scott (2001) defines alignment as a process of constraining model output to conform more closely to externally derived macro-data ('targets').
There are arguments both for and against alignment procedures (Baekgaard, 2002). Concerns directed towards alignment mainly focus on the consistency of the estimates and the level of disaggregation at which alignment should occur. It is suggested that equations should be reformulated rather than constrained ex post. Clearly, in an ideal world, one would try to estimate a system of equations that could replicate reality and produce effective future projections without the need for alignment. However, as Winder (2000) stated, microsimulation models usually fail to simulate known time-series data. By aligning the model, goodness of fit to an observed time series can be guaranteed. Some modellers suggest that alignment is an effective pragmatic solution for highly complex models (O'Donoghue, 2010).

Over the past decade, aligning the output of a microsimulation model to exogenous assumptions has become standard despite this controversy. To meet the need for alignment, various methods, e.g. multiplicative scaling, sidewalk and sorting-based algorithms, have been experimented with alongside the development of microsimulation (see Morrison, 2006). Microsimulation models using historical datasets, e.g. CORSIM, align the output to historical data to create a more credible profile (SOA, 2001). Models that work prospectively, e.g. APPSIM, also utilise the technique to align their simulations with external projections (Kelly and Percival, 2009).

Nonetheless, the understanding of the simulation properties of alignment in microsimulation models is very limited. Literature on this topic is scarce, with a few exceptions such as Anderson (1990), Caldwell et al. (1998), Neufeld (2000), Chénard (2000a, 2000b), Johnson (2001), Baekgaard (2002), Morrison (2006), Kelly and Percival (2009) and O'Donoghue (2010). Although some new alignment methods were developed in an attempt to address theoretical and empirical deficiencies of earlier methods, discussions of the empirical simulation properties of different alignment algorithms are almost non-existent.

This paper aims to fill this gap and better understand the simulation properties of alignment algorithms in microsimulation. It evaluates all major binary alignment methods using a simple microsimulation model with a set of synthetic datasets and a real-life dataset. It compares the alignment processes, probability transformations, and the statistical properties of alignment outputs in transparent and controlled setups. In addition, a real-life panel dataset, Living in Ireland (LII), is used together with a simplified microsimulation model to evaluate alignment performance in a typical microsimulation project setup. Alignment performance is tested using various evaluation criteria, including the ones outlined in Morrison (2006).

The present paper is divided into six sections. The next section reviews the background to the alignment methodology used in microsimulation and summarises the existing algorithms used in various models. Section 3 discusses the objectives of alignment and the method of algorithm evaluation. Section 4 describes the datasets used in the evaluation process and some key statistics. Section 5 presents the results of the evaluation, and the last section concludes.

II. ALIGNMENT IN MICROSIMULATION

This section discusses the purpose of alignment in a microsimulation model and the common practice of its statistical implementation.
Baekgaard (2000) suggests two broad categories of alignment: parameter alignment, whereby the distribution function is changed by adjusting its parameters; and ex post alignment, whereby alignment is performed on the basis of unadjusted predictions or interim output from a simulation. This paper primarily focuses on ex post alignment methods, as they are the most common form of alignment in microsimulation.

Models of continuous events such as the level of earnings or investment income utilise statistical regressions with continuous dependent variables and produce a distribution of continuous values. However, the prediction of the statistical model may deviate from expectations, for example due to an expected change in the distribution or in productivity, or may need to be adjusted for scenario analysis. This raises the need for alignment, which is often a multiplicative adjustment applied to the continuous variable or an adjustment of the error distribution (Chénard, 2000a). For binary variables, however, one cannot apply the same method, as binary variable simulation uses discrete choice models such as logit, probit or multinomial logit models, and the outputs cannot be adjusted in the same way as continuous variables. As the majority

of processes, e.g. in-work, employment, health, retirement, etc., in dynamic microsimulation models are binary choices in nature, this paper focuses its attention on the alignment of binary choice models.

Models of discrete events such as in-work, employment status, disability status etc. typically produce probabilities of the event occurring as output. These models can be expressed in the following generic form:

$$ f(p_i) = \alpha + \beta X_i + \varepsilon_i \qquad (1) $$

As seen, equation 1 can be divided into a deterministic component $\alpha + \beta X_i$ and a stochastic component $\varepsilon_i$. In a simple Monte Carlo simulation, we generate the random number $\varepsilon_i^*$, adjust the model for endogenous changes in the explanatory variables to produce a new deterministic component $\alpha + \beta X_i^*$, and simulate a new dependent variable. In the case of a binary choice we produce [4]:

$$ f(p_i^*) = \alpha + \beta X_i^* + \varepsilon_i^* \qquad (2) $$

The dependent variable is predicted to have a value of 1 if $f(p_i^*) > 0$ and 0 otherwise [5].

In most cases, a microsimulation model applies this prediction process to all observations individually without interaction. However, this may lead to a side effect: the output of the prediction, although it may look reasonable at the individual level, may not meet the modeller's expectation at the aggregate level. For instance, the simulated average earnings might be higher or lower than assumed, or the in-work rate might differ from the expectation. Therefore, alignment is introduced as a step after the initial prediction in order to correct this error.

Although the theoretical debate on alignment is not over, alignment is de facto widely adopted in models built or updated within the last decade, e.g. DYNACAN (Neufeld, 2000), CORSIM (SOA, 1997), APPSIM (Bacon, 2009). Many papers, e.g. Baekgaard (2002), Bacon (2009) and O'Donoghue (2010), have discussed the main reasons for alignment, which can be summarised as follows:

- Alignment may be used to repair the unfortunate consequences of insufficient estimation data by incorporating additional information in the simulations.
Since no country has an ideal dataset for estimating all the parameters needed for microsimulation, modellers often make compromises, which adversely affect the output quality. Alignment can be used to fix some of these errors.

[4] Note that, in the case of a logit model, $f(p_i^*)$ is defined as $f(p_i^*) = \ln\left(\frac{p_i^*}{1 - p_i^*}\right)$.
[5] A more detailed description of logit-based discrete models in microsimulation can be found in O'Donoghue (2010).

- Alignment can be used to adjust for poor predictive performance of the micro model or its misspecification. Even with perfect data, relationships between dependent variables and explanatory variables may change considerably in countries where substantial structural changes are taking place. Alignment allows

one to correct for these issues and make the simulation consistent with holistic projection assumptions.
- Alignment provides an opportunity for producing scenarios based on different assumptions. Examples include the simulation of alternative recession scenarios on employment with different impacts on different social groups (e.g. by sex, education or occupation).
- Alignment is instrumental in establishing links between microsimulation models of the household sector and macro models. It is a crucial step towards a consistent micro-macro simulation model (see Davies 2004).
- Alignment can be used to reduce Monte Carlo variability through its deterministic calculation (Neufeld, 2000). This is particularly useful for small samples, where it confines the variability of aggregate statistics.

Alignment Methods

In order to calibrate a simulation of a binary variable, we need a method that can adjust the outcome of a logit or probit model to produce outcomes that are consistent with the external total. At the moment, there is no standardised method for implementing alignment in microsimulation. Given that different modellers may have different views or needs, it is not surprising that various binary alignment methods have appeared. Papers by Neufeld (2000), Morrison (2006) and O'Donoghue (2010) describe some popular options for alignment used in the literature. Existing documented alignment methods include:

- Multiplicative Scaling
- Sidewalk Shuffle, Sidewalk Hybrid and their derivatives
- Central Limit Theorem Approach
- Alignment by Sorting (with different sorting variables)

Multiplicative scaling, which was described in Neufeld (2000), involves undertaking an unaligned simulation using Monte Carlo techniques and then comparing the proportion of transitions with the external control total. The ratio between the desired transition rate and the actual transition rate is calculated and applied in a second pass to the simulated probabilities. The method, however, is criticised by Morrison (2006) because the scaled probabilities are not limited to the range 0-1, although the problem is rare in practice as the multiplicative ratio tends to be small.
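The two-pass procedure just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in any of the cited models: the function name `multiplicative_scaling`, its arguments and the use of Python's standard `random` module are choices made here for the example. The sketch also counts how many scaled probabilities end up above 1, which is the issue raised by Morrison (2006).

```python
import random


def multiplicative_scaling(probs, target_rate, rng):
    """Two-pass multiplicative scaling (illustrative sketch only).

    probs       -- predicted event probabilities, one per observation
    target_rate -- external control total expressed as a rate (hypothetical name)
    rng         -- a random.Random instance, passed in so runs are reproducible
    """
    # First pass: unaligned Monte Carlo simulation.
    first_pass = [1 if rng.random() < p else 0 for p in probs]
    simulated_rate = sum(first_pass) / len(first_pass)
    if simulated_rate == 0:
        raise ValueError("no events in the first pass; the ratio is undefined")

    # Ratio between the desired transition rate and the simulated one.
    ratio = target_rate / simulated_rate

    # Second pass: scale every probability by the same ratio and redraw.
    scaled = [p * ratio for p in probs]
    outcomes = [1 if rng.random() < q else 0 for q in scaled]

    # Scaled values are not guaranteed to stay inside [0, 1].
    n_out_of_range = sum(1 for q in scaled if q > 1.0)
    return ratio, scaled, outcomes, n_out_of_range


rng = random.Random(12345)
probs = [0.2, 0.5, 0.9] * 1000   # toy probabilities, invented for illustration
ratio, scaled, outcomes, n_out_of_range = multiplicative_scaling(probs, 0.7, rng)
aligned_rate = sum(outcomes) / len(outcomes)
```

In this toy setup the target rate is well above the unaligned mean, so every observation with a probability of 0.9 is scaled above 1. Because a draw cannot succeed with a probability greater than 1, the aligned rate then tends to fall short of the target.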
Neufeld (2000) suggests that solutions to the out-of-range probabilities produced by multiplicative scaling may include using a nonlinear adjustment.

The sidewalk method was first introduced in Neufeld (2000) as a variance reduction technique and as an alternative to pure Monte Carlo simulation. It reduces the possibility of unlikely simulated outcomes arising from the use of random numbers. The original method, however, does not align the simulated data to an external control. It simply involves accumulating a running total of predicted probabilities. Once the accumulation exceeds 1, a transition occurs. It therefore eliminates the use of random numbers, which is how it reduces variance. Nevertheless, the method has difficulty replicating its output when the order of observations changes. The order of the observations may be altered by the deletion of an observation (e.g. deaths) or by other changes. Serial correlation within families (or other clustering units) is also an issue, as people within the cluster are simulated in order. It is therefore unlikely for two people

within a family to be simulated to make a transition in the same year if the transition probabilities are low.

Neufeld (2000) further developed an alignment method that he characterised as a hybrid of independent Monte Carlo simulation and the sidewalk method. DYNACAN adopted this method with a non-linear adjustment to the equation-generated probabilities, combined with a minor tweaking of the resulting probabilities, depending on whether the simulated rate is ahead of or behind the target rate for the pool during the process, and some randomisation (Morrison, 2006). The method calibrates the probabilities through the logit transformation instead of using the probabilities directly, in order to ensure the values are bounded between 0 and 1 (SOA, 1998). The Sidewalk Hybrid method requires two key parameters, which decide how similar the output is to standard Monte Carlo or to the standard sidewalk method.

The Central Limit Theorem approach is described in Morrison (2006). It utilises the assumption that the mean simulated probability is close to the expected mean when N is large. It manipulates the probabilities of each individual observation on the fly so that the simulated mean matches the expectation. A more detailed description of the method can be found in Morrison (2006). Like all the methods discussed so far, this method does not need any sorting routine.

Alignment by sorting was first documented by O'Donoghue (2001) and Johnson (2001). It involves sorting the predicted probabilities adjusted with a stochastic component, and selecting the desired number of events according to the sorting order. It is seen as a more transparent method (O'Donoghue, 2010), although it is computationally more intensive due to the sorting procedure. Many variations of the method have been used in past years, and this paper discusses the three most commonly used algorithms: sort on predicted probability (SOP), sort on the difference between predicted probability and random number (SOD), and sort on the difference between logistic adjusted predicted probability and random number (SODL).
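The common structure of the three sorting algorithms (compute a sorting variable, rank the pool, give the event to the top-ranked observations) can be sketched as below. This is a minimal illustration, not the code used in the paper, whose evaluation was run in Stata. The function names, the toy values of the linear index and the use of Python's standard library are assumptions made for the example; the sorting variables follow the definitions given in the subsections that follow.

```python
import math
import random


def inv_logit(z):
    """Inverse logit, exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_draw(rng):
    """Draw from the standard logistic distribution (mean 0, variance pi^2 / 3)."""
    u = rng.random()
    while u == 0.0:          # random() can return 0.0, where the log is undefined
        u = rng.random()
    return math.log(u / (1.0 - u))


def sop_scores(linear_index):
    """SOP: sort on the predicted probability itself."""
    return [inv_logit(z) for z in linear_index]


def sod_scores(linear_index, rng):
    """SOD: predicted probability minus a uniform random number."""
    return [inv_logit(z) - rng.random() for z in linear_index]


def sodl_scores(linear_index, rng):
    """SODL: inverse logit of the linear index plus a logistic random number."""
    return [inv_logit(z + logistic_draw(rng)) for z in linear_index]


def align_by_sorting(scores, n_events):
    """Give the event to the n_events observations with the highest score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    outcomes = [0] * len(scores)
    for i in ranked[:n_events]:
        outcomes[i] = 1
    return outcomes          # returned in the original input order


rng = random.Random(2024)
linear_index = [-2.0, -0.5, 0.0, 0.5, 2.0, 1.0]   # toy values of alpha + beta * X
sop = align_by_sorting(sop_scores(linear_index), 3)
sod = align_by_sorting(sod_scores(linear_index, rng), 3)
sodl = align_by_sorting(sodl_scores(linear_index, rng), 3)
```

All three variants hit the requested number of events exactly by construction. They differ only in which observations receive the event: SOP always picks the three largest linear indices, while SOD and SODL let the random component reorder the pool.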
Sort on predicted probability (SOP)

Assume that the predicted probability from a logit model is defined as:

$$ p_i^* = \frac{\exp(\alpha + \beta X_i^*)}{1 + \exp(\alpha + \beta X_i^*)} \qquad (3) $$

where $p_i^*$ is the predicted probability, and $\alpha$ and $\beta$ are estimated coefficients. This method essentially picks the observations with the highest $p_i^*$ in each alignment pool. One consequence, however, is that those with the highest risk are always selected for transition. In the in-work example, the more highly educated would, all other things being equal, always be selected to have a job. In reality, those with the highest risk will on average be selected more often than those with lower risk, but they will not always be selected. As a result, some variability needs to be introduced. Kelly and Percival (2009) propose a variant of this

method, where a proportion (typically 10% of the desired number) is selected with the sorting order inverted, so as to allow low-risk units to make a transition.

Sort on the difference between predicted probability and random number (SOD)

Given the shortcoming of simple probability sorting, Baekgaard (2002) uses another method, which sorts by the difference between the predicted probability and a random number. Instead of sorting the probability $p_i^*$ directly, it sorts $r_i$, which equals the difference between $p_i^*$ and a random number $u_i$ that is uniformly distributed between 0 and 1. Mathematically, this sorting variable can be defined as follows:

$$ r_i = \operatorname{logit}^{-1}(\alpha + \beta X_i) - u_i = \frac{\exp(\alpha + \beta X_i)}{1 + \exp(\alpha + \beta X_i)} - u_i \qquad (4) $$

A concern about this method is that the range of possible sorting values is not the same for each point. Because the random number $u_i \in [0,1]$ is subtracted from the deterministically predicted $p_i^*$, the sorting value across the pool takes the range $r_i \in [-1, 1]$. For each individual, however, $r_i$ can only take values in the range $[p_i^* - 1, p_i^*]$. As a result, when $p_i^*$ is small, say 0.1, the range of possible sorting values is [-0.9, 0.1]. At the other extreme, if $p_i^*$ is large, say 0.9, the range of possible sorting values is [-0.1, 0.9]. Because there is only a small overlap between the ranges at these extreme points, an individual with a small $p_i^*$ will have a very low chance of being selected even if a low-value random number is paired with the observation. Ideally the range of possible sorting values should be the same for each individual, $r_i \in [a, b]$, with individuals with a low $p_i^*$ clustered towards the bottom and those with a high $p_i^*$ clustered towards the top.

Sort on the difference between logistic adjusted predicted probability and random number (SODL)

An alternative method described in O'Donoghue et al. (2008) and Morrison (2006) tries to mitigate the above problem by using a logistic transformation.
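The concern about SOD can be made concrete with a short derivation. As a simplifying assumption made here (it is not stated in the source), suppose the pool is large enough that the sorting value of the last selected observation can be treated as a fixed cut-off $c$. An individual is then selected whenever $r_i \ge c$, so

$$ \Pr(\text{selected}_i) = \Pr(p_i^* - u_i \ge c) = \Pr(u_i \le p_i^* - c) = \min\{1, \max\{0,\; p_i^* - c\}\} $$

Under this assumption SOD shifts every probability by the same amount $-c$ and truncates the result at 0 and 1. For example, with $c = 0.05$ an individual with $p_i^* = 0.1$ is selected with probability 0.05, half the original value, while an individual with $p_i^* = 0.9$ is selected with probability 0.85. With $c = 0.1$ the first individual can never be selected.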
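The SODL method takes the predicted logistic variable from a logit model, $\operatorname{logit}(p_i) = \alpha + \beta X_i$, and combines it with a random number $\varepsilon_i$ drawn from a logistic distribution to produce a randomised variable:

$$ \tilde{p}_i = \operatorname{logit}^{-1}(\alpha + \beta X_i + \varepsilon_i) \qquad (5) $$

$\tilde{p}_i$ is then used to sort individuals and, as in the previous methods, the top $n_j$ households are selected. The sorting variable can therefore be described as follows:

$$ r_i = \operatorname{logit}^{-1}(\alpha + \beta X_i + \varepsilon_i) = \frac{\exp(\alpha + \beta X_i + \varepsilon_i)}{1 + \exp(\alpha + \beta X_i + \varepsilon_i)} \qquad (6) $$

where $\varepsilon_i$ is a logistically distributed random number with a mean of 0 and a standard deviation of $\pi / \sqrt{3}$. Since the random number is not uniformly distributed, unlike $u_i$ in the previous method, it produces a different sorting order.

III. METHODS OF EVALUATING ALIGNMENT ALGORITHM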

In order to evaluate the simulation properties of the alignment algorithms, it is important to define what we need to compare and what the criteria are. Although different alignment methods have been briefly documented in a few papers, there is little discussion of the actual performance differences among these methods. Implementations vary from model to model, but no paper so far validates the alignment methods. This paper evaluates different algorithms and compares how they perform under different scenarios.

Objectives of Alignment

The objectives of alignment discussed in Morrison (2006) and O'Donoghue (2010) serve as the basis of our evaluation criteria. From a practical point of view, a good alignment algorithm should be able to:

a) Replicate as closely as possible the external control totals for the alignment totals. This is one of the main reasons why alignment is implemented in microsimulation and the common goal of all alignment methods, as discussed in virtually all alignment papers, e.g. Neufeld (2000), Morrison (2006).

b) Retain the relationship between the dependent and explanatory variables in the deterministic component of the model (O'Donoghue 2010). In achieving the external totals, the alignment process should not bias the underlying relationship between the dependent and explanatory variables.

c) Retain the shape of the distributions in different subgroups and their inter-relations unless there is a reason not to. Morrison (2006) suggests that alignment is about implementing the right numbers of events in the right proportions for a pool's prospective events, as opposed to simply getting the right expected numbers of events. Although alignment processes focus on the aggregated output, they should not significantly distort the relative distribution within different sub-groups. For instance, if we want to align the number of people in work, we want to get the numbers right not only at the aggregate level but also at the micro/meso level, e.g. the labour participation rate for 30-year-olds should be higher than the rate for 80-year-olds.
This relative distribution should not be changed, at least not substantially, by the alignment method. A highly distorting alignment process would adversely affect distributional analysis, a typical use of microsimulation models.

d) Compute efficiently. There is no doubt that today's computing resources are much more abundant than ever. However, when handling large datasets, e.g. full population datasets, computational constraints are still an important issue. Some projects, e.g. LIAM2/MiDaL (Liegeois, 2010), have redesigned the entire framework in order to achieve faster speed and accommodate larger datasets.

Indicators of alignment performance

In order to assess alignment algorithms with very different designs, the paper uses a set of quantitative indicators that measure the simulation properties according to the criteria discussed above. The indicators include:

- A general fit measure: a false positive rate $\Pr(\hat{Y} = 1 \mid Y = 0)$ and a false negative rate $\Pr(\hat{Y} = 0 \mid Y = 1)$, which reflect how well the predictions fit the actual data in general.

- A target deviation index (TDI), which measures the difference between the external control and the simulation outcome. This indicator is directly linked to the first criterion.
- A distribution deviation index (DDI), which measures the distortion of the relationship between different variables and their inter-relations, as discussed in criteria two and three.
- A computational efficiency measurement: the number of seconds it takes to execute one round of alignment, as outlined in criterion four.

Target Deviation Index (TDI)

Assume that, among N observations, the ideal number of events is T and the actual simulated number of events after alignment is S. The target deviation index (TDI) is defined as

$$ TDI = \frac{|T - S|}{N} \qquad (7) $$

It is a proportion ranging from 0 to 1 and shows how well the alignment replicates the external control. Higher values imply that the outcome is further away from the external control. It is a straightforward indicator for evaluating the first criterion.

Distribution deviation index (DDI)

In order to evaluate the second and third criteria, it is necessary to find an indicator that reflects how well the relationships are preserved and how different the new distribution is from the old one.

A first method could be to compare the original coefficients with coefficients re-estimated from aligned data. Statistically identical coefficients indicate that the relationship remains the same, at least mathematically. However, this might not be applicable to alignment tests, as alignment itself, by definition, distorts the original probabilities. The coefficients, as a result, are bound to change even under an optimal alignment, and in most cases the correct aligned coefficients are not available.

A second method of comparing the relationships is to see whether the distributions of key variables have changed after alignment, e.g. whether the proportions of male and female workers have changed substantially. A chi-square test could be useful for this scenario, as it is frequently used to test whether an observed distribution follows a theoretical distribution.
The chi-square statistic is defined as

$$ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \qquad (8) $$

Nevertheless, the test itself is not designed for binary values and requires "no more than 20% of the expected counts to be less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999). This requirement might not always be fulfilled in microsimulation, depending on the scenario assumptions and the way groups are defined. As a result, an adaptation is required in order to best measure the deviation between two distributions for binary variables with possibly low or zero expected counts.
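The target deviation index and the chi-square statistic defined above, together with the expected-count rule quoted from Yates, Moore & McCabe (1999), are simple enough to sketch in a few lines of Python. The function names and all numbers below are invented for illustration and do not come from the paper.

```python
def target_deviation_index(target_events, simulated_events, n_obs):
    """TDI = |T - S| / N, following equation (7)."""
    return abs(target_events - simulated_events) / n_obs


def chi_square(observed, expected):
    """Pearson chi-square statistic, following equation (8)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts_ok(expected):
    """Quoted rule: at most 20% of expected counts below 5,
    and every expected count at least 1."""
    n_small = sum(1 for e in expected if e < 5)
    return min(expected) >= 1 and n_small <= 0.2 * len(expected)


# Toy numbers, invented for illustration only.
tdi = target_deviation_index(target_events=500, simulated_events=480, n_obs=1000)
stat = chi_square(observed=[48, 52, 10], expected=[50, 50, 10])
usable = expected_counts_ok([50, 50, 10])
sparse = expected_counts_ok([50, 0.4, 3])
```

The last call shows why the chi-square test is awkward here: a grouping with a cell whose expected count is below 1 fails the rule, and a zero expected count would make the statistic itself undefined.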

This paper proposes a self-defined distribution deviation index (DDI) to evaluate the second and third criteria in choosing an alignment method. Assume that we are going to evaluate the distribution distortion in a single alignment pool via a grouping variable X. X could be anything, such as age, gender, or an age-gender interaction. The N observations are divided among $n(X)$ cells, with $N_i$ observations in cell i. $S_i$ is the mean occurrence of events after alignment in group i, and $O_i$ is the observed value in the base dataset. If we define R as the alignment ratio used in the aligning process, $O_i R$ represents the expected value after alignment. A distribution deviation index (DDI), therefore, can be defined as

$$ DDI = \sum_{i=1}^{n(X)} \frac{N_i}{N} \left( S_i - O_i R \right)^2 \qquad (9) $$

This indicator describes how well the micro-simulated data retain the relationship between the dependent variable and variable X. It is a minimum distance estimation tailored for binary outcomes in a simulation. Essentially, DDI calculates the sum of squared differences weighted by the number of observations. It measures the differences between distributions before and after alignment in multiple dimensions, depending on the vector X. When X is an independent variable, it measures the distortion introduced by alignment between the independent variable and the dependent variable. When X is the dependent variable, DDI reports the degree of nonlinearity in the probability distortion of alignment. When X is a variable outside the equation, DDI assesses the level of distortion in an implicit relationship. In short, X could be a vector consisting of any variables and interaction terms.

The indicator is positively correlated with the alignment deviation: it increases when the aligned distribution departs from the original and decreases when the distributions become more alike. The scale of the indicator is independent of the choice of variable X and the number of groups that X may produce. Since $S_i$ and $O_i$ are both probabilities between 0 and 1, DDI has a range of 0 to 1. When the dataset preserves the shape of the distribution perfectly, the index has a value of 0.
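A small Python sketch of the DDI calculation is given below. It assumes that the index is the observation-weighted sum of squared differences between the aligned cell mean and the baseline cell mean multiplied by R, as the description above states. The function name, the argument names and the toy data are invented for illustration, and R is simply supplied by the caller because the text does not spell out how it is computed.

```python
from collections import defaultdict


def distribution_deviation_index(groups, baseline, aligned, ratio):
    """DDI = sum over cells i of (N_i / N) * (S_i - O_i * R)^2.

    groups   -- cell label of each observation under the grouping variable X
    baseline -- 0/1 outcome of each observation in the base dataset (gives O_i)
    aligned  -- 0/1 outcome of each observation after alignment (gives S_i)
    ratio    -- alignment ratio R, supplied by the caller
    """
    n_total = len(groups)
    counts = defaultdict(int)
    base_sum = defaultdict(float)
    aligned_sum = defaultdict(float)
    for g, o, s in zip(groups, baseline, aligned):
        counts[g] += 1
        base_sum[g] += o
        aligned_sum[g] += s

    ddi = 0.0
    for g, n_cell in counts.items():
        o_mean = base_sum[g] / n_cell        # O_i
        s_mean = aligned_sum[g] / n_cell     # S_i
        ddi += (n_cell / n_total) * (s_mean - o_mean * ratio) ** 2
    return ddi


# Toy data, invented for illustration: two cells of four observations each.
groups = ["young"] * 4 + ["old"] * 4
baseline = [1, 1, 0, 0, 1, 0, 0, 0]        # cell means 0.50 and 0.25
ratio = 2.0                                 # hypothetical alignment ratio R

proportional = [1, 1, 1, 1, 1, 1, 0, 0]    # cell means 1.00 and 0.50
distorted = [1, 1, 0, 0, 1, 1, 1, 1]       # cell means 0.50 and 1.00

ddi_proportional = distribution_deviation_index(groups, baseline, proportional, ratio)
ddi_distorted = distribution_deviation_index(groups, baseline, distorted, ratio)
```

Both aligned outcomes contain six events, so they are indistinguishable at the aggregate level. The index is 0 when each cell is scaled by the same ratio and 0.25 when the events are moved towards the "old" cell, which is the kind of distortion the indicator is meant to expose.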
The index increases as the difference between the two distributions grows, with a maximum value of 1.

Computation efficiency

The most intuitive indicator of the computational efficiency of an alignment algorithm is the execution time: the length of time an alignment method takes to execute one round of alignment with input in randomised order. In order to have comparable inputs and outputs, all methods are required to retain the initial order of inputs. This makes the algorithm ready to be used as a module in a microsimulation model. However, this extra requirement penalises the speed of methods that require random shuffling, as the observations need to be re-sorted before the end of the execution. The evaluation of computational efficiency is performed in Stata because of its easy integration of estimation and simulation. Given that computer speed varies considerably, the results presented in this paper may change dramatically on a different platform, although we would expect the relative ranking to remain stable in most cases.

Alignment algorithms evaluated

This paper evaluates all the alignment algorithms discussed earlier, which include:

- Multiplicative scaling
- Sidewalk Hybrid with Nonlinear Adjustment
- Central Limit Theorem Approach
- Sort on predicted probability (SOP)
- Sort on the difference between predicted probability and random number (SOD)
- Sort on the difference between logistic adjusted predicted probability and random number (SODL)

When implementing Sidewalk Hybrid with Nonlinear Adjustment, two important parameters are required, η and λ. η is the maximum allowed difference between the actual number of events and the expected number of events before λ is added to or subtracted from the predicted probability. In this paper, η is set to 0.5 and λ is set to 0.03, which are the same values that the DYNACAN model used (Neufeld, 2000). The order of the initial input is shuffled in order to get rid of undesired serial correlation.

IV. DATASETS AND SCENARIOS IN ALIGNMENT ALGORITHM EVALUATION

In order to understand the simulation properties of alignment algorithms, this paper evaluates the performance of the various methods under two settings: a lab setting, where synthetic datasets are used, and a real-world setting, where the algorithms are applied to a real-world dataset. This setup makes it possible to examine the performance of the alignment methods under different scenarios.

This paper starts the evaluation by using synthetic datasets in a controlled setting. Alignment is used to correct some artificial errors in the outcome of the statistical model. Since it is possible to control the exact source of the error in a synthetic dataset, we can analyse the simulation properties of different alignment algorithms and the probability transformations in a fully transparent setup.

The synthetic-dataset-based evaluation tests the alignment performance of the different methods in four scenarios. Each scenario represents a potential statistical error that alignment methods try to address or compensate for in a microsimulation model.
The quality of the alignment is measured by the target deviation index (TDI) and the distribution deviation index (DDI), where the grouping variable X is the percentile of the correct probabilities. Computation cost is measured by the number of seconds the algorithm takes to execute one run.

Baseline scenario

Assume there is a binary model expressed as follows:

$$ y_i = \operatorname{logit}^{-1}(\alpha + \beta x_i + \varepsilon_i) \qquad (10) $$

where $\alpha$ and $\beta$ are the parameters of the equation, and $\varepsilon_i$ is an error term that follows a logistic distribution with zero mean and a variance of $\pi^2 / 3$. To simplify the calculation in the evaluation, we assign $\alpha = 0$, $\beta = 1$. $x_i$ is randomly drawn from a standard normal distribution $N(0,1)$. The number of observations in the synthetic dataset is 100,000.

Table 1 lists the key statistics of the baseline scenario and Figure 1 illustrates the distribution of the baseline probabilities.

First scenario: Sample bias

In the first synthetic test scenario, we try to replicate an error that commonly exists in survey datasets: sample bias. Sample bias exists widely among survey datasets and is most commonly corrected by the use of observation weights. Unbiased estimation of behavioural equations depends on accurate weights. Nonetheless, despite all efforts, survey datasets may still suffer from various sample biases, particularly selection bias and attrition bias in panel datasets such as the ECHP (Vandecasteele and Debels, 2007). Sample bias leads to a non-representative dataset, which affects the quality of the simulation output. Alignment is sometimes used to compensate for the error caused by sample bias.

In our test, a simple sample bias is recreated. We randomly remove 50% of the observations with a positive response ($y^* > 0$) from the baseline dataset. This produces a non-representative sample with a size equivalent to 75% of the original one. In other words, the observations with a negative response ($y^* \le 0$) weigh twice as much as they should in the dataset. In addition, the error structure ($\varepsilon$) has a different distribution from that of the baseline scenario as a consequence of the bias introduced.

Second scenario: Biased alpha (intercept)

The second synthetic scenario aims to replicate a monotonic shift of the probabilities. This is commonly used in scenario analysis, where a certain ratio, e.g. the unemployment rate, is required to increase or decrease to meet the scenario assumptions. By manipulating the intercept of the equations, it is possible to shift the probabilities across all observations. In this scenario, $\alpha$ is changed to -1 while everything else is held constant. The result is a monotonic but non-uniform change in the probabilities. A non-uniform transformation is required to make sure the probabilities remain bounded within the range [0,1]. Figure 2 demonstrates the transformation graphically.
As seen, the probability transformation curve for the second scenario stays below the 45-degree line and has a varying slope. This indicates that the transformation is monotonic but non-uniform. In contrast to the previous scenario, the error structure and the number of observations stay the same in this setup. Table 1 highlights the statistical differences between this scenario and the others.

Third scenario: Biased beta

The third synthetic test scenario introduces a biased slope in the equation. This represents a change in a behavioural pattern that could not be captured at the time of estimation (e.g. the evolution of fertility patterns). In this scenario, one may assume that the behavioural pattern shifts over time. This particular setup tests how alignment works as a correction mechanism for changes in behavioural patterns. The simulated dataset in this scenario is generated with $\beta = 0.5$, half of its value in the baseline, and therefore has a different distribution of probabilities. Since x has a mean value of 0, the change does not affect the total sample mean of y at the aggregate level. The transformation yields a different distribution but with an unchanged sample

13 mean. Fgure 1 graphcally llustrates the dfference n probablty dstrbuton. As seen, the standard devaton of probabltes n scenaro 3 s much lower than the baselne scenaro whle the mean value remans the same. Unlke the frst and second scenaros, the transformaton n ths scenaro causes a nonmonotonc change n probabltes. Observatons wth low probablty ( p 0.5) n baselne scenaro have ncreased probablty snce ther x have negatve values, whle the observatons wth hgh probablty ( p 0.5) have a lower probabltes compared wth the baselne scenaro. Forth scenaro: Based ntercept and beta The last synthetc test scenaro combnes both the change n ntercept and the shft n slope. The new transformed dataset has a 1 and 0.5. Ths scenaro represents a relatvely complex change. The change results n a lowered aggregate mean of y and a non- monotonc change n the ndvdual probabltes. Table 1 Overvew of the Synthetc Data Scenaros Scenaro Synthetc Scenaro Baselne Number of observatons n estmaton 100,000 75, , , ,000 Number of observaton n smulaton 100, , , , ,000 Mean value of outcome varable (0.008) (0.010) Target Rato for Algnment N.B.: Coeffcents n the frst scenaro are estmated usng logt model. Standard errors are ncluded n the brackets. As an overvew, table 1 summarse the changes of alpha and beta n dfferent scenaro and compares the key statstcs. As seen, all scenaros have the same number of observaton except the frst one. The mean value of outcome varable ranges from to 0.5, and the target for algnment (external value) s 0.5 across all scenaros. Fgure 1 gves a vsualsed pcture of probablty dstrbutons n the dfferent scenaros. We see that all probablty dstrbutons, wth the excepton of baselne and thrd scenaro, exhbt a rght skewed pattern. Fgure 2 further compares the dfference between correct probablty and the transformed probabltes n the above scenaros. 13
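The four synthetic scenarios can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' data-generating code: it assumes a latent equation y* = alpha + beta * x + eps with baseline alpha = 0 and beta = 1, a standard normal x, and logistic errors, none of which is spelled out in this section beyond x having mean 0 and beta being halved in the third scenario. The sample size is reduced to keep the sketch fast, and all function names are hypothetical.

```python
import math
import random


def draw_logistic(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate(n, alpha, beta, rng):
    """Return (x, y) pairs where y = 1 if y* = alpha + beta * x + eps > 0."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # assumption: x is standard normal (mean 0)
        y_star = alpha + beta * x + draw_logistic(rng)
        rows.append((x, 1 if y_star > 0 else 0))
    return rows


def drop_half_of_positives(rows, rng):
    """Scenario 1: randomly remove 50% of the positive-response observations."""
    positives = [row for row in rows if row[1] == 1]
    negatives = [row for row in rows if row[1] == 0]
    return negatives + rng.sample(positives, len(positives) // 2)


def mean_y(rows):
    """Share of observations with a positive response."""
    return sum(y for _, y in rows) / len(rows)


rng = random.Random(12345)
n = 20_000  # smaller than the 100,000 used in the paper, to keep the sketch fast
scenarios = {
    "baseline": simulate(n, alpha=0.0, beta=1.0, rng=rng),
    "scenario 2": simulate(n, alpha=-1.0, beta=1.0, rng=rng),
    "scenario 3": simulate(n, alpha=0.0, beta=0.5, rng=rng),
    "scenario 4": simulate(n, alpha=-1.0, beta=0.5, rng=rng),
}
scenarios["scenario 1"] = drop_half_of_positives(scenarios["baseline"], rng)

for name, rows in scenarios.items():
    print(f"{name}: n = {len(rows)}, mean of y = {mean_y(rows):.3f}")
```

Under these assumptions the biased sample in scenario 1 keeps roughly 75% of the baseline observations, scenario 3 leaves the mean of y close to 0.5, and the two scenarios with a lowered intercept produce a lower mean.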

Figure 1 Overview of Probability Distribution in Different Scenarios (density against probabilities; series: baseline scenario and scenarios 1 to 4)

Figure 2 Overview of Probability Transformation in Different Scenarios (transformed probabilities against baseline probabilities; series: baseline scenario and scenarios 1 to 4)
N.B.: The probability transformation curve records how probabilities change due to the artificial errors introduced in the scenario.

Evaluation using a real-world dataset

There is no doubt that synthetic evaluation contributes to the understanding of alignment methods thanks to its complete transparency. An alignment algorithm, however, is only useful when applied to a real-world dataset. Therefore, this paper also analyses the performance of different alignment algorithms using a real dataset. In this real-world evaluation, we use the Living in Ireland Survey (ECHP-LII) dataset for a simple exercise of labour participation simulation. The LII survey constitutes the Irish component of the European Community Household Panel (ECHP). It is a representative household panel survey conducted on the Irish population annually for eight waves until [...]. The data contains information on demographic, employment and other socio-economic characteristics of around 3500 households in each wave. In 2000, an additional 1500 households were brought into the dataset to compensate for the attrition since [...]. The dataset has been cleaned and adjusted to ensure consistency, as described in Li and O'Donoghue (2010).

Labour participation simulation is selected because it is one of the popular components in dynamic microsimulation models. The simulation uses a reduced-form equation for labour participation. We assume that the in-work status $y^*$ is derived from the following specification:

$$y^{*} = \operatorname{logit}^{-1}(X\beta) \qquad (11)$$

where X is a vector that covers lagged in-work status, education, gender, age, age squared, an interaction term between gender and having a new-born, and an interaction term between marriage and gender. In the estimation, we include individuals aged [...] with known previous working status. Table 2 provides some basic summary statistics of the variables included, and the estimation results are reported in Appendix I.
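As a rough illustration of how a reduced-form participation equation of this kind is typically used in simulation, the sketch below turns a linear index into a probability with the inverse logit and then makes a Monte Carlo draw. The coefficient values are hypothetical placeholders, only three of the covariates are included, and the function names are not taken from the paper.

```python
import math
import random

# Hypothetical coefficients for illustration only; the estimated values are
# reported in the paper's appendix and are not reproduced here.
BETA = {
    "constant": -1.0,
    "lagged_in_work": 3.0,
    "female": -0.5,
    "university": 0.8,
}


def in_work_probability(person):
    """Inverse logit of the linear index X * beta for one individual."""
    index = BETA["constant"] + sum(BETA[name] * value for name, value in person.items())
    return 1.0 / (1.0 + math.exp(-index))


def simulate_in_work(person, rng):
    """Monte Carlo draw: in work (1) if a uniform random number falls below p."""
    return 1 if rng.random() < in_work_probability(person) else 0


rng = random.Random(0)
worker = {"lagged_in_work": 1, "female": 0, "university": 1}
non_worker = {"lagged_in_work": 0, "female": 1, "university": 0}

for label, person in (("worker", worker), ("non-worker", non_worker)):
    p = in_work_probability(person)
    print(f"{label}: p = {p:.3f}, simulated status = {simulate_in_work(person, rng)}")
```

Without alignment, the number of simulated workers is simply whatever the random draws produce; the alignment methods compared in this paper adjust either the probabilities or the selection step so that the total matches an external target.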
Table 2 Overview of variables included in the in-work estimation
Columns: Variable (mean value) | In-work: Mean, Standard Deviation | Out-work: Mean, Standard Deviation
Rows: Lagged in-work status; Gender (female=1); Age; Age squared; Having a new-born; Marriage; Secondary education; University education; Interaction term: new-born and gender; Interaction term: marriage and gender; Number of observations in the category; Total number of observations

In the previous literature on microsimulation validation, Caldwell and Morrison (2000) suggest using in-sample validation, out-of-sample validation and multiple-module validation to evaluate simulation output. This paper follows a similar approach for algorithm evaluation, except that there is no multi-module evaluation, since alignment is usually an integrated part of a more complex model.

In-sample evaluation assesses the predictive power of the model in describing the data on which it was estimated. In this scenario, we test how well the model replicates the labour participation rate in year 1998 with a known external control (the observed number of workers) using different alignment methods. 1998 is selected because it is in the middle of the period the data covers. Equation coefficients are estimated from the whole panel with the exception of the first wave, where lagged in-work status is not available. Alignment performance indicators are calculated in the same way as in the synthetic dataset evaluation.

An in-sample evaluation test is useful, but it differs from a real microsimulation exercise, where the values are predicted out of sample. An out-of-sample evaluation attempts to measure the predictive power of the model in explaining data of a similar type which were not used in the estimation of the model (Caldwell, 1996). In this particular test, we use year [...] data to predict the period [...] with the known external control (the observed number of workers) and analyse the differences in the performance of the alignment methods. The benchmark distribution for DDI is the actual observed distribution in year [...].

V. EVALUATION RESULTS

This section reports the evaluation results of six different alignment algorithms and compares their performance under different scenarios through false positive/negative rates, two self-defined indices (TDI, DDI) and computational time.

Evaluation Results using Synthetic Datasets

Table 3 lists four key indicators obtained when evaluating using synthetic datasets: target deviation index (TDI), false positive rate, false negative rate and distribution deviation index (DDI). The DDI in this synthetic dataset based test uses the percentile of the dependent variable as the grouping variable X.
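Three of the four indicators can be sketched in a few lines. The formal definitions of TDI and DDI appear earlier in the paper and are not repeated in this section, so the normalisations below are assumptions: all three quantities are expressed as shares of the total number of observations, a choice made here because it reproduces the pattern in the reported results, where the false positive rate minus the false negative rate equals TDI in every row. The function name and the toy data are hypothetical, and DDI is left out because its grouping-based formula is not available here.

```python
def alignment_indicators(actual, aligned, target_count):
    """Compare aligned 0/1 outcomes with the correct 0/1 outcomes.

    Assumed definitions (shares of all observations):
      tdi            = (number of aligned events - target count) / n
      false_positive = share with aligned == 1 but actual == 0
      false_negative = share with aligned == 0 but actual == 1
    """
    n = len(actual)
    false_pos = sum(1 for a, s in zip(actual, aligned) if a == 0 and s == 1)
    false_neg = sum(1 for a, s in zip(actual, aligned) if a == 1 and s == 0)
    return {
        "tdi": (sum(aligned) - target_count) / n,
        "false_positive": false_pos / n,
        "false_negative": false_neg / n,
    }


# Toy example: four true events, target of four, but only three events after alignment.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
aligned = [1, 1, 0, 0, 1, 0, 0, 0]
result = alignment_indicators(actual, aligned, target_count=4)
print(result)
```

When the target equals the true number of events, as in this toy example, the difference between the two error rates equals TDI by construction, so a method that hits the target exactly necessarily has equal false positive and false negative rates.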
Table 3 Properties of Different Alignment Methods in Synthetic Dataset Test
Method | TDI | False Positive | False Negative | DDI
Scenario 1: Selection Bias
Multiplicative scaling | -0.43% | 19.33% | 19.76% | 0.40%
Sidewalk hybrid with nonlinear adjustment | 0.00% | 20.63% | 20.63% | 0.03%
Central limit theorem approach | 0.00% | 19.65% | 19.65% | 0.43%
Sort on predicted probability (SOP) | 0.00% | 16.31% | 16.31% | 11.50%
Sort on the difference between predicted probability and random number (SOD) | 0.00% | 21.09% | 21.09% | 0.15%
Sort on the difference between logistic adjusted predicted probability and random number (SODL) | 0.00% | 20.69% | 20.69% | 0.03%
Scenario 2: Biased Alpha (Intercept)
Multiplicative scaling | -1.41% | 18.74% | 20.15% | 0.61%
Sidewalk hybrid with nonlinear adjustment | 0.00% | 20.69% | 20.69% | 0.03%
Central limit theorem approach | 0.00% | 19.29% | 19.29% | 0.65%
Sort on predicted probability (SOP) | 0.00% | 16.31% | 16.31% | 11.50%
Sort on the difference between predicted probability and random number (SOD) | 0.00% | 21.31% | 21.31% | 0.30%
Sort on the difference between logistic adjusted predicted probability and random number (SODL) | 0.00% | 20.70% | 20.70% | 0.03%
Scenario 3: Biased beta coefficients
Multiplicative scaling | -0.18% | 22.58% | 22.76% | 0.90%
Sidewalk hybrid with nonlinear adjustment | -0.01% | 22.59% | 22.60% | 0.84%
Central limit theorem approach | 0.00% | 22.69% | 22.69% | 0.91%
Sort on predicted probability (SOP) | 0.00% | 16.31% | 16.31% | 11.50%
Sort on the difference between predicted probability and random number (SOD) | 0.00% | 22.54% | 22.54% | 0.87%
Sort on the difference between logistic adjusted predicted probability and random number (SODL) | 0.00% | 22.56% | 22.56% | 0.88%
Scenario 4: Biased alpha and beta (all coefficients)
Multiplicative scaling | 0.18% | 21.57% | 21.39% | 0.26%
Sidewalk hybrid with nonlinear adjustment | 0.00% | 22.45% | 22.44% | 0.85%
Central limit theorem approach | 0.00% | 21.54% | 21.54% | 0.28%
Sort on predicted probability (SOP) | 0.00% | 16.31% | 16.31% | 11.50%
Sort on the difference between predicted probability and random number (SOD) | 0.00% | 22.97% | 22.97% | 1.33%
Sort on the difference between logistic adjusted predicted probability and random number (SODL) | 0.00% | 22.67% | 22.67% | 0.92%
Average Performances
Multiplicative scaling | -0.46% | 20.55% | 21.02% | 0.54%
Sidewalk hybrid with nonlinear adjustment | 0.00% | 21.59% | 21.59% | 0.44%
Central limit theorem approach | 0.00% | 20.79% | 20.79% | 0.57%
Sort on predicted probability (SOP) | 0.00% | 16.31% | 16.31% | 11.50%
Sort on the difference between predicted probability and random number (SOD) | 0.00% | 21.98% | 21.98% | 0.66%
Sort on the difference between logistic adjusted predicted probability and random number (SODL) | 0.00% | 21.66% | 21.66% | 0.46%

As seen in Table 3, all alignment methods except multiplicative scaling, in all scenarios, have less than 0.01% deviation from the target number of event occurrences, while multiplicative scaling shows a deviation of up to 1.41% from the target during the evaluation. The result is largely driven by the design of the algorithm, as multiplicative scaling cannot guarantee a perfect alignment ratio although the expected deviation is zero. Sidewalk hybrid sometimes has a slight deviation (less than 0.01%), as the non-linear transformation may not always be perfect under the existing implementation (footnote 6). Central limit theorem methods have built-in counters that prevent the events from manifesting when the target is met. Sorting-based algorithms only pick the exact number of observations required, which is why their target deviation index (TDI) is always zero.

6 The process usually requires several iterations and is computationally expensive (Neufeld, 2000). The test model used in this paper stops its calibration when an iteration improves the average probability by no more than [...]. This increases the calculation speed but sometimes results in imperfectly aligned probabilities. Details of the calibration steps can be found in the book published by the Society of Actuaries (SOA, 1998).

In terms of false positive and false negative rates when compared with the correct values, the SOP alignment method yields the best result, which is on average 4 to 6 percentage points lower than the other algorithms, as shown in the tables. Sidewalk hybrid, together with SOD and SODL, has the highest false positive/false negative rates on average. It seems that the false positive and false negative rates are closely related to the complexity of the algorithms. The nonlinear transformation in sidewalk hybrid and the differencing operations in SOD and SODL are both more computationally complicated than the other methods. This pattern is consistent across all scenarios, though the absolute numbers fluctuate across different scenarios.

Whilst the false positive and false negative rates are useful indicators when the correct value is known, they are less critical for simulation, as microsimulation exercises tend to focus more on the distributions. Therefore, the distribution deviation index (DDI) is particularly important in judging how well the relative relations between variables are preserved after alignment. Appendix 2 visualises the difference between actual probabilities and aligned probabilities in all synthetic tests. The results show that the SOP method heavily distorts the original distribution of the probabilities across all scenarios using percentile grouping. This is also reflected by the DDI, which effectively calculates a weighted size of the gap in this case.

It seems that no method consistently outperforms the others across all scenarios. In the first two scenarios, the sidewalk hybrid and SODL methods give the best result; in the third scenario, where the synthetic dataset modifies the slope of x, all methods have similar DDI values except SOP; in the last scenario, the multiplicative scaling and central limit methods generally perform much better than the rest.
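The difference in target deviation between multiplicative scaling and the sorting-based methods follows directly from how events are selected, which the sketch below illustrates. It is a simplified reading of the method names rather than the implementations evaluated in the paper: multiplicative scaling rescales the probabilities and then makes independent random draws, while the sorting variants rank a score and switch on exactly the required number of events. The SODL score used here, logit(p) minus the logit of a uniform random number, is an assumption, as are all function names and the example probabilities.

```python
import math
import random


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def multiplicative_scaling(probs, target_share, rng):
    """Rescale probabilities so that their mean equals the target, then draw."""
    factor = target_share * len(probs) / sum(probs)
    return [1 if rng.random() < p * factor else 0 for p in probs]


def sort_based(probs, target_share, rng, method="SOD"):
    """Rank observations by a score and switch on exactly the top n_events."""
    n_events = round(target_share * len(probs))
    scores = []
    for p in probs:
        # Keep u away from 0 and 1 so that logit(u) is always defined.
        u = rng.uniform(1e-9, 1.0 - 1e-9)
        if method == "SOP":
            scores.append(p)  # predicted probability only
        elif method == "SOD":
            scores.append(p - u)  # probability minus random number
        else:  # "SODL": assumed form, differencing on the logit scale
            scores.append(logit(p) - logit(u))
    ranked = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    outcome = [0] * len(probs)
    for i in ranked[:n_events]:
        outcome[i] = 1
    return outcome


rng = random.Random(7)
probs = [rng.uniform(0.05, 0.60) for _ in range(1000)]  # illustrative probabilities
target_share = 0.5

scaled = multiplicative_scaling(probs, target_share, rng)
print("multiplicative scaling events:", sum(scaled))
for method in ("SOP", "SOD", "SODL"):
    print(method, "events:", sum(sort_based(probs, target_share, rng, method)))
```

The sorting variants always return exactly 500 events for this input, whereas the count from multiplicative scaling varies around 500 from one random seed to the next. SOP uses no random component at all, so it always selects the observations with the highest predicted probabilities, which is consistent with its low error rates and its heavy distortion of the probability distribution.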
Compared with other methods, methods which involve differencing and logistic transformation (including sidewalk hybrid with non-linear transformation, SOD and SODL) seem to be more sensitive to the change in the beta coefficient. Their performance is much better when beta remains stable, e.g. in scenarios 1 and 2. This may be due to the nature of these algorithms, as the differencing and logit transformation operations assume monotonic changes in the probabilities.

Evaluation Results using a Real-world Dataset

The synthetic dataset based evaluation offers an overview of the performance of different algorithms under a particular source of noise, but the performance with a real-world dataset is more interesting for empirical modellers. Table 4 reports all the key indicators calculated when applying alignment to a real-life dataset, with the example of estimating the in-work population. DDI is calculated based on independent variables, including sex, education and marriage status with childbirth interaction, and an external variable, nationalities. It reflects an overall shift of the distribution in multiple dimensions.

Table 4 Properties of Different Alignment Methods with a Real World Dataset (LII)
Method | TDI | False Positive | False Negative | DDI
In-Sample Evaluation
Multiplicative scaling | 0.24% | 10.00% | 9.76% | 0.62%
Sidewalk hybrid with nonlinear adjustment | 0.01% | 9.47% | 9.45% | 0.64%
Central limit theorem approach | 0.00% | 9.57% | 9.57% | 0.62%
Sort on predicted probability (SOP) | 0.00% | 5.86% | 5.86% | 0.62%
Sort on the difference between predicted probability and random number (SOD) | 0.00% | 9.64% | 9.64% | 0.62%
Sort on the difference between logistic adjusted predicted probability and random number (SODL) | 0.00% | 9.60% | 9.60% | 0.67%
Out-of-Sample Evaluation
Multiplicative scaling | 0.10% | 11.24% | 11.14% | 0.75%
Sidewalk hybrid with nonlinear adjustment | 0.00% | 11.04% | 11.04% | 0.68%
Central limit theorem approach | 0.00% | 11.12% | 11.12% | 0.74%
Sort on predicted probability (SOP) | 0.00% | 7.63% | 7.63% | 1.47%
Sort on the difference between predicted probability and random number (SOD) | 0.00% | 11.14% | 11.14% | 0.66%
Sort on the difference between logistic adjusted predicted probability and random number (SODL) | 0.00% | 11.03% | 11.03% | 0.76%
N.B.: In-sample evaluation predicts 1998 in-work using [...] data. Out-of-sample evaluation predicts [...] in-work using [...] data.

Similar to the results from the synthetic datasets, multiplicative scaling is the only method with a TDI greater than 0.01%, and the SOP method outperforms all other methods in terms of false positive and false negative rates by a significant margin. All other evaluated methods have similar false positive and false negative rates. As to the DDI, there is no dramatic difference between the methods in the in-sample evaluation. We notice that the SOP method has a much more comparable DDI performance in the real-life dataset than in the synthetic dataset. In fact, SOP has one of the best results in the in-sample evaluation. In the out-of-sample exercise, we find that SOD, a method with average performance on the synthetic datasets, has the lowest DDI value, while SOP has the worst result. Besides the algorithm design, the change of grouping variables also affects the observed DDI pattern in this evaluation.
With the synthetic datasets, groups are divided based on the percentile value of the dependent variable, while in the real-world dataset, observations are grouped in a realistic setting using different characteristic variables, such as age and gender.

Computing Performance and Scalability

Computational efficiency is another main criterion for evaluating alignment algorithms. Given the increasing availability of large-scale datasets in microsimulation and the complexity of the models, alignment may consume considerable resources in the computation process. Nonetheless, studies of computational efficiency are rather scarce in the field of microsimulation, and there is no paper so far analysing how the number of observations affects the algorithms' performance. This section compares different alignment algorithms in terms of computational efficiency and discusses the scalability of the algorithms.

Table 5 shows an overview of the computation time required during the synthetic scenario test and the real-world data test. The computational cost is timed on an Intel i5-520M processor with only a single core in use. As indicated, the method that takes the least computational resources is multiplicative scaling. This is not surprising, as multiplicative scaling involves only a single calculation for each observation. Sorting-based alignment methods seem to be in the next tier, consuming up to 5 times more resources than multiplicative scaling. The variations in sorting method do not change the execution time much, although the last sorting variation, SODL, consumes around 10% more resources than the other sorting-based algorithms due to its higher computational complexity.

Sidewalk hybrid with nonlinear transformation is at the bottom of the list in terms of efficiency. It takes about 80 times more CPU time than the fastest method, multiplicative scaling, and more CPU time than the sorting-based algorithms. There are three reasons for its relatively poor performance. Firstly, the nonlinear transformation may take many iterations and is computationally expensive (Neufeld, 2000). Secondly, the method itself suffers from serial correlation in the original design, as the calculation is dependent on the result of the last observation. In order to mitigate this effect, an extra randomisation via sorting is implemented. This is accompanied by a reverse process, which restores the original order of the input at the end of the alignment. Thirdly, the sidewalk method requires iterating through observations. Stata, which is the platform of our evaluation, is not particularly efficient at iterating over individual observations compared with the batch processing for which Stata is optimised (footnote 7). This is also the primary reason why the central limit theorem approach has a relatively long running time. We speculate, from a theoretical point of view, that the performance of the sidewalk method and the central limit theorem approach could be significantly improved when implemented correctly as native code in C/C++, as compiled code does not re-interpret the syntax over the iterations. Nonetheless, the sidewalk method may still be slower than the other algorithms when the nonlinear probability transformation is applied.
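The serial dependence described above can be seen in a bare-bones version of the sidewalk idea. The sketch below is the basic cumulative-probability variant, in which an event is assigned whenever the running sum of probabilities crosses the next whole number; it is not the hybrid with nonlinear adjustment evaluated in this paper. The shuffle-and-restore wrapper is one assumed way of implementing the extra randomisation and the reverse process mentioned in the text, and all names are hypothetical.

```python
import math
import random


def sidewalk(probs):
    """Basic sidewalk: an event occurs each time the running sum passes an integer.

    Each decision depends on the running total left by all previous
    observations, so the loop cannot be replaced by independent per-row draws.
    """
    events = []
    cumulative = 0.0
    for p in probs:
        before = math.floor(cumulative)
        cumulative += p
        events.append(1 if math.floor(cumulative) > before else 0)
    return events


def sidewalk_shuffled(probs, rng):
    """Randomise the processing order, run the sidewalk, restore the input order."""
    order = list(range(len(probs)))
    rng.shuffle(order)
    shuffled_events = sidewalk([probs[i] for i in order])
    events = [0] * len(probs)
    for position, original_index in enumerate(order):
        events[original_index] = shuffled_events[position]
    return events


probs = [0.25] * 8
print(sidewalk(probs))
print(sidewalk_shuffled([0.5, 0.25, 0.75, 0.5], random.Random(3)))
```

With eight probabilities of 0.25 the events fall on the fourth and eighth observations, which shows why the outcome depends on the order of the input and why a randomisation step is needed. The explicit loop over observations is also the part that an interpreted platform executes slowly.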
Table 5 Computational Costs for Different Alignment Methods
Columns: Method | Synthetic Dataset Scenario | Real-world Dataset: In-Sample, Out-Sample
Rows: Multiplicative scaling; Sidewalk hybrid with nonlinear adjustment; Central limit theorem approach; Sort on predicted probability (SOP); Sort on the difference between predicted probability and random number (SOD); Sort on the difference between logistic adjusted predicted probability and random number (SODL)

When increasing the number of observations, i.e. the size of the input, all algorithms exhibit a mostly linear growth rate of the execution time in Stata (see Figure 3 to Figure 5) for a dataset under 15 million observations. The run-time seems to be directly proportional to the input size. All alignments are using the same input dataset, which is a randomly

7 Observation iteration, a necessary step for these two algorithms, tends to be very slow in Stata because loops are reinterpreted at each iteration. Stata recommends using a compiled plug-in for the best performance in this type of scenario (Stata, 2008). However, algorithm-specific optimisation using compiled code is beyond the scope of this paper and would make the comparison difficult.


More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

AADL : about scheduling analysis

AADL : about scheduling analysis AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desgn and Analyss of Algorthms Heaps and Heapsort Reference: CLRS Chapter 6 Topcs: Heaps Heapsort Prorty queue Huo Hongwe Recap and overvew The story so far... Inserton sort runnng tme of Θ(n 2 ); sorts

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

11. HARMS How To: CSV Import

11. HARMS How To: CSV Import and Rsk System 11. How To: CSV Import Preparng the spreadsheet for CSV Import Refer to the spreadsheet template to ad algnng spreadsheet columns wth Data Felds. The spreadsheet s shown n the Appendx, an

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

A Semi-parametric Regression Model to Estimate Variability of NO 2

A Semi-parametric Regression Model to Estimate Variability of NO 2 Envronment and Polluton; Vol. 2, No. 1; 2013 ISSN 1927-0909 E-ISSN 1927-0917 Publshed by Canadan Center of Scence and Educaton A Sem-parametrc Regresson Model to Estmate Varablty of NO 2 Meczysław Szyszkowcz

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005 Exercses (Part 4) Introducton to R UCLA/CCPR John Fox, February 2005 1. A challengng problem: Iterated weghted least squares (IWLS) s a standard method of fttng generalzed lnear models to data. As descrbed

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007 Syntheszer 1.0 A Varyng Coeffcent Meta Meta-Analytc nalytc Tool Employng Mcrosoft Excel 007.38.17.5 User s Gude Z. Krzan 009 Table of Contents 1. Introducton and Acknowledgments 3. Operatonal Functons

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems

A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems 2008 INTERNATIONAL CONFERENCE ON PROGNOSTICS AND HEALTH MANAGEMENT A Smlarty-Based Prognostcs Approach for Remanng Useful Lfe Estmaton of Engneered Systems Tany Wang, Janbo Yu, Davd Segel, and Jay Lee

More information

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp Lfe Tables (Tmes) Summary... 1 Data Input... 2 Analyss Summary... 3 Survval Functon... 5 Log Survval Functon... 6 Cumulatve Hazard Functon... 7 Percentles... 7 Group Comparsons... 8 Summary The Lfe Tables

More information

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics Ths module s part of the Memobust Handbook on Methodology of Modern Busness Statstcs 26 March 2014 Theme: Donor Imputaton Contents General secton... 3 1. Summary... 3 2. General descrpton... 3 2.1 Introducton

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Topology Design using LS-TaSC Version 2 and LS-DYNA

Topology Design using LS-TaSC Version 2 and LS-DYNA Topology Desgn usng LS-TaSC Verson 2 and LS-DYNA Wllem Roux Lvermore Software Technology Corporaton, Lvermore, CA, USA Abstract Ths paper gves an overvew of LS-TaSC verson 2, a topology optmzaton tool

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Parameter estimation for incomplete bivariate longitudinal data in clinical trials

Parameter estimation for incomplete bivariate longitudinal data in clinical trials Parameter estmaton for ncomplete bvarate longtudnal data n clncal trals Naum M. Khutoryansky Novo Nordsk Pharmaceutcals, Inc., Prnceton, NJ ABSTRACT Bvarate models are useful when analyzng longtudnal data

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Anonymisation of Public Use Data Sets

Anonymisation of Public Use Data Sets Anonymsaton of Publc Use Data Sets Methods for Reducng Dsclosure Rsk and the Analyss of Perturbed Data Harvey Goldsten Unversty of Brstol and Unversty College London and Natale Shlomo Unversty of Manchester

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

Air Transport Demand. Ta-Hui Yang Associate Professor Department of Logistics Management National Kaohsiung First Univ. of Sci. & Tech.

Air Transport Demand. Ta-Hui Yang Associate Professor Department of Logistics Management National Kaohsiung First Univ. of Sci. & Tech. Ar Transport Demand Ta-Hu Yang Assocate Professor Department of Logstcs Management Natonal Kaohsung Frst Unv. of Sc. & Tech. 1 Ar Transport Demand Demand for ar transport between two ctes or two regons

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc.

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 97-735 Volume Issue 9 BoTechnology An Indan Journal FULL PAPER BTAIJ, (9), [333-3] Matlab mult-dmensonal model-based - 3 Chnese football assocaton super league

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Classification Based Mode Decisions for Video over Networks

Classification Based Mode Decisions for Video over Networks Classfcaton Based Mode Decsons for Vdeo over Networks Deepak S. Turaga and Tsuhan Chen Advanced Multmeda Processng Lab Tranng data for Inter-Intra Decson Inter-Intra Decson Regons pdf 6 5 6 5 Energy 4

More information

Appendices to accompany. Demand for Health Risk Reductions: A cross-national comparison between the U.S. and Canada

Appendices to accompany. Demand for Health Risk Reductions: A cross-national comparison between the U.S. and Canada Appendces to accompany Demand for Health Rsk Reductons: A cross-natonal comparson between the U.S. and Canada Appendx I: Atttudnal and Subectve Belefs by Age (movng average of age-wse s, 5 th and 95 th

More information

Relevance Assignment and Fusion of Multiple Learning Methods Applied to Remote Sensing Image Analysis

Relevance Assignment and Fusion of Multiple Learning Methods Applied to Remote Sensing Image Analysis Assgnment and Fuson of Multple Learnng Methods Appled to Remote Sensng Image Analyss Peter Bajcsy, We-Wen Feng and Praveen Kumar Natonal Center for Supercomputng Applcaton (NCSA), Unversty of Illnos at

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

CS1100 Introduction to Programming

CS1100 Introduction to Programming Factoral (n) Recursve Program fact(n) = n*fact(n-) CS00 Introducton to Programmng Recurson and Sortng Madhu Mutyam Department of Computer Scence and Engneerng Indan Insttute of Technology Madras nt fact

More information

PRÉSENTATIONS DE PROJETS

PRÉSENTATIONS DE PROJETS PRÉSENTATIONS DE PROJETS Rex Onlne (V. Atanasu) What s Rex? Rex s an onlne browser for collectons of wrtten documents [1]. Asde ths core functon t has however many other applcatons that make t nterestng

More information

ECONOMICS 452* -- Stata 11 Tutorial 6. Stata 11 Tutorial 6. TOPIC: Representing Multi-Category Categorical Variables with Dummy Variable Regressors

ECONOMICS 452* -- Stata 11 Tutorial 6. Stata 11 Tutorial 6. TOPIC: Representing Multi-Category Categorical Variables with Dummy Variable Regressors ECONOMICS * -- Stata 11 Tutoral Stata 11 Tutoral TOPIC: Representng Mult-Category Categorcal Varables wth Dummy Varable Regressors DATA: wage1_econ.dta (a Stata-format dataset) TASKS: Stata 11 Tutoral

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information