
Predicting the Admission Decision of a Participant to the School of Physical Education and Sports at Çukurova University by Using Different Machine Learning Methods Combined with Feature Selection

Gözde Özsert Yiğit
Mehmet Fatih Akay (advisor)

DISSERTATION.COM
Irvine, California
Boca Raton, Florida

Predicting the Admission Decision of a Participant to the School of Physical Education and Sports at Çukurova University by Using Different Machine Learning Methods Combined with Feature Selection

Copyright 2016 Gözde Özsert Yiğit
All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the publisher.

Boca Raton, Florida / Irvine, California, USA
2017

ISBN-10: 1-61233-461-X (ebk.)
ISBN-13: 978-1-61233-461-5 (ebk.)

ÇUKUROVA UNIVERSITY
INSTITUTE OF NATURAL AND APPLIED SCIENCES

MSc THESIS

Gözde ÖZSERT YİĞİT

PREDICTING THE ADMISSION DECISION OF A PARTICIPANT TO THE SCHOOL OF PHYSICAL EDUCATION AND SPORTS AT CUKUROVA UNIVERSITY BY USING DIFFERENT MACHINE LEARNING METHODS COMBINED WITH FEATURE SELECTION

DEPARTMENT OF COMPUTER ENGINEERING

ADANA, 2016

ABSTRACT

The purpose of this thesis is to develop new hybrid admission decision prediction models by using different machine learning methods including Support Vector Machines (SVM), Multilayer Perceptron (MLP), Radial Basis Function (RBF) Network, TreeBoost (TB) and K-Means Clustering (KMC) combined with feature selection algorithms to investigate the effect of the predictor variables on the admission decision of a candidate to the School of Physical Education and Sports at Cukurova University. Three feature selection algorithms including Relief-F, F-Score and Correlation-based Feature Selection (CFS) have been considered. Experiments have been conducted on the datasets, which contain data of participants who applied to the School in 2006 and 2007. The datasets have been randomly split into training and test sets using 10-fold cross validation as well as different percentage ratios. The performance of the prediction models for the datasets has been assessed using classification accuracy, specificity, sensitivity, positive predictive value (PPV) and negative predictive value (NPV). The results show that a decrease in the number of predictor variables in the prediction models usually leads to a parallel decrease in classification accuracy.

Key Words: Machine learning, feature selection, physical ability test

ACKNOWLEDGMENTS

Foremost, I would like to express my sincere gratitude to my advisor, Assoc. Prof. Dr. M. Fatih AKAY, for his supervision, guidance, encouragement, patience, motivation, useful suggestions and his valuable time for this work. I would like to thank the members of the MSc thesis jury, Assoc. Prof. Dr. Ramazan ÇOBAN and Asst. Prof. Dr. Şule ÇOLAK, for their suggestions and corrections. I would also like to thank the Cukurova University Scientific Research Projects Center for supporting this work (Project no: FYL-2015-3845). Last but not least, I would like to thank my husband Talat Yiğit and my family for their endless support and encouragement throughout my life and career.

Contents

1. INTRODUCTION
1.1. The Vertical Jump Test
1.2. The Coordination and Skill Test
1.3. 30-meter Dash Test
1.4. 20-meter Shuttle Run Test
1.5. Determining a Participant's Admission Decision
1.6. Previous Work
1.7. Feature Selection
1.8. Motivation, Purpose and Contributions of This Thesis
1.9. Overview of Datasets
2. OVERVIEW OF METHODS
2.1. Support Vector Machines
2.2. Multi-Layer Perceptron
2.3. Radial Basis Function Network
2.4. TreeBoost
2.5. K-Means Clustering
2.6. Feature Selection Algorithms
3. DEVELOPMENT OF PREDICTION MODELS
3.1. SVM Model for Predicting Admission Decision
3.2. MLP Model for Predicting Admission Decision
3.3. RBF Network Model for Predicting Admission Decision
3.4. TB Model for Predicting Admission Decision
3.5. KMC Model for Predicting Admission Decision
3.6. Performance Metrics
4. RESULTS AND DISCUSSION
4.1. General Discussion on the Results
4.2. Discussion of Classification Accuracy on the 2006 Dataset
4.3. Discussion of Classification Accuracy on the 2007 Dataset
4.4. Discussion on Other Performance Criteria
5. CONCLUSION
REFERENCES

1. Introduction

In order to be admitted to the School of Physical Education and Sports at Cukurova University, a candidate has to be successful in the physical ability test applied at the School. The physical ability test has two parts, each of which contains two sub-tests. The vertical jump test and the coordination and skill test are applied in the first part of the test; the second part comprises the 30-meter dash test and the 20-meter shuttle run test (Çukurova Üniversitesi Beden Eğitimi ve Spor Yüksekokulu, 2015). The details of each test are given in Sections 1.1 through 1.4.

1.1. The Vertical Jump Test

In this test, the participant waits for the timing mat to be reset, with his weight equally balanced on both feet. Once the mat is ready, the participant jumps vertically to reach the highest point he can and then lands back on the mat. The score of the vertical jump test is calculated from the time spent in the air (Çukurova Üniversitesi Beden Eğitimi ve Spor Yüksekokulu, 2015). Figure 1.1 shows the vertical jump test.

Figure 1.1. The vertical jump test
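The thesis does not give the mat's scoring formula, but timing mats conventionally convert flight time to jump height through the projectile-motion relation h = g·t²/8. A minimal sketch of that standard conversion (the formula is a common convention, not taken from the thesis):

    def jump_height_from_flight_time(t_air, g=9.81):
        """Convert flight time (s) on a timing mat to jump height (m).

        The jumper rises for half the flight time, so
        h = 0.5 * g * (t/2)**2 = g * t**2 / 8.
        """
        return g * t_air ** 2 / 8.0

    # Example: a 0.55 s flight time corresponds to roughly 0.37 m.
    print(round(jump_height_from_flight_time(0.55), 2))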

1.2. The Coordination and Skill Test

The coordination and skill test has several steps. The test starts with a front somersault. The participant then picks up a ball, throws it up in the air and catches it after passing across the horizontal barrier. After this step, he jumps over the balance tool and tries to keep his stability while moving. A back somersault follows these steps. The next two steps are related to the participant's movement capabilities. The last part of this test consists of sliding down and jumping over the obstacles. The setup of the coordination and skill test is shown in Figure 1.2.

Figure 1.2. Structure of the coordination and skill test

Each participant has two attempts at the two tests of the first part, and a participant's final score is calculated using the highest scores he achieves. Participants who are successful in these two tests are allowed to take part in the second part of the physical ability test (Çukurova Üniversitesi Beden Eğitimi ve Spor Yüksekokulu, 2015).

1.3. 30-meter Dash Test

The 30-meter dash test evaluates the participant's ability to gain speed quickly over 30 meters. The test consists of running one maximum sprint over 30 meters. At the beginning of the test, one foot should be one step ahead of the other, with the front foot on the starting line. The participant should hold this position for two seconds; moving is not permitted. The instructor should give feedback on the participant's acceleration and motivate him to keep running hard through the finish line. Timing starts with the first movement of the participant (or, if a timing system is used, when it is triggered) and finishes when the participant crosses the finish line (or the finishing system is triggered) (TopendSports, 2015). Each participant has two trials and the lower completion time is recorded. An illustration of the 30-meter dash test is given in Figure 1.3.

Figure 1.3. 30-meter dash test

1.4. 20-meter Shuttle Run Test

The shuttle run test is performed on a pitch with a distance of 20 meters between the two end points. This test is also called the 'beep' test. The participant stands behind the starting line and starts running when instructed. In the beginning, the pace is comparatively slow. Running between the two lines should continue in accordance with the recorded beeps. As the test progresses, the required speed increases with every beep sound and the frequency of the beeps is gradually increased. If the participant reaches the line before the beep sounds, he must wait until the beep before continuing. If the participant cannot reach the line before the beep sounds, he is given a warning and must keep running to the line, then turn and attempt to get back up to pace within the next two beeps. The test is completed when the participant fails on two consecutive beeps (TopendSports, 2015). An illustration of the 20-meter shuttle run test is shown in Figure 1.4.

Figure 1.4. 20-meter shuttle run test

1.5. Determining a Participant's Admission Decision

A participant's admission decision depends on his total score from the physical ability test together with his National Student Selection Examination (NSSE) and National Student Placement Examination (NSPE) scores and his Grade Point Average (GPA) at high school. The overall score of a participant who graduated from a sports branch at high school is calculated by Equation (1.1).

OVERALL SCORE = PATS + 0.52 \times GPA + 0.36 \times NSPE (1.1.)

The overall score of a participant who graduated from another area at high school is calculated by Equation (1.2).

OVERALL SCORE = PATS + 0.16 \times GPA + 0.47 \times NSPE (1.2.)

In Equations (1.1) and (1.2), PATS is the physical ability test score. After the overall scores are calculated for all participants, the scores are sorted in descending order and a pre-defined number of participants are accepted to the School.
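As a sketch, the two formulas can be applied and the candidates ranked as follows; note that the coefficient placement follows the reconstruction of Equations (1.1) and (1.2) above and should be verified against the original admission regulations:

    def overall_score(pats, gpa, nspe, sports_branch):
        """Overall score per Equations (1.1) and (1.2) as reconstructed above.

        pats: physical ability test score, gpa: high-school GPA, nspe: NSPE score.
        """
        if sports_branch:  # graduate of a sports branch at high school, Eq. (1.1)
            return pats + 0.52 * gpa + 0.36 * nspe
        return pats + 0.16 * gpa + 0.47 * nspe  # any other area, Eq. (1.2)

    # Candidates are sorted by overall score in descending order and a
    # pre-defined number of them are admitted (illustrative values).
    scores = sorted([overall_score(70, 80, 230, True),
                     overall_score(65, 90, 240, False)], reverse=True)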

1.6. Previous Work

Developing admission decision prediction models has been an active research area for several years. In this regard, there exist a few studies in the literature which have attempted to predict the admission decision of a candidate to the School of Physical Education and Sports of Cukurova University by using different machine learning methods.

The first study in this field was carried out by Açıkkar and Akay (2008). Multilayer Perceptron (MLP) was used to develop admission decision prediction models. Two datasets consisting of the real test results of participants in the years 2006 and 2007 were used. Several performance metrics including classification accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were reported for the developed prediction models. The authors concluded that MLP was a feasible tool in this application domain.

In a follow-up work by Açıkkar and Akay (2009), Support Vector Machines (SVM) based admission decision prediction models were developed on the same datasets. It was shown that the SVM-based prediction models perform slightly better than the MLP-based prediction models, and it was concluded that SVM classification can be a useful tool for this application area. Detailed results of the SVM and MLP prediction models on the 2006 and 2007 datasets (Açıkkar and Akay, 2009; Açıkkar and Akay, 2008) are given in Table 1.1 and Table 1.2, respectively.

Table 1.1. Results of the SVM and MLP prediction models for the 2006 dataset

Study                    Accuracy (%)  Sensitivity (%)  Specificity (%)  PPV (%)  NPV (%)
Açıkkar and Akay, 2009   97.94         97.50            98.00            96.00    99.05
Açıkkar and Akay, 2008   97.17         92.50            99.00            97.50    97.14

Table 1.2. Results of the SVM and MLP prediction models for the 2007 dataset

Study                    Accuracy (%)  Sensitivity (%)  Specificity (%)  PPV (%)  NPV (%)
Açıkkar and Akay, 2009   93.12         85.00            97.42            94.92    92.99
Açıkkar and Akay, 2008   90.51         85.00            93.46            89.33    93.19

In (Akay et al., 2014), the authors used MLP on the same datasets to develop admission decision prediction models. Each dataset was split into training and test sets using different ratios; more specifically, the datasets were split into 90-10%, 80-20%, 70-30%, 60-40% and 50-50% portions. For the 2006 dataset, the highest classification accuracy (i.e. 96.20%) was obtained for the 80-20% split, and for the 2007 dataset, the highest classification accuracy (i.e. 86.34%) was likewise obtained for the 80-20% split.
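The metrics in Tables 1.1 and 1.2 are the standard confusion-matrix quantities. A minimal sketch of how they are computed (the counts below are made up for illustration, not taken from the thesis):

    def classification_metrics(tp, tn, fp, fn):
        """Accuracy, sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
        return {
            "accuracy":    (tp + tn) / (tp + tn + fp + fn),
            "sensitivity": tp / (tp + fn),  # fraction of admitted candidates detected
            "specificity": tn / (tn + fp),  # fraction of rejected candidates detected
            "ppv": tp / (tp + fp),          # precision of the "admit" predictions
            "npv": tn / (tn + fn),          # precision of the "reject" predictions
        }

    # Hypothetical counts for illustration only:
    print(classification_metrics(tp=38, tn=98, fp=2, fn=5))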

There are also other studies in the literature which developed admission decision prediction models by using different machine learning methods. In (Abut et al., 2015), the authors used four classification methods, namely SVM, Logistic Regression, RBF Network and K-Means Clustering (KMC), on the 2006 dataset, employing different validation techniques. The results showed that the performance of SVM with 10-fold cross validation is superior to that of the other methods; the highest reported classification accuracy was 97.90%.

In (Turhan, 2015), the author used different machine learning algorithms including SVM, MLP, Logistic Regression, RBF Network, Single Decision Tree and KMC on the 2006 and 2007 datasets. Classification accuracy and several other performance metrics were used to assess the performance of the machine learning methods on the datasets. SVM using 10-fold cross validation achieved the highest accuracy, with 97.90% and 91.45% for the 2006 and 2007 datasets, respectively. The ranking among the six classifiers in terms of classification accuracy was determined as SVM, Logistic Regression, MLP, RBF Network, Single Decision Tree and KMC.

1.7. Feature Selection

Feature selection is helpful in locating the discriminative features that are the most appropriate for predicting the class; it is widely used in data mining and statistics. The basic approach of feature selection is to choose a subset of the input variables by removing non-relevant features. In theory, if one knew the full statistical distribution, using more features could only yield better results. On the other hand, since using a large number of features consumes memory and time, doing so may make the algorithms wasteful. Therefore, in the preprocessing step, it may be advantageous to pick only the relevant and necessary features. The favorable consequences of utilizing feature selection include enhanced comprehensibility and lower costs of data acquisition and handling. As a result of all these advantages, feature selection has attracted much attention within the machine learning, artificial intelligence and data mining communities (Sun et al., 2011).
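Of the three algorithms considered in this thesis, F-score is the simplest to state: for each feature it compares the separation of the class means to the within-class variance. A minimal sketch following the common Chen-Lin definition (the thesis's exact variant is not specified here, and the data below are placeholders):

    import numpy as np

    def f_scores(X, y):
        """F-score of each feature for a binary 0/1 target y."""
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        pos, neg = X[y == 1], X[y == 0]
        numerator = (pos.mean(axis=0) - X.mean(axis=0)) ** 2 \
                  + (neg.mean(axis=0) - X.mean(axis=0)) ** 2
        denominator = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
        return numerator / denominator

    # Rank attributes from most to least discriminative; models are then built
    # by dropping the lowest-ranked attribute one at a time, as done with
    # Relief-F and F-score in this thesis.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(143, 9))      # placeholder predictor matrix
    y = rng.integers(0, 2, size=143)   # placeholder admit/reject labels
    ranking = np.argsort(f_scores(X, y))[::-1]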

1.8. Motivation, Purpose and Contributions of This Thesis

There is only one study (Açıkkar et al., 2014) in the literature that uses machine learning methods combined with a feature selection algorithm to develop hybrid admission decision prediction models for the School of Physical Education and Sports at Cukurova University. In (Açıkkar et al., 2014), MLP combined with the Relief-F feature selection algorithm was used to develop a model to predict the admission decision. The predictor variables are gender, NSSE and NSPE scores, GPA, specialization area at high school, and the scores from the coordination and skill test, the vertical jump test, the 30-meter dash test and the 20-meter shuttle run test. The results showed that the model including all the predictor variables yields the best classification accuracies, independent of which activation function is used at the output layer. Among the results obtained with different activation functions, the double-layered MLP model using the linear activation function yielded the best classification accuracy (i.e. 96.50%). Apparently, more research is required, with the help of several different machine learning methods combined with different feature selection algorithms, in order to identify the effect of the predictor variables on the admission decision.

The purpose of this thesis is to develop new hybrid admission decision prediction models by using different machine learning methods including SVM, MLP, RBF Network, TB and KMC combined with feature selection algorithms to investigate the effect of the predictor variables on the admission decision of a candidate to the School of Physical Education and Sports at Cukurova University. With the help of the prediction models developed in this thesis, a participant can form an idea about the importance of each predictor variable and hence prepare for the physical test using an appropriate training program.

In this thesis, two datasets, namely the 2006 dataset and the 2007 dataset, have been utilized. By using the Relief-F and F-score feature selection algorithms, a ranking of the attributes has been calculated; based on these ranking scores, several models have been developed by removing the attribute with the lowest score one at a time. In contrast to the Relief-F and F-score algorithms, the CFS algorithm gives a single set of selected variables, which is used to develop a single model. The models have been evaluated using different machine learning methods including SVM, MLP, RBF, TB and KMC. For model testing, 10-fold cross validation and percentage splits of the data have been used, as sketched below. The performance of the machine learning methods on the two datasets has been assessed using classification accuracy, specificity, sensitivity, PPV and NPV.

This thesis has two main contributions compared to the studies in the literature. First of all, this is the first thesis that develops hybrid admission decision prediction models for the School of Physical Education and Sports at Cukurova University using several machine learning methods combined with different feature selection algorithms. Secondly, by integrating feature selection algorithms into machine learning methods, this thesis discriminates between the useful and the redundant features for admission decision prediction.
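A minimal sketch of the two model-testing schemes, 10-fold cross validation and percentage splits, using scikit-learn as an assumed tool (the thesis does not name its software, and the data below are placeholders):

    import numpy as np
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(143, 9))      # placeholder for the nine predictors
    y = rng.integers(0, 2, size=143)   # placeholder admit/reject labels

    clf = SVC(kernel="rbf")

    # 10-fold cross validation: mean accuracy over the ten folds.
    acc_cv = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()

    # Percentage split, e.g. 80-20%: train on 80% of the data, test on the rest.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, random_state=0)
    acc_split = clf.fit(X_tr, y_tr).score(X_te, y_te)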

1.9. Overview of Datasets

Two different datasets have been used in this thesis. The datasets were provided by the School of Physical Education and Sports of Cukurova University and include data of participants who performed the physical ability tests in the years 2006 and 2007. They contain nine attributes: gender, the scores from the NSSE and NSPE, GPA, the specialization area at high school, and the scores from the vertical jump test, the coordination and skill test, the 30-meter dash test and the 20-meter shuttle run test. The datasets also include two classes, represented by the values 0 and 1, where 0 means reject and 1 means admit. There are 143 subjects (87 males and 56 females) and 117 subjects (73 males and 44 females) in the 2006 and 2007 datasets, respectively. The predictor variables and their statistics for each dataset are given in Table 1.3 and Table 1.4, respectively.

Table 1.3. Statistical analysis of each predictor variable for the 2006 dataset

Predictor Variable                 Minimum  Maximum  Mean    Standard Deviation
Gender                             0        1        0.61    0.49
NSSE                               165.34   259.89   205.88  19.48
NSPE                               186.78   262.93   226.20  14.90
GPA                                37.88    93.46    75.98   10.89
Specialization area                0        1        0.14    0.348
Vertical jump test score           23       59       39.87   8.076
Coordination and skill test score  25.37    46.13    30.67   3.36
30-meter dash test score           3.74     5.32     4.30    0.39

Table 1.4. Statistical analysis of each predictor variable for the 2007 dataset

Predictor Variable                 Minimum  Maximum  Mean    Standard Deviation
Gender                             0        1        0.62    0.49
NSSE                               184.21   274.59   217.07  20.69
NSPE                               190.46   283.21   237.68  14.69
GPA                                37.03    99.38    77.27   10.38
Specialization area                0        1        0.094   0.29
Vertical jump test score           24       65       50      9.39
Coordination and skill test score  25.47    39.73    29.86   3.06
30-meter dash test score           3.74     5.94     4.26    0.40
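The per-variable statistics of Tables 1.3 and 1.4 (minimum, maximum, mean, standard deviation) can be reproduced along the following lines; the file and column names are hypothetical, since the raw data are not published here:

    import pandas as pd

    # Hypothetical file and column names for illustration.
    df = pd.read_csv("admissions_2006.csv")
    cols = ["gender", "nsse", "nspe", "gpa", "specialization_area",
            "vertical_jump", "coordination_skill", "dash_30m", "shuttle_20m"]
    summary = df[cols].agg(["min", "max", "mean", "std"]).T
    summary.columns = ["Minimum", "Maximum", "Mean", "Standard Deviation"]
    print(summary.round(2))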

2. Overview of Methods

2.1. Support Vector Machines

SVM is rooted in statistical learning theory (Vapnik, 1999) and was introduced in 1992 (Boser, 1992). SVM solves classification problems by utilizing an adaptable representation of the class boundaries while executing automatic complexity control, which reduces the problem of overfitting.

A. Linear SVM

Assume a training set of N data points, S = \{(x_k, y_k)\}_{k=1}^{N}, where x_k \in R^n is an input vector and y_k is the corresponding output. SVM problems are concerned with hyperplanes which separate the data. A hyperplane is determined by a vector w and a bias b and is given by w^T x + b = 0. The margin of separation can be maximized by constructing the optimal hyperplane. The support vector method aims at building a classifier

f(x) = \mathrm{sign}(w^T x + b). (2.1.)

The w and b parameters are constrained by

\min_i |w^T x_i + b| = 1. (2.2.)

If the vectors can be divided without error and the distance between the nearest vector and the hyperplane is maximal, the data are said to be optimally divided by the hyperplane. Consequently, a dividing hyperplane in canonical form has to fulfill the constraints given in (2.3),

y_i (w^T x_i + b) \geq 1, \quad i = 1, 2, \ldots, n. (2.3.)

The distance of a point x_i from the hyperplane (w, b) is

d((w, b), x_i) = \frac{y_i (w^T x_i + b)}{\|w\|} \geq \frac{1}{\|w\|}, (2.4.)

so the margin of separation can be calculated as

\frac{2}{\|w\|}. (2.5.)

Hence, SVM searches for a separating hyperplane by minimizing

\Phi(w) = \frac{1}{2} w^T w. (2.6.)

Minimizing (2.6) implements the structural risk minimization principle, since under the bound

\|w\|^2 \leq c (2.7.)

the VC dimension h of the set of canonical hyperplanes in n-dimensional space is limited by

h \leq \min(\lceil R^2 c \rceil, d) + 1, (2.8.)

in which R is the radius of a hypersphere surrounding all training vectors. As a result, minimizing (2.6) is equivalent to minimizing this upper bound. The constraints of (2.3) can be relaxed by introducing slack variables \xi_i \geq 0, i = 1, 2, \ldots, n, so that (2.3) can be rewritten as

y_i (w^T x_i + b) \geq 1 - \xi_i, \quad i = 1, 2, \ldots, n. (2.9.)

Under these circumstances, the optimization problem becomes

\Phi(w, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i, (2.10.)

where C is a user-specified positive constant. The saddle point of the Lagrangian function is utilized in the solution of the problem given in (2.10),

L(w, b, \xi, \alpha, \beta) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \beta_i \xi_i, (2.11.)

where \alpha_i \geq 0 and \beta_i \geq 0, i = 1, 2, \ldots, n, are the Lagrange multipliers. (2.11) must be solved with respect to w, b, \xi, \alpha and \beta. Classical Lagrangian duality turns (2.11) into its dual problem, which is easier to solve. The dual problem is given by (2.12),

\max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j (2.12.)

with constraints

\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C, \quad i = 1, 2, \ldots, n. (2.13.)

There is a unique solution to this classic quadratic optimization problem. By the Kuhn-Tucker theorem of optimization theory,

\alpha_i \left[ y_i (w^T x_i + b) - 1 \right] = 0, \quad i = 1, 2, \ldots, n. (2.14.)

The Lagrange multiplier \alpha_i in (2.14) is non-zero only if x_i satisfies (2.15),

y_i (w^T x_i + b) = 1. (2.15.)

The points satisfying (2.15) are called support vectors (SVs). This small subset of the training vectors determines the hyperplane. Therefore, denoting the non-zero components of the optimal solution by \alpha_i^*, the classifier function can be represented as

f(x) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i^* y_i \, x_i^T x + b^* \right), (2.16.)

where b^* is the solution of (2.14) for any non-zero \alpha_i^*.
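Once the dual solution is known, Equation (2.16) classifies a new point using only the support vectors. A toy sketch (the support vectors, multipliers and bias below are made up for illustration):

    import numpy as np

    def svm_decision(x, support_vectors, alphas, labels, b):
        """Linear SVM classifier of Eq. (2.16): sign(sum_i a_i* y_i x_i . x + b*)."""
        s = sum(a * y * np.dot(sv, x)
                for a, y, sv in zip(alphas, labels, support_vectors))
        return np.sign(s + b)

    # Toy example: two support vectors on either side of the hyperplane x1 = 0.
    svs    = np.array([[1.0, 0.0], [-1.0, 0.0]])
    alphas = np.array([0.5, 0.5])
    labels = np.array([+1, -1])
    print(svm_decision(np.array([2.0, 3.0]), svs, alphas, labels, b=0.0))  # +1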

B. Non-linear SVM

The majority of datasets cannot be properly divided by a linear separating hyperplane. However, they can be linearly divided if mapped into a higher dimensional space by a nonlinear mapping. Therefore, a mapping z = \varphi(x) that converts the input vector x of dimension d into a vector z of higher dimension is defined, and \varphi(\cdot) is selected so that the new training data \{(\varphi(x_i), y_i)\} are divisible by a hyperplane. The data points are mapped from the input space into the space of higher dimension by the function

\varphi : R^n \rightarrow R^{n_h}. (2.17.)

The optimization problem (2.12) then transforms into (2.18), with the same constraints,

\max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j), (2.18.)

where

K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j) (2.19.)

is the kernel function. The exponential (RBF) kernel function is given by (2.20),

K(x, x_j) = \exp\left( -\frac{\|x - x_j\|^2}{2\sigma^2} \right), (2.20.)

whereas the polynomial kernel function is given by (2.21),

K(x, x_j) = (x \cdot x_j + 1)^q, \quad q = 1, 2, \ldots, (2.21.)

where the parameters \sigma and q in (2.20) and (2.21) have to be set in advance. The classifier function is then specified as

f(x) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i^* y_i \, K(x_i, x) + b^* \right), (2.22.)

where b^* is again the solution of (2.14) for any non-zero \alpha_i^*.
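In practice, the kernels (2.20) and (2.21) correspond to the rbf and poly kernels of off-the-shelf SVM implementations. A minimal scikit-learn sketch (an assumed tool, with gamma = 1/(2 sigma^2) matching the RBF form above and synthetic data):

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=150, n_features=9, random_state=0)

    # Eq. (2.20) is sklearn's "rbf" kernel with gamma = 1 / (2 * sigma**2);
    # Eq. (2.21) is the "poly" kernel with gamma=1, coef0=1 and degree = q.
    rbf_svm = SVC(kernel="rbf", gamma=0.5, C=1.0)
    poly_svm = SVC(kernel="poly", gamma=1.0, coef0=1.0, degree=2, C=1.0)
    for clf in (rbf_svm, poly_svm):
        print(clf.kernel, clf.fit(X, y).score(X, y))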

2.2. Multi-Layer Perceptron

An MLP is a type of feed-forward artificial neural network model. It maps a set of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the following one. Except for the input nodes, each node is a neuron with a nonlinear activation function. The network is trained with a supervised learning technique called back propagation. MLP is a modification of the standard linear perceptron and can classify data that are not linearly separable (Imre and Durucan, 2000). A typical MLP has the form given in Figure 2.1.

Figure 2.1. A typical MLP structure

A data pattern consisting of the values x_i at the input layer is transmitted through the network towards the units H_j of the first hidden layer. Each hidden unit receives the weighted outputs w_{ij} x_i of the units of the previous layer, sums them, and turns the sum into an output value using an activation function f^{(n)}(\cdot) at layer n. For a two-layer network, the output of unit k is

out_k^{(2)} = f^{(2)}\left( \sum_j out_j^{(1)} w_{jk}^{(2)} \right) = f^{(2)}\left( \sum_j f^{(1)}\left( \sum_i x_i w_{ij}^{(1)} \right) w_{jk}^{(2)} \right). (2.23.)

If the activations at the hidden layer are linear, (2.23) reduces to

out_k^{(2)} = f^{(2)}\left( \sum_i x_i \sum_j w_{ij}^{(1)} w_{jk}^{(2)} \right). (2.24.)

Nevertheless, (2.24) is equivalent to a network having a single layer of weights w_{ik} = \sum_j w_{ij}^{(1)} w_{jk}^{(2)}, and such a network cannot be used on non-linearly separable problems.

A. Non-Linear Activation/Transfer Function

The values of the logistic sigmoid function range from 0 to 1. The hyperbolic tangent is often used instead of the standard sigmoid; it has the property

f(x) = \tanh(x) = 2\,\mathrm{Sigmoid}(2x) - 1, (2.25.)

and its derivative is given by (2.26),

f'(x) = 1 - f(x)^2. (2.26.)
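A minimal forward pass for the two-layer network of Equation (2.23), using the tanh activation of (2.25); the layer sizes and weights below are illustrative. The simple derivative (2.26) is what makes tanh convenient for the back-propagation training described next:

    import numpy as np

    def mlp_forward(x, W1, W2):
        """Two-layer MLP of Eq. (2.23): out = f2(W2 . f1(W1 . x)), tanh units."""
        hidden = np.tanh(W1 @ x)     # f1: nonlinear hidden activation, Eq. (2.25)
        return np.tanh(W2 @ hidden)  # f2: output activation

    rng = np.random.default_rng(0)
    x  = rng.normal(size=9)          # nine predictor variables (Section 1.9)
    W1 = rng.normal(size=(5, 9))     # input -> hidden weights (5 hidden units)
    W2 = rng.normal(size=(1, 5))     # hidden -> output weights
    print(mlp_forward(x, W1, W2))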

B. Learning

The same steps are used for training N-layer neural networks as for networks having a single layer. The network weights w_{ij}^{(n)} are set to minimize the output cost function given in (2.27),

E_{SSE} = \frac{1}{2} \sum_p \sum_j \left( targ_j^p - out_j^{(N)p} \right)^2, (2.27.)

or the cross-entropy cost function,

E_{CE} = - \sum_p \sum_j \left[ targ_j^p \log\left( out_j^{(N)p} \right) + \left( 1 - targ_j^p \right) \log\left( 1 - out_j^{(N)p} \right) \right], (2.28.)

and once again this can be done by a series of gradient descent weight changes,

\Delta w_{kl}^{(m)} = -\eta \, \frac{\partial E\left( \{ w_{ij}^{(n)} \} \right)}{\partial w_{kl}^{(m)}}. (2.29.)

Only the output out_j^{(N)} of the last layer appears explicitly in the error function E. However, the outputs of the final layer depend on all the weights of the previous layers, and the learning algorithm automatically adjusts the outputs out_j^{(n)} of the previous layers accordingly.

C. Training

Training for multi-layer networks proceeds in the same way as for networks having a single layer: