Understanding the difficulty of training deep feedforward neural networks

Xavier Glorot and Yoshua Bengio
DIRO, Université de Montréal, Montréal, Québec, Canada

Abstract

Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, explaining the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.

(Appearing in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, Chia Laguna Resort, Sardinia, Italy. Volume 9 of JMLR: W&CP. Copyright 2010 by the authors.)

1 Deep Neural Networks

Deep learning methods aim at learning feature hierarchies, with features from higher levels of the hierarchy formed by the composition of lower-level features. They include learning methods for a wide array of deep architectures, including neural networks with many hidden layers (Vincent et al., 2008) and graphical models with many levels of hidden variables (Hinton et al., 2006), among others (Zhu et al., 2009; Weston et al., 2008). Much attention has recently been devoted to them (see Bengio (2009) for a review), because of their theoretical appeal, inspiration from biology and human cognition, and because of empirical success in vision (Ranzato et al., 2007; Larochelle et al., 2007; Vincent et al., 2008) and natural language processing (NLP) (Collobert & Weston, 2008; Mnih & Hinton, 2009). Theoretical results reviewed and discussed by Bengio (2009) suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures.

Most of the recent experimental results with deep architectures are obtained with models that can be turned into deep supervised neural networks, but with initialization or training schemes different from the classical feedforward neural networks (Rumelhart et al., 1986). Why are these new algorithms working so much better than the standard random initialization and gradient-based optimization of a supervised training criterion? Part of the answer may be found in recent analyses of the effect of unsupervised pre-training (Erhan et al., 2009), showing that it acts as a regularizer that initializes the parameters in a better basin of attraction of the optimization procedure, corresponding to an apparent local minimum associated with better generalization.
But earlier work (Bengio et al., 2007) had shown that even a purely supervised but greedy layer-wise procedure would give better results. So here, instead of focusing on what unsupervised pre-training or semi-supervised criteria bring to deep architectures, we focus on analyzing what may be going wrong with good old (but deep) multilayer neural networks. Our analysis is driven by investigative experiments to monitor activations (watching for saturation of hidden units) and gradients, across layers and across training iterations. We also evaluate the effects on these of choices of activation function (with the idea that it might affect saturation) and of initialization procedure (since unsupervised pre-training is a particular form of initialization and it has a drastic impact).

2 Experimental Setting and Datasets

Code to produce the new datasets introduced in this section is available from: ca/~lisa/twiki/bin/view.cgi/Public/DeepGradientsAISTATS

2.1 Online Learning on an Infinite Dataset: Shapeset-3×2

Recent work with deep architectures (see Figure 7 in Bengio (2009)) shows that even with very large training sets or online learning, initialization from unsupervised pre-training yields substantial improvement, which does not vanish as the number of training examples increases. The online setting is also interesting because it focuses on the optimization issues rather than on small-sample regularization effects, so we decided to include in our experiments a synthetic images dataset inspired from Larochelle et al. (2007) and Larochelle et al. (2009), from which as many examples as needed could be sampled, for testing the online learning scenario.

We call this dataset the Shapeset-3×2 dataset, with example images in Figure 1 (top). Shapeset-3×2 contains images of 1 or 2 two-dimensional objects, each taken from 3 shape categories (triangle, parallelogram, ellipse), and placed with random shape parameters (relative lengths and/or angles), scaling, rotation, translation and grey-scale. We noticed that with only one shape present in the image, the task of recognizing it was too easy. We therefore decided to also sample images with two objects, with the constraint that the second object does not overlap with the first by more than fifty percent of its area, to avoid hiding it entirely. The task is to predict which objects are present (e.g. triangle + ellipse, parallelogram + parallelogram, triangle alone, etc.) without having to distinguish between the foreground shape and the background shape when they overlap. This therefore defines nine configuration classes. The task is fairly difficult because we need to discover invariances over rotation, translation, scaling, object color, occlusion and relative position of the shapes. In parallel we need to extract the factors of variability that predict which object shapes are present. The size of the images is arbitrary, but we fixed it so as to work with deep dense networks efficiently.

2.2 Finite Datasets

The MNIST digits dataset (LeCun et al., 1998a) has 50,000 training images, 10,000 validation images (for hyper-parameter selection), and 10,000 test images, each showing a 28×28 grey-scale pixel image of one of the 10 digits.

CIFAR-10 (Krizhevsky & Hinton, 2009) is a labelled subset of the tiny-images dataset that contains 50,000 training examples (from which we extracted 10,000 as validation data) and 10,000 test examples. There are 10 classes corresponding to the main object in each image: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, or truck. The classes are balanced. Each image is in color but small, so the input is a vector of 32×32×3 = 3072 real values.

Small-ImageNet is a set of tiny gray-level images computed from the higher-resolution and larger ImageNet set, with labels from the WordNet noun hierarchy. We used 90,000 examples for training, 10,000 for the validation set, and 10,000 for testing. There are 10 balanced classes: reptiles, vehicles, birds, mammals, fish, furniture, instruments, tools, flowers and fruits. Figure 1 (bottom) shows randomly chosen examples.

Figure 1: Top: Shapeset-3×2 images. The learner tries to predict which objects (parallelogram, triangle, or ellipse) are present, and 1 or 2 objects can be present, yielding 9 possible classifications. Bottom: Small-ImageNet images at full resolution.
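For concreteness, the nine configuration classes are simply the unordered selections of one or two shapes from the three categories. The following is a minimal illustrative sketch of that labelling (not the released dataset code; all names here are ours):

```python
from itertools import combinations_with_replacement

# Alphabetical order so that sorted() keys below match the enumerated tuples.
SHAPES = ["ellipse", "parallelogram", "triangle"]

single = [(s,) for s in SHAPES]                          # 3 one-shape classes
pairs = list(combinations_with_replacement(SHAPES, 2))   # 6 unordered two-shape classes
CLASSES = single + pairs                                 # 9 configuration classes in total
CLASS_INDEX = {c: i for i, c in enumerate(CLASSES)}

def label(shapes_in_image):
    """Map the unordered shapes present in an image to a class index 0..8."""
    return CLASS_INDEX[tuple(sorted(shapes_in_image))]

assert len(CLASSES) == 9
# e.g. label(["triangle"]) and label(["parallelogram", "triangle"]) are two of the nine classes.
```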
2.3 Experimental Setting

We optimized feedforward neural networks with one to five hidden layers, one thousand hidden units per layer, and a softmax logistic regression for the output layer. The cost function is the negative log-likelihood −log P(y|x), where (x, y) is the (input image, target class) pair. The neural networks were optimized with stochastic back-propagation on mini-batches of size ten, i.e., the average gradient g of −log P(y|x) with respect to the parameters was computed over the ten training pairs (x, y) of the mini-batch and used to update the parameters in that direction, with θ ← θ − εg. The learning rate ε is a hyper-parameter that is optimized based on validation set error after a large number of updates (5 million).

We varied the type of non-linear activation function in the hidden layers: the sigmoid 1/(1 + e^{−x}), the hyperbolic tangent tanh(x), and a newly proposed activation function (Bergstra et al., 2009) called the softsign, x/(1 + |x|). The softsign is similar to the hyperbolic tangent (its range is −1 to 1), but its tails are quadratic polynomials rather than exponentials, i.e., it approaches its asymptotes much more slowly.

In the comparisons, we search for the best hyper-parameters (learning rate and depth) separately for each model. Note that the best depth was always five for Shapeset-3×2, except for the sigmoid, for which it was four. We initialized the biases to 0 and the weights W_ij at each layer with the following commonly used heuristic:

W_ij ~ U[ −1/√n, 1/√n ],   (1)

where U[−a, a] is the uniform distribution over the interval (−a, a) and n is the size of the previous layer (the number of columns of W).
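For reference, the three activation functions and the commonly used initialization heuristic of eq. (1) can be written in a few lines. This is a minimal NumPy sketch of the setup described above, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def softsign(x):
    # Same range (-1, 1) as tanh, but approaches its asymptotes polynomially.
    return x / (1.0 + np.abs(x))

def standard_init(fan_in, fan_out, rng=np.random):
    # Eq. (1): W_ij ~ U[-1/sqrt(n), 1/sqrt(n)], with n the size of the previous layer.
    bound = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

def init_biases(fan_out):
    # Biases are initialized to 0.
    return np.zeros(fan_out)
```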
3 Effect of Activation Functions and Saturation During Training

Two things we want to avoid, and that can be revealed from the evolution of activations, are excessive saturation of activation functions on the one hand (then gradients will not propagate well), and overly linear units on the other (they will not compute something interesting).

3.1 Experiments with the Sigmoid

The sigmoid non-linearity has already been shown to slow down learning because of its non-zero mean, which induces important singular values in the Hessian (LeCun et al., 1998b). In this section we will see another symptomatic behavior due to this activation function in deep feedforward networks. We want to study possible saturation by looking at the evolution of activations during training. The figures in this section show results on the Shapeset-3×2 data, but similar behavior is observed with the other datasets.

Figure 2 shows the evolution of the activation values (after the non-linearity) at each hidden layer during training of a deep architecture with sigmoid activation functions. Layer 1 refers to the output of the first hidden layer, and there are four hidden layers. The graph shows the means and standard deviations of these activations. These statistics, along with histograms, are computed at different times during learning by looking at activation values for a fixed set of 300 test examples.

Figure 2: Mean and standard deviation (vertical bars) of the activation values (output of the sigmoid) during supervised learning, for the different hidden layers of a deep architecture. The top hidden layer quickly saturates at 0 (slowing down all learning), but then slowly desaturates around epoch 100.

We see that very quickly at the beginning, all the sigmoid activation values of the last hidden layer are pushed to their lower saturation value of 0. Inversely, the other layers have a mean activation value that is above 0.5 and decreasing as we go from the output layer to the input layer. We have found that this kind of saturation can last very long in deeper networks with sigmoid activations; e.g., the depth-five model never escaped this regime during training. The big surprise is that for an intermediate number of hidden layers (here four), the saturation regime may be escaped. At the same time that the top hidden layer moves out of saturation, the first hidden layer begins to saturate and therefore to stabilize.

We hypothesize that this behavior is due to the combination of random initialization and the fact that a hidden unit output of 0 corresponds to a saturated sigmoid. Note that deep networks with sigmoids but initialized from unsupervised pre-training (e.g. from RBMs) do not suffer from this saturation behavior. Our proposed explanation rests on the hypothesis that the transformation that the lower layers of the randomly initialized network compute initially is not useful to the classification task, unlike the transformation obtained from unsupervised pre-training. The logistic layer output softmax(b + Wh) might initially rely more on its biases b (which are learned very quickly) than on the top hidden activations h derived from the input image (because h would vary in ways that are not predictive of y, perhaps correlated mostly with other, possibly more dominant, variations of x). Thus the error gradient would tend to push Wh towards 0, which can be achieved by pushing h towards 0. In the case of symmetric activation functions like the hyperbolic tangent and the softsign, sitting around 0 is good because it allows gradients to flow backwards. However, pushing the sigmoid outputs to 0 would bring them into a saturation regime which prevents gradients from flowing backward and prevents the lower layers from learning useful features. Eventually but slowly, the lower layers move toward more useful features and the top hidden layer then moves out of the saturation regime. Note however that, even after this, the network moves into a solution of poorer quality (also in terms of generalization) than those found with symmetric activation functions, as can be seen in Figure 11.
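The kind of per-layer statistics plotted in Figure 2 can be gathered with a simple monitoring routine: periodically run a fixed set of examples (here, 300 test examples) through the network and record the mean and standard deviation of each hidden layer's post-activation values. A sketch, assuming a plain feedforward network stored as lists of weight matrices and bias vectors (the names below are placeholders, not the authors' code):

```python
import numpy as np

def layer_activation_stats(weights, biases, activation, X):
    """Return (mean, std) of the post-activation values of every hidden layer
    for a fixed monitoring set X (e.g. 300 held-out examples)."""
    stats = []
    z = X
    for W, b in zip(weights, biases):
        z = activation(z @ W + b)          # forward pass through one hidden layer
        stats.append((z.mean(), z.std()))  # saturation: mean near an asymptote, tiny std
    return stats

# usage sketch: stats = layer_activation_stats(Ws, bs, sigmoid, X_monitor)
```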

3.2 Experiments with the Hyperbolic Tangent

As discussed above, hyperbolic tangent networks do not suffer from the kind of top-hidden-layer saturation behavior observed with sigmoid networks, because of the symmetry of tanh around 0. However, with our standard weight initialization U[−1/√n, 1/√n], we observe a sequentially occurring saturation phenomenon starting with layer 1 and propagating up in the network, as illustrated in Figure 3. Why this happens remains to be understood.

Figure 3: Top: 98 percentiles (markers alone) and standard deviation (solid lines with markers) of the distribution of activation values for the hyperbolic tangent networks in the course of learning. We see the first hidden layer saturating first, then the second, etc. Bottom: 98 percentiles (markers alone) and standard deviation (solid lines with markers) of the distribution of activation values for the softsign during learning. Here the different layers saturate less and do so together.

3.3 Experiments with the Softsign

The softsign x/(1 + |x|) is similar to the hyperbolic tangent but might behave differently in terms of saturation because of its smoother asymptotes (polynomial instead of exponential). We see in Figure 3 that the saturation does not occur one layer after the other as it does for the hyperbolic tangent. It is faster at the beginning and then slow, and all layers move together towards larger weights.

We can also see at the end of training that the histogram of activation values is very different from that seen with the hyperbolic tangent (Figure 4). Whereas the latter yields modes of the activation distribution mostly at the extremes (asymptotes −1 and 1) or around 0, the softsign network has modes of activations around its knees (between the linear regime around 0 and the flat regime around −1 and 1). These are the areas where there is substantial non-linearity but where the gradients flow well.

Figure 4: Normalized histograms of activation values at the end of learning, averaged across units of the same layer and across 300 test examples. Top: activation function is the hyperbolic tangent; we see important saturation of the lower layers. Bottom: activation function is the softsign; we see many activation values around (−0.6, −0.8) and (0.6, 0.8), where the units do not saturate but are non-linear.

4 Studying Gradients and their Propagation

4.1 Effect of the Cost Function

We have found that the logistic regression or conditional log-likelihood cost function (−log P(y|x) coupled with softmax outputs) worked much better (for classification problems) than the quadratic cost which was traditionally used to train feedforward neural networks (Rumelhart et al., 1986). This is not a new observation (Solla et al., 1988), but we find it important to stress here. We found that the plateaus in the training criterion (as a function of the parameters) are less present with the log-likelihood cost function. We can see this in Figure 5, which plots the training criterion as a function of two weights for a two-layer network (one hidden layer) with hyperbolic tangent units, and a random input and target signal. There are clearly more severe plateaus with the quadratic cost.
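To make the two criteria being compared explicit, here is a small sketch (assuming one-hot targets) of the conditional log-likelihood cost with softmax outputs and of the quadratic cost on the same outputs:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

def nll_cost(logits, y_onehot):
    # -log P(y | x) with softmax outputs, averaged over the mini-batch.
    p = softmax(logits)
    return -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))

def quadratic_cost(logits, y_onehot):
    # Traditional squared-error criterion on the same softmax outputs.
    p = softmax(logits)
    return 0.5 * np.mean(np.sum((p - y_onehot) ** 2, axis=1))
```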
4.2 Gradients at Initialization

Theoretical Considerations and a New Normalized Initialization

We study the back-propagated gradients, or equivalently the gradient of the cost function with respect to the biases at each layer. Bradley (2009) found that back-propagated gradients were smaller as one moves from the output layer towards the input layer, just after initialization. He studied networks with linear activation at each layer, finding that the variance of the back-propagated gradients decreases as we go backwards in the network. We will also start by studying the linear regime.
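Bradley's observation is easy to reproduce numerically in the linear regime: back-propagate a random gradient through a stack of layers drawn from the standard initialization of eq. (1) and watch its variance shrink towards the input. A minimal sketch under those assumptions (square layers, purely linear units, no data involved):

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 1000, 5

# Standard initialization, eq. (1): Var[W] = 1/(3n), so n * Var[W] = 1/3.
Ws = [rng.uniform(-1/np.sqrt(n), 1/np.sqrt(n), size=(n, n)) for _ in range(depth)]

g = rng.standard_normal(n)                 # gradient arriving at the top layer
for i, W in reversed(list(enumerate(Ws))):
    g = W @ g                              # back-propagation through one linear layer
    print(f"layer {i}: Var[grad] = {g.var():.2e}")   # shrinks by roughly 1/3 per layer
```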

Figure 5: Cross entropy (black, surface on top) and quadratic (red, bottom surface) cost as a function of two weights (one at each layer) of a network with two layers, W1 respectively on the first layer and W2 on the second, output layer.

For a dense artificial neural network using a symmetric activation function f with unit derivative at 0 (i.e. f′(0) = 1), if we write z^i for the activation vector of layer i and s^i for the argument vector of the activation function at layer i, we have s^i = z^i W^i + b^i and z^{i+1} = f(s^i). From these definitions we obtain

∂Cost/∂s^i_k = f′(s^i_k) W^{i+1}_{k,·} ∂Cost/∂s^{i+1},   (2)
∂Cost/∂w^i_{l,k} = z^i_l ∂Cost/∂s^i_k.   (3)

The variances will be expressed with respect to the input, output and weight initialization randomness. Consider the hypothesis that we are in a linear regime at initialization, that the weights are initialized independently and that the input features' variances are the same (= Var[x]). Then we can say that, with n_i the size of layer i and x the network input,

f′(s^i_k) ≈ 1,   (4)
Var[z^i] = Var[x] ∏_{i′=0}^{i−1} n_{i′} Var[W^{i′}].   (5)

We write Var[W^{i′}] for the shared scalar variance of all weights at layer i′. Then for a network with d layers,

Var[∂Cost/∂s^i] = Var[∂Cost/∂s^d] ∏_{i′=i}^{d} n_{i′+1} Var[W^{i′}],   (6)
Var[∂Cost/∂w^i] = (∏_{i′=0}^{i−1} n_{i′} Var[W^{i′}]) (∏_{i′=i}^{d−1} n_{i′+1} Var[W^{i′}]) Var[x] Var[∂Cost/∂s^d].   (7)

From a forward-propagation point of view, to keep information flowing we would like that

∀(i, i′), Var[z^i] = Var[z^{i′}].   (8)

From a back-propagation point of view we would similarly like to have

∀(i, i′), Var[∂Cost/∂s^i] = Var[∂Cost/∂s^{i′}].   (9)

These two conditions transform to

∀i, n_i Var[W^i] = 1,   (10)
∀i, n_{i+1} Var[W^i] = 1.   (11)

As a compromise between these two constraints, we might want to have

∀i, Var[W^i] = 2 / (n_i + n_{i+1}).   (12)

Note how both constraints are satisfied when all layers have the same width. If we also have the same initialization for the weights, we obtain the following interesting properties:

∀i, Var[∂Cost/∂s^i] = [n Var[W]]^{d−i} Var[x],   (13)
∀i, Var[∂Cost/∂w^i] = [n Var[W]]^{d} Var[x] Var[∂Cost/∂s^d].   (14)

We can see that the variance of the gradient on the weights is the same for all layers, but the variance of the back-propagated gradient might still vanish or explode as we consider deeper networks. Note how this is reminiscent of issues raised when studying recurrent neural networks (Bengio et al., 1994), which can be seen as very deep networks when unfolded through time.

The standard initialization that we have used (eq. 1) gives rise to variance with the following property:

n Var[W] = 1/3,   (15)

where n is the layer size (assuming all layers are of the same size). This will cause the variance of the back-propagated gradient to be dependent on the layer (and decreasing). The normalization factor may therefore be important when initializing deep networks because of the multiplicative effect through layers, and we suggest the following initialization procedure to approximately satisfy our objectives of maintaining activation variances and back-propagated gradient variances as one moves up or down the network. We call it the normalized initialization:

W ~ U[ −√6/√(n_j + n_{j+1}), √6/√(n_j + n_{j+1}) ].   (16)
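A sketch of the normalized initialization of eq. (16), together with a quick empirical check of the forward condition it is meant to approximate (eq. 8): with square tanh layers, activation variance decays much faster under the standard initialization than under the normalized one. This is an illustration under the stated assumptions, not the authors' code:

```python
import numpy as np

def normalized_init(fan_in, fan_out, rng=np.random):
    # Eq. (16): W ~ U[-sqrt(6)/sqrt(n_j + n_{j+1}), +sqrt(6)/sqrt(n_j + n_{j+1})],
    # i.e. Var[W] = 2 / (n_j + n_{j+1}) as in eq. (12).
    bound = np.sqrt(6.0) / np.sqrt(fan_in + fan_out)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
n, depth = 1000, 5
x = rng.standard_normal((200, n))   # unit-variance inputs

inits = [("standard", lambda i, o: rng.uniform(-1/np.sqrt(i), 1/np.sqrt(i), (i, o))),
         ("normalized", normalized_init)]
for name, init in inits:
    z = x
    variances = []
    for _ in range(depth):
        z = np.tanh(z @ init(n, n))   # forward pass through one tanh layer
        variances.append(z.var())
    print(name, ["%.3f" % v for v in variances])
# With the standard init the activation variance shrinks by roughly a factor of 3
# per layer; with the normalized init it stays of the same order across layers
# (the residual decay comes from the tanh non-linearity).
```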

Gradient Propagation Study

To empirically validate the above theoretical ideas, we have plotted normalized histograms of activation values, weight gradients and back-propagated gradients at initialization with the two different initialization methods. The results displayed (Figures 6, 7 and 8) are from experiments on Shapeset-3×2, but qualitatively similar results were obtained with the other datasets.

We monitor the singular values of the Jacobian matrix associated with layer i:

J^i = ∂z^{i+1} / ∂z^i.   (17)

When consecutive layers have the same dimension, the average singular value corresponds to the average ratio of infinitesimal volumes mapped from z^i to z^{i+1}, as well as to the ratio of average activation variance going from z^i to z^{i+1}. With our normalized initialization, this ratio is around 0.8, whereas with the standard initialization, it drops down to 0.5.

Figure 6: Normalized histograms of activation values with hyperbolic tangent activation, with standard (top) vs normalized (bottom) initialization. Top: the 0-peak increases for higher layers.

4.3 Back-propagated Gradients During Learning

The dynamics of learning in such networks is complex and we would like to develop better tools to analyze and track it. In particular, we cannot use simple variance calculations in our theoretical analysis because the weight values are no longer independent of the activation values and the linearity hypothesis is also violated.

As first noted by Bradley (2009), we observe (Figure 7) that at the beginning of training, after the standard initialization (eq. 1), the variance of the back-propagated gradients gets smaller as it is propagated downwards. However, we find that this trend is reversed very quickly during learning. Using our normalized initialization we do not see such decreasing back-propagated gradients (bottom of Figure 7).

Figure 7: Normalized histograms of back-propagated gradients with hyperbolic tangent activation, with standard (top) vs normalized (bottom) initialization. Top: the 0-peak decreases for higher layers.

What was initially really surprising is that even when the back-propagated gradients become smaller (standard initialization), the variance of the weight gradients is roughly constant across layers, as shown in Figure 8. However, this is explained by our theoretical analysis above (eq. 14). Interestingly, as shown in Figure 9, these observations on the weight gradients of the standard and normalized initializations change during training (here for a tanh network). Indeed, whereas the gradients initially have roughly the same magnitude, they diverge from each other (with larger gradients in the lower layers) as training progresses, especially with the standard initialization. Note that this might be one of the advantages of the normalized initialization, since having gradients of very different magnitudes at different layers may yield ill-conditioning and slower training.

Finally, we observe that the softsign networks share similarities with the tanh networks with normalized initialization, as can be seen by comparing the evolution of activations in both cases (resp. Figure 3-bottom and Figure 10).
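The 0.8 vs 0.5 ratio quoted in the gradient propagation study above can be checked directly at initialization: with tanh units operating near zero, f′(s^i) ≈ 1, so the layer Jacobian is approximately the (transposed) weight matrix and its singular values are those of W. A rough sketch under that approximation, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

def mean_singular_value(W):
    # At initialization f'(s) is close to 1, so the layer Jacobian dz^{i+1}/dz^i
    # is approximately W^T; its singular values are those of W.
    return np.linalg.svd(W, compute_uv=False).mean()

W_std = rng.uniform(-1/np.sqrt(n), 1/np.sqrt(n), (n, n))                            # eq. (1)
W_norm = rng.uniform(-np.sqrt(6)/np.sqrt(2*n), np.sqrt(6)/np.sqrt(2*n), (n, n))     # eq. (16)

print("standard init  :", mean_singular_value(W_std))    # roughly 0.5
print("normalized init:", mean_singular_value(W_norm))   # roughly 0.85
```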
5 Error Curves and Conclusions

The final consideration that we care about is the success of training with the different strategies, and this is best illustrated with error curves showing the evolution of test error as training progresses and asymptotes. Figure 11 shows such curves with online training on Shapeset-3×2, while Table 1 gives final test error for all the datasets studied (Shapeset-3×2, MNIST, CIFAR-10, and Small-ImageNet).

As a baseline, we optimized RBF SVM models on one hundred thousand Shapeset examples and obtained 59.47% test error, while on the same set we obtained 50.47% with a depth-five hyperbolic tangent network with normalized initialization. These results illustrate the effect of the choice of activation and initialization. As a reference we include in Figure 11 the error curve for supervised fine-tuning from the initialization obtained after unsupervised pre-training with denoising auto-encoders (Vincent et al., 2008). For each network the learning rate is separately chosen to minimize error on the validation set. We can remark that on Shapeset-3×2, because of the task difficulty, we observe important saturations during learning; this might explain why the effects of the normalized initialization and of the softsign are more visible there.

Figure 8: Normalized histograms of weight gradients with hyperbolic tangent activation just after initialization, with standard initialization (top) and normalized initialization (bottom), for different layers. Even though with standard initialization the back-propagated gradients get smaller, the weight gradients do not!

Figure 9: Standard deviation intervals of the weight gradients with hyperbolic tangent, with standard initialization (top) and normalized initialization (bottom), during training. We see that the normalization allows us to keep the same variance of the weight gradients across layers during training (top: smaller variance for higher layers).

Figure 10: 98 percentile (markers alone) and standard deviation (solid lines with markers) of the distribution of activation values for hyperbolic tangent with normalized initialization during learning.

Table 1: Test error with different activation functions and initialization schemes for deep networks with 5 hidden layers (rows: Softsign, Softsign N, Tanh, Tanh N, Sigmoid; columns: Shapeset, MNIST, CIFAR-10, ImageNet). N after the activation function name indicates the use of normalized initialization. Results in bold are statistically different from non-bold ones under a null-hypothesis test.

Several conclusions can be drawn from these error curves:

- The more classical neural networks with sigmoid or hyperbolic tangent units and standard initialization fare rather poorly, converging more slowly and apparently towards ultimately poorer local minima.
- The softsign networks seem to be more robust to the initialization procedure than the tanh networks, presumably because of their gentler non-linearity.
- For tanh networks, the proposed normalized initialization can be quite helpful, presumably because the layer-to-layer transformations maintain magnitudes of activations (flowing upward) and gradients (flowing backward).

Other methods can alleviate discrepancies between layers during learning, e.g., exploiting second-order information to set the learning rate separately for each parameter. For example, one can exploit the diagonal of the Hessian (LeCun et al., 1998b) or a gradient variance estimate. Both of these methods were applied for Shapeset-3×2 with hyperbolic tangent and standard initialization. We observed a gain in performance, but one not reaching the result obtained from normalized initialization. In addition, we observed further gains by combining normalized initialization with second-order methods: the estimated Hessian might then focus on discrepancies between units, not having to correct important initial discrepancies between layers.
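As an illustration of the "gradient variance estimate" idea mentioned above (this is only a sketch of one possible per-parameter learning-rate scheme, not the procedure used in these experiments), one can keep a running estimate of each parameter's squared gradient and divide the global learning rate by its square root:

```python
import numpy as np

def make_adaptive_update(shape, lr=0.01, decay=0.99, eps=1e-8):
    # Running estimate of the second moment of each parameter's gradient;
    # parameters with consistently large gradients get a smaller effective step.
    second_moment = np.zeros(shape)

    def update(param, grad):
        nonlocal second_moment
        second_moment = decay * second_moment + (1 - decay) * grad ** 2
        return param - lr * grad / (np.sqrt(second_moment) + eps)

    return update

# usage sketch: upd = make_adaptive_update(W.shape); W = upd(W, grad_W)
```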

In all reported experiments we have used the same number of units per layer. However, we verified that we obtain the same gains when the layer size increases (or decreases) with layer number.

Figure 11: Test error during online training on the Shapeset-3×2 dataset, for various activation functions and initialization schemes (ordered from top to bottom in decreasing final error). N after the activation function name indicates the use of normalized initialization.

Figure 12: Test error curves during training on MNIST and CIFAR-10, for various activation functions and initialization schemes (ordered from top to bottom in decreasing final error). N after the activation function name indicates the use of normalized initialization.

The other conclusions from this study are the following:

- Monitoring activations and gradients across layers and training iterations is a powerful investigative tool for understanding training difficulties in deep nets.
- Sigmoid activations (not symmetric around 0) should be avoided when initializing from small random weights, because they yield poor learning dynamics, with initial saturation of the top hidden layer.
- Keeping the layer-to-layer transformations such that both activations and gradients flow well (i.e. with a Jacobian around 1) appears helpful, and eliminates a good part of the discrepancy between purely supervised deep networks and ones pre-trained with unsupervised learning.
- Many of our observations remain unexplained, suggesting further investigations to better understand gradients and training dynamics in deep architectures.

References

Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2. Also published as a book by Now Publishers.

Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. NIPS 19. MIT Press.

Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5.

Bergstra, J., Desjardins, G., Lamblin, P., & Bengio, Y. (2009). Quadratic polynomials learn better image features (Technical Report 1337). Département d'Informatique et de Recherche Opérationnelle, Université de Montréal.

Bradley, D. (2009). Learning in modular systems. Doctoral dissertation, The Robotics Institute, Carnegie Mellon University.

Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. ICML 2008.

Erhan, D., Manzagol, P.-A., Bengio, Y., Bengio, S., & Vincent, P. (2009). The difficulty of training deep architectures and the effect of unsupervised pre-training. AISTATS 2009.

Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18.

Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images (Technical Report). University of Toronto.

Larochelle, H., Bengio, Y., Louradour, J., & Lamblin, P. (2009). Exploring strategies for training deep neural networks. The Journal of Machine Learning Research, 10.

Larochelle, H., Erhan, D., Courville, A., Bergstra, J., & Bengio, Y. (2007). An empirical evaluation of deep architectures on problems with many factors of variation. ICML 2007.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998a). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86.

LeCun, Y., Bottou, L., Orr, G. B., & Müller, K.-R. (1998b). Efficient backprop. In Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science. Springer Verlag.

Mnih, A., & Hinton, G. E. (2009). A scalable hierarchical distributed language model. NIPS 21.

Ranzato, M., Poultney, C., Chopra, S., & LeCun, Y. (2007). Efficient learning of sparse representations with an energy-based model. NIPS 19.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323.

Solla, S. A., Levin, E., & Fleisher, M. (1988). Accelerated learning in layered neural networks. Complex Systems, 2.

Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. ICML 2008.

Weston, J., Ratle, F., & Collobert, R. (2008). Deep learning via semi-supervised embedding. ICML 2008. New York, NY, USA: ACM.

Zhu, L., Chen, Y., & Yuille, A. (2009). Unsupervised learning of probabilistic grammar-Markov models for object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31.


More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

SPARSE-CODED NET MODEL AND APPLICATIONS

SPARSE-CODED NET MODEL AND APPLICATIONS TO APPEAR IN 2016 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING SPARSE-CODED NET MODEL AND APPLICATIONS Youngjune Gwon 1, Mram Cha 2, Wllam Campbell 1, H. T. Kung 2, Cagr Dagl 1

More information

Neural Network Control for TCP Network Congestion

Neural Network Control for TCP Network Congestion 5 Amercan Control Conference June 8-, 5. Portland, OR, USA FrA3. Neural Network Control for TCP Network Congeston Hyun C. Cho, M. Sam Fadal, Hyunjeong Lee Electrcal Engneerng/6, Unversty of Nevada, Reno,

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

General Vector Machine. Hong Zhao Department of Physics, Xiamen University

General Vector Machine. Hong Zhao Department of Physics, Xiamen University General Vector Machne Hong Zhao (zhaoh@xmu.edu.cn) Department of Physcs, Xamen Unversty The support vector machne (SVM) s an mportant class of learnng machnes for functon approach, pattern recognton, and

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

A New Feature of Uniformity of Image Texture Directions Coinciding with the Human Eyes Perception 1

A New Feature of Uniformity of Image Texture Directions Coinciding with the Human Eyes Perception 1 A New Feature of Unformty of Image Texture Drectons Concdng wth the Human Eyes Percepton Xng-Jan He, De-Shuang Huang, Yue Zhang, Tat-Mng Lo 2, and Mchael R. Lyu 3 Intellgent Computng Lab, Insttute of Intellgent

More information

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY SSDH: Sem-supervsed Deep Hashng for Large Scale Image Retreval Jan Zhang, and Yuxn Peng arxv:607.08477v2 [cs.cv] 8 Jun 207 Abstract Hashng

More information

Single Sample Face Recognition via Learning Deep Supervised Auto-Encoders Shenghua Gao, Yuting Zhang, Kui Jia, Jiwen Lu, Yingying Zhang

Single Sample Face Recognition via Learning Deep Supervised Auto-Encoders Shenghua Gao, Yuting Zhang, Kui Jia, Jiwen Lu, Yingying Zhang 1 Sngle Sample Face Recognton va Learnng Deep Supervsed Auto-Encoders Shenghua Gao, Yutng Zhang, Ku Ja, Jwen Lu, Yngyng Zhang Abstract Ths paper targets learnng robust mage representaton for sngle tranng

More information

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Dropout: A Simple Way to Prevent Neural Networks from Overfitting Journal of Machne Learnng Research 15 (2014) 1929-1958 Submtted 11/13; Publshed 6/14 Dropout: A Smple Way to Prevent Neural Networks from Overfttng Ntsh Srvastava Geoffrey Hnton Alex Krzhevsky Ilya Sutskever

More information

A Bilinear Model for Sparse Coding

A Bilinear Model for Sparse Coding A Blnear Model for Sparse Codng Davd B. Grmes and Rajesh P. N. Rao Department of Computer Scence and Engneerng Unversty of Washngton Seattle, WA 98195-2350, U.S.A. grmes,rao @cs.washngton.edu Abstract

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

Adaptive Transfer Learning

Adaptive Transfer Learning Adaptve Transfer Learnng Bn Cao, Snno Jaln Pan, Yu Zhang, Dt-Yan Yeung, Qang Yang Hong Kong Unversty of Scence and Technology Clear Water Bay, Kowloon, Hong Kong {caobn,snnopan,zhangyu,dyyeung,qyang}@cse.ust.hk

More information

The Study of Remote Sensing Image Classification Based on Support Vector Machine

The Study of Remote Sensing Image Classification Based on Support Vector Machine Sensors & Transducers 03 by IFSA http://www.sensorsportal.com The Study of Remote Sensng Image Classfcaton Based on Support Vector Machne, ZHANG Jan-Hua Key Research Insttute of Yellow Rver Cvlzaton and

More information

Face Recognition Method Based on Within-class Clustering SVM

Face Recognition Method Based on Within-class Clustering SVM Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information