Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Journal of Machine Learning Research 15 (2014). Submitted 11/13; Published 6/14.

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Nitish Srivastava (nitish@cs.toronto.edu)
Geoffrey Hinton (hinton@cs.toronto.edu)
Alex Krizhevsky (kriz@cs.toronto.edu)
Ilya Sutskever (ilya@cs.toronto.edu)
Ruslan Salakhutdinov (rsalakhu@cs.toronto.edu)
Department of Computer Science, University of Toronto, 10 Kings College Road, Rm 3302, Toronto, Ontario, M5S 3G4, Canada.

Editor: Yoshua Bengio

Abstract

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Keywords: neural networks, regularization, model combination, deep learning

1. Introduction

Deep neural networks contain multiple non-linear hidden layers and this makes them very expressive models that can learn very complicated relationships between their inputs and outputs. With limited training data, however, many of these complicated relationships will be the result of sampling noise, so they will exist in the training set but not in real test data even if it is drawn from the same distribution. This leads to overfitting and many methods have been developed for reducing it. These include stopping the training as soon as performance on a validation set starts to get worse, introducing weight penalties of various kinds such as L1 and L2 regularization and soft weight sharing (Nowlan and Hinton, 1992).

With unlimited computation, the best way to "regularize" a fixed-sized model is to average the predictions of all possible settings of the parameters, weighting each setting by its posterior probability given the training data.

(c) 2014 Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov.

Figure 1: Dropout Neural Net Model. Left: A standard neural net with 2 hidden layers. Right: An example of a thinned net produced by applying dropout to the network on the left. Crossed units have been dropped. (a) Standard Neural Net. (b) After applying dropout.

This can sometimes be approximated quite well for simple or small models (Xiong et al., 2011; Salakhutdinov and Mnih, 2008), but we would like to approach the performance of the Bayesian gold standard using considerably less computation. We propose to do this by approximating an equally weighted geometric mean of the predictions of an exponential number of learned models that share parameters.

Model combination nearly always improves the performance of machine learning methods. With large neural networks, however, the obvious idea of averaging the outputs of many separately trained nets is prohibitively expensive. Combining several models is most helpful when the individual models are different from each other and, in order to make neural net models different, they should either have different architectures or be trained on different data. Training many different architectures is hard because finding optimal hyperparameters for each architecture is a daunting task and training each large network requires a lot of computation. Moreover, large networks normally require large amounts of training data and there may not be enough data available to train different networks on different subsets of the data. Even if one was able to train many different large networks, using them all at test time is infeasible in applications where it is important to respond quickly.

Dropout is a technique that addresses both these issues. It prevents overfitting and provides a way of approximately combining exponentially many different neural network architectures efficiently. The term "dropout" refers to dropping out units (hidden and visible) in a neural network. By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections, as shown in Figure 1. The choice of which units to drop is random. In the simplest case, each unit is retained with a fixed probability p independent of other units, where p can be chosen using a validation set or can simply be set at 0.5, which seems to be close to optimal for a wide range of networks and tasks. For the input units, however, the optimal probability of retention is usually closer to 1 than to 0.5.

Figure 2: Left: A unit at training time that is present with probability p and is connected to units in the next layer with weights w. Right: At test time, the unit is always present and the weights are multiplied by p. The output at test time is the same as the expected output at training time.

Applying dropout to a neural network amounts to sampling a "thinned" network from it. The thinned network consists of all the units that survived dropout (Figure 1b). A neural net with n units can be seen as a collection of 2^n possible thinned neural networks. These networks all share weights so that the total number of parameters is still O(n^2), or less. For each presentation of each training case, a new thinned network is sampled and trained. So training a neural network with dropout can be seen as training a collection of 2^n thinned networks with extensive weight sharing, where each thinned network gets trained very rarely, if at all.

At test time, it is not feasible to explicitly average the predictions from exponentially many thinned models. However, a very simple approximate averaging method works well in practice. The idea is to use a single neural net at test time without dropout. The weights of this network are scaled-down versions of the trained weights. If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time, as shown in Figure 2. This ensures that for any hidden unit the expected output (under the distribution used to drop units at training time) is the same as the actual output at test time. By doing this scaling, 2^n networks with shared weights can be combined into a single neural network to be used at test time. We found that training a network with dropout and using this approximate averaging method at test time leads to significantly lower generalization error on a wide variety of classification problems compared to training with other regularization methods.

The idea of dropout is not limited to feed-forward neural nets. It can be more generally applied to graphical models such as Boltzmann Machines. In this paper, we introduce the dropout Restricted Boltzmann Machine model and compare it to standard Restricted Boltzmann Machines (RBM). Our experiments show that dropout RBMs are better than standard RBMs in certain respects.

This paper is structured as follows. Section 2 describes the motivation for this idea. Section 3 describes relevant previous work. Section 4 formally describes the dropout model. Section 5 gives an algorithm for training dropout networks. In Section 6, we present our experimental results where we apply dropout to problems in different domains and compare it with other forms of regularization and model combination. Section 7 analyzes the effect of dropout on different properties of a neural network and describes how dropout interacts with the network's hyperparameters. Section 8 describes the Dropout RBM model. In Section 9 we explore the idea of marginalizing dropout. In Appendix A we present a practical guide for training dropout nets.
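To make the weight-scaling rule of Figure 2 concrete, here is a small numerical check (ours, not the paper's) that the expected training-time contribution of a unit, E[r w y] with r ~ Bernoulli(p), equals the scaled test-time contribution p w y. The particular values of p, w and y are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.5          # retention probability (illustrative value)
w = 0.8          # an outgoing weight
y = 1.3          # activation of the unit being dropped out

# Training time: the unit's contribution is r * w * y with r ~ Bernoulli(p).
r = rng.binomial(1, p, size=1_000_000)
train_contrib = (r * w * y).mean()

# Test time: the unit is always present but its outgoing weight is scaled by p.
test_contrib = p * w * y

print(train_contrib, test_contrib)  # agree up to Monte-Carlo noise
```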

This includes a detailed analysis of the practical considerations involved in choosing hyperparameters when training dropout networks.

2. Motivation

A motivation for dropout comes from a theory of the role of sex in evolution (Livnat et al., 2010). Sexual reproduction involves taking half the genes of one parent and half of the other, adding a very small amount of random mutation, and combining them to produce an offspring. The asexual alternative is to create an offspring with a slightly mutated copy of the parent's genes. It seems plausible that asexual reproduction should be a better way to optimize individual fitness because a good set of genes that have come to work well together can be passed on directly to the offspring. On the other hand, sexual reproduction is likely to break up these co-adapted sets of genes, especially if these sets are large and, intuitively, this should decrease the fitness of organisms that have already evolved complicated co-adaptations. However, sexual reproduction is the way most advanced organisms evolved.

One possible explanation for the superiority of sexual reproduction is that, over the long term, the criterion for natural selection may not be individual fitness but rather mix-ability of genes. The ability of a set of genes to be able to work well with another random set of genes makes them more robust. Since a gene cannot rely on a large set of partners to be present at all times, it must learn to do something useful on its own or in collaboration with a small number of other genes. According to this theory, the role of sexual reproduction is not just to allow useful new genes to spread throughout the population, but also to facilitate this process by reducing complex co-adaptations that would reduce the chance of a new gene improving the fitness of an individual. Similarly, each hidden unit in a neural network trained with dropout must learn to work with a randomly chosen sample of other units. This should make each hidden unit more robust and drive it towards creating useful features on its own without relying on other hidden units to correct its mistakes. However, the hidden units within a layer will still learn to do different things from each other. One might imagine that the net would become robust against dropout by making many copies of each hidden unit, but this is a poor solution for exactly the same reason as replica codes are a poor way to deal with a noisy channel.

A closely related, but slightly different motivation for dropout comes from thinking about successful conspiracies. Ten conspiracies each involving five people is probably a better way to create havoc than one big conspiracy that requires fifty people to all play their parts correctly. If conditions do not change and there is plenty of time for rehearsal, a big conspiracy can work well, but with non-stationary conditions, the smaller the conspiracy the greater its chance of still working. Complex co-adaptations can be trained to work well on a training set, but on novel test data they are far more likely to fail than multiple simpler co-adaptations that achieve the same thing.

3. Related Work

Dropout can be interpreted as a way of regularizing a neural network by adding noise to its hidden units. The idea of adding noise to the states of units has previously been used in the context of Denoising Autoencoders (DAEs) by Vincent et al. (2008, 2010), where noise is added to the input units of an autoencoder and the network is trained to reconstruct the noise-free input.

Our work extends this idea by showing that dropout can be effectively applied in the hidden layers as well and that it can be interpreted as a form of model averaging. We also show that adding noise is not only useful for unsupervised feature learning but can also be extended to supervised learning problems. In fact, our method can be applied to other neuron-based architectures, for example, Boltzmann Machines. While 5% noise typically works best for DAEs, we found that our weight scaling procedure applied at test time enables us to use much higher noise levels. Dropping out 20% of the input units and 50% of the hidden units was often found to be optimal.

Since dropout can be seen as a stochastic regularization technique, it is natural to consider its deterministic counterpart which is obtained by marginalizing out the noise. In this paper, we show that, in simple cases, dropout can be analytically marginalized out to obtain deterministic regularization methods. Recently, van der Maaten et al. (2013) also explored deterministic regularizers corresponding to different exponential-family noise distributions, including dropout (which they refer to as "blankout noise"). However, they apply noise to the inputs and only explore models with no hidden layers. Wang and Manning (2013) proposed a method for speeding up dropout by marginalizing dropout noise. Chen et al. (2012) explored marginalization in the context of denoising autoencoders.

In dropout, we minimize the loss function stochastically under a noise distribution. This can be seen as minimizing an expected loss function. Previous work of Globerson and Roweis (2006); Dekel et al. (2010) explored an alternate setting where the loss is minimized when an adversary gets to pick which units to drop. Here, instead of a noise distribution, the maximum number of units that can be dropped is fixed. However, this work also does not explore models with hidden units.

4. Model Description

This section describes the dropout neural network model. Consider a neural network with L hidden layers. Let l \in \{1, \ldots, L\} index the hidden layers of the network. Let z^{(l)} denote the vector of inputs into layer l and y^{(l)} denote the vector of outputs from layer l (y^{(0)} = x is the input). W^{(l)} and b^{(l)} are the weights and biases at layer l. The feed-forward operation of a standard neural network (Figure 3a) can be described as (for l \in \{0, \ldots, L-1\} and any hidden unit i)

z_i^{(l+1)} = w_i^{(l+1)} y^{(l)} + b_i^{(l+1)},
y_i^{(l+1)} = f(z_i^{(l+1)}),

where f is any activation function, for example, f(x) = 1/(1 + \exp(-x)). With dropout, the feed-forward operation becomes (Figure 3b)

r_j^{(l)} \sim \mathrm{Bernoulli}(p),
\tilde{y}^{(l)} = r^{(l)} * y^{(l)},
z_i^{(l+1)} = w_i^{(l+1)} \tilde{y}^{(l)} + b_i^{(l+1)},
y_i^{(l+1)} = f(z_i^{(l+1)}).
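The feed-forward equations above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the paper's implementation: the layer sizes, the ReLU activation, the linear output layer and the use of a single retention probability p for every layer (including the input) are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def dropout_forward(x, weights, biases, p=0.5, train=True):
    """Forward pass for a dropout net.

    At training time each layer's output is masked with r ~ Bernoulli(p);
    at test time no units are dropped and the weights are scaled by p
    (the approximate averaging rule of Figure 2).
    """
    y = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        if train:
            r = rng.binomial(1, p, size=y.shape)       # r^(l) ~ Bernoulli(p)
            y = r * y                                   # thinned outputs ~y^(l)
            z = y @ W + b                               # z^(l+1)
        else:
            z = y @ (p * W) + b                         # W_test = p * W
        y = relu(z) if l < len(weights) - 1 else z      # last layer kept linear here
    return y

# Toy usage: a 20-50-50-10 network on random data.
sizes = [20, 50, 50, 10]
weights = [rng.normal(0, 0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
x = rng.normal(size=(4, 20))
print(dropout_forward(x, weights, biases, train=True).shape)   # (4, 10)
print(dropout_forward(x, weights, biases, train=False).shape)  # (4, 10)
```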

Figure 3: Comparison of the basic operations of a standard and dropout network. (a) Standard network. (b) Dropout network.

Here * denotes an element-wise product. For any layer l, r^{(l)} is a vector of independent Bernoulli random variables each of which has probability p of being 1. This vector is sampled and multiplied element-wise with the outputs of that layer, y^{(l)}, to create the thinned outputs \tilde{y}^{(l)}. The thinned outputs are then used as input to the next layer. This process is applied at each layer. This amounts to sampling a sub-network from a larger network. For learning, the derivatives of the loss function are backpropagated through the sub-network. At test time, the weights are scaled as W_{test}^{(l)} = p W^{(l)} as shown in Figure 2. The resulting neural network is used without dropout.

5. Learning Dropout Nets

This section describes a procedure for training dropout neural nets.

5.1 Backpropagation

Dropout neural networks can be trained using stochastic gradient descent in a manner similar to standard neural nets. The only difference is that for each training case in a mini-batch, we sample a thinned network by dropping out units. Forward and backpropagation for that training case are done only on this thinned network. The gradients for each parameter are averaged over the training cases in each mini-batch. Any training case which does not use a parameter contributes a gradient of zero for that parameter. Many methods have been used to improve stochastic gradient descent such as momentum, annealed learning rates and L2 weight decay. Those were found to be useful for dropout neural networks as well.

One particular form of regularization was found to be especially useful for dropout: constraining the norm of the incoming weight vector at each hidden unit to be upper bounded by a fixed constant c. In other words, if w represents the vector of weights incident on any hidden unit, the neural network was optimized under the constraint ||w||_2 \le c. This constraint was imposed during optimization by projecting w onto the surface of a ball of radius c, whenever w went out of it. This is also called max-norm regularization since it implies that the maximum value that the norm of any weight can take is c.
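The max-norm projection described above can be implemented as a simple rescaling after each gradient step. A minimal sketch, assuming the incoming weights of hidden unit j are stored in column j of W; the value c = 3.0 is only a placeholder, since the paper tunes c on a validation set.

```python
import numpy as np

def max_norm_project(W, c=3.0):
    """Project each column of W (the incoming weights of one hidden unit)
    back onto the ball of radius c, enforcing ||w||_2 <= c."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

# Typical use: call right after each SGD update, e.g.
# W = max_norm_project(W - lr * grad_W, c=3.0)
```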

The constant c is a tunable hyperparameter, which is determined using a validation set. Max-norm regularization has been previously used in the context of collaborative filtering (Srebro and Shraibman, 2005). It typically improves the performance of stochastic gradient descent training of deep neural nets, even when no dropout is used.

Although dropout alone gives significant improvements, using dropout along with max-norm regularization, large decaying learning rates and high momentum provides a significant boost over just using dropout. A possible justification is that constraining weight vectors to lie inside a ball of fixed radius makes it possible to use a huge learning rate without the possibility of weights blowing up. The noise provided by dropout then allows the optimization process to explore different regions of the weight space that would have otherwise been difficult to reach. As the learning rate decays, the optimization takes shorter steps, thereby doing less exploration and eventually settles into a minimum.

5.2 Unsupervised Pretraining

Neural networks can be pretrained using stacks of RBMs (Hinton and Salakhutdinov, 2006), autoencoders (Vincent et al., 2010) or Deep Boltzmann Machines (Salakhutdinov and Hinton, 2009). Pretraining is an effective way of making use of unlabeled data. Pretraining followed by finetuning with backpropagation has been shown to give significant performance boosts over finetuning from random initializations in certain cases.

Dropout can be applied to finetune nets that have been pretrained using these techniques. The pretraining procedure stays the same. The weights obtained from pretraining should be scaled up by a factor of 1/p. This makes sure that for each unit, the expected output from it under random dropout will be the same as the output during pretraining. We were initially concerned that the stochastic nature of dropout might wipe out the information in the pretrained weights. This did happen when the learning rates used during finetuning were comparable to the best learning rates for randomly initialized nets. However, when the learning rates were chosen to be smaller, the information in the pretrained weights seemed to be retained and we were able to get improvements in terms of the final generalization error compared to not using dropout when finetuning.

6. Experimental Results

We trained dropout neural networks for classification problems on data sets in different domains. We found that dropout improved generalization performance on all data sets compared to neural networks that did not use dropout. Table 1 gives a brief description of the data sets. The data sets are:

MNIST: A standard toy data set of handwritten digits.
TIMIT: A standard speech benchmark for clean speech recognition.
CIFAR-10 and CIFAR-100: Tiny natural images (Krizhevsky, 2009).
Street View House Numbers data set (SVHN): Images of house numbers collected by Google Street View (Netzer et al., 2011).
ImageNet: A large collection of natural images.
Reuters-RCV1: A collection of Reuters newswire articles.
Alternative Splicing data set: RNA features for predicting alternative gene splicing (Xiong et al., 2011).

We chose a diverse set of data sets to demonstrate that dropout is a general technique for improving neural nets and is not specific to any particular application domain. In this section, we present some key results that show the effectiveness of dropout. A more detailed description of all the experiments and data sets is provided in Appendix B.

Table 1: Overview of the data sets used in this paper.
Data Set | Domain | Dimensionality | Training Set | Test Set
MNIST | Vision | 784 (28 x 28 grayscale) | 60K | 10K
SVHN | Vision | 3072 (32 x 32 color) | 600K | 26K
CIFAR-10/100 | Vision | 3072 (32 x 32 color) | 60K | 10K
ImageNet (ILSVRC-2012) | Vision | (color) | 1.2M | 150K
TIMIT | Speech | 2520 (120-dim, 21 frames) | 1.1M frames | 58K frames
Reuters-RCV1 | Text | | | 200K
Alternative Splicing | Genetics | | |

6.1 Results on Image Data Sets

We used five image data sets to evaluate dropout: MNIST, SVHN, CIFAR-10, CIFAR-100 and ImageNet. These data sets include different image types and training set sizes. Models which achieve state-of-the-art results on all of these data sets use dropout.

6.1.1 MNIST

Table 2: Comparison of different models on MNIST.
Method | Unit Type | Architecture | Error %
Standard Neural Net (Simard et al., 2003) | Logistic | 2 layers, 800 units | 1.60
SVM Gaussian kernel | NA | NA | 1.40
Dropout NN | Logistic | 3 layers, 1024 units | 1.35
Dropout NN | ReLU | 3 layers, 1024 units | 1.25
Dropout NN + max-norm constraint | ReLU | 3 layers, 1024 units | 1.06
Dropout NN + max-norm constraint | ReLU | 3 layers, 2048 units | 1.04
Dropout NN + max-norm constraint | ReLU | 2 layers, 4096 units | 1.01
Dropout NN + max-norm constraint | ReLU | 2 layers, 8192 units | 0.95
Dropout NN + max-norm constraint (Goodfellow et al., 2013) | Maxout | 2 layers, (5 x 240) units | 0.94
DBN + finetuning (Hinton and Salakhutdinov, 2006) | Logistic | |
DBM + finetuning (Salakhutdinov and Hinton, 2009) | Logistic | |
DBN + dropout finetuning | Logistic | |
DBM + dropout finetuning | Logistic | | 0.79

The MNIST data set consists of 28 x 28 pixel handwritten digit images. The task is to classify the images into 10 digit classes. Table 2 compares the performance of dropout with other techniques. The best performing neural networks for the permutation invariant setting that do not use dropout or unsupervised pretraining achieve an error of about 1.60% (Simard et al., 2003).

With dropout the error reduces to 1.35%. Replacing logistic units with rectified linear units (ReLUs) (Jarrett et al., 2009) further reduces the error to 1.25%. Adding max-norm regularization again reduces it to 1.06%. Increasing the size of the network leads to better results. A neural net with 2 layers and 8192 units per layer gets down to 0.95% error. Note that this network has more than 65 million parameters and is being trained on a data set of size 60,000. Training a network of this size to give good generalization error is very hard with standard regularization methods and early stopping. Dropout, on the other hand, prevents overfitting, even in this case. It does not even need early stopping. Goodfellow et al. (2013) showed that results can be further improved to 0.94% by replacing ReLU units with maxout units. All dropout nets use p = 0.5 for hidden units and p = 0.8 for input units. More experimental details can be found in Appendix B.1.

Dropout nets pretrained with stacks of RBMs and Deep Boltzmann Machines also give improvements as shown in Table 2. DBM-pretrained dropout nets achieve a test error of 0.79%, which is the best performance ever reported for the permutation invariant setting. We note that it is possible to obtain better results by using 2-D spatial information and augmenting the training set with distorted versions of images from the standard training set. We demonstrate the effectiveness of dropout in that setting on more interesting data sets.

In order to test the robustness of dropout, classification experiments were done with networks of many different architectures keeping all hyperparameters, including p, fixed. Figure 4 shows the test error rates obtained for these different architectures as training progresses. The same architectures trained with and without dropout have drastically different test errors, as seen by the two separate clusters of trajectories. Dropout gives a huge improvement across all architectures, without using hyperparameters that were tuned specifically for each architecture.

Figure 4: Test error for different architectures with and without dropout. The networks have 2 to 4 hidden layers each with 1024 to 2048 units.

6.1.2 Street View House Numbers

The Street View House Numbers (SVHN) Data Set (Netzer et al., 2011) consists of color images of house numbers collected by Google Street View. Figure 5a shows some examples of images from this data set. The part of the data set that we use in our experiments consists of 32 x 32 color images roughly centered on a digit in a house number. The task is to identify that digit.

For this data set, we applied dropout to convolutional neural networks (LeCun et al., 1989). The best architecture that we found has three convolutional layers followed by 2 fully connected hidden layers. All hidden units were ReLUs. Each convolutional layer was followed by a max-pooling layer. Appendix B.2 describes the architecture in more detail.
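For illustration, a network of this general shape can be written down in a few lines of PyTorch. This is not the paper's exact SVHN architecture: the filter counts and sizes are placeholders, and we use the retention probabilities quoted in the next paragraph. Note that torch.nn.Dropout takes the probability of dropping a unit, so we pass 1 - p, and that modern frameworks use "inverted" dropout, rescaling activations during training instead of multiplying weights by p at test time; the two schemes agree in expectation.

```python
import torch.nn as nn

# Retention probabilities, going from the input, through the three
# convolutional layers, to the fully connected layers (values from the text below).
p_retain = (0.9, 0.75, 0.75, 0.5, 0.5, 0.5)
drop = [1.0 - p for p in p_retain]   # torch's Dropout takes the drop probability

model = nn.Sequential(
    nn.Dropout(drop[0]),                                                      # input dropout
    nn.Conv2d(3, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2), nn.Dropout(drop[1]),
    nn.Conv2d(64, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2), nn.Dropout(drop[2]),
    nn.Conv2d(64, 128, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2), nn.Dropout(drop[3]),
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 1024), nn.ReLU(), nn.Dropout(drop[4]),
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(drop[5]),
    nn.Linear(1024, 10),
)
```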

Table 3: Results on the Street View House Numbers data set.
Method | Error %
Binary Features (WDCH) (Netzer et al., 2011) | 36.7
HOG (Netzer et al., 2011) | 15.0
Stacked Sparse Autoencoders (Netzer et al., 2011) | 10.3
KMeans (Netzer et al., 2011) | 9.4
Multi-stage Conv Net with average pooling (Sermanet et al., 2012) | 9.06
Multi-stage Conv Net + L2 pooling (Sermanet et al., 2012) | 5.36
Multi-stage Conv Net + L4 pooling + padding (Sermanet et al., 2012) | 4.90
Conv Net + max-pooling | 3.95
Conv Net + max pooling + dropout in fully connected layers | 3.02
Conv Net + stochastic pooling (Zeiler and Fergus, 2013) | 2.80
Conv Net + max pooling + dropout in all layers | 2.55
Conv Net + maxout (Goodfellow et al., 2013) | 2.47
Human Performance | 2.0

Dropout was applied to all the layers of the network with the probability of retaining a hidden unit being p = (0.9, 0.75, 0.75, 0.5, 0.5, 0.5) for the different layers of the network (going from input to convolutional layers to fully connected layers). Max-norm regularization was used for weights in both convolutional and fully connected layers. Table 3 compares the results obtained by different methods. We find that convolutional nets outperform other methods. The best performing convolutional nets that do not use dropout achieve an error rate of 3.95%. Adding dropout only to the fully connected layers reduces the error to 3.02%. Adding dropout to the convolutional layers as well further reduces the error to 2.55%. Even more gains can be obtained by using maxout units.

The additional gain in performance obtained by adding dropout in the convolutional layers (3.02% to 2.55%) is worth noting. One may have presumed that since the convolutional layers don't have a lot of parameters, overfitting is not a problem and therefore dropout would not have much effect. However, dropout in the lower layers still helps because it provides noisy inputs for the higher fully connected layers, which prevents them from overfitting.

6.1.3 CIFAR-10 and CIFAR-100

The CIFAR-10 and CIFAR-100 data sets consist of 32 x 32 color images drawn from 10 and 100 categories respectively. Figure 5b shows some examples of images from this data set. A detailed description of the data sets, input preprocessing, network architectures and other experimental details is given in Appendix B.3. Table 4 shows the error rate obtained by different methods on these data sets. Without any data augmentation, Snoek et al. (2012) used Bayesian hyperparameter optimization to obtain an error rate of 14.98% on CIFAR-10. Using dropout in the fully connected layers reduces that to 14.32% and adding dropout in every layer further reduces the error to 12.61%. Goodfellow et al. (2013) showed that the error is further reduced to 11.68% by replacing ReLU units with maxout units. On CIFAR-100, dropout reduces the error from 43.48% to 37.20%, which is a huge improvement. No data augmentation was used for either data set (apart from the input dropout).

Figure 5: Samples from image data sets. Each row corresponds to a different category. (a) Street View House Numbers (SVHN). (b) CIFAR-10.

Table 4: Error rates on CIFAR-10 and CIFAR-100.
Method | CIFAR-10 | CIFAR-100
Conv Net + max pooling (hand tuned) | |
Conv Net + stochastic pooling (Zeiler and Fergus, 2013) | |
Conv Net + max pooling (Snoek et al., 2012) | 14.98 |
Conv Net + max pooling + dropout fully connected layers | 14.32 |
Conv Net + max pooling + dropout in all layers | 12.61 | 37.20
Conv Net + maxout (Goodfellow et al., 2013) | 11.68 |

6.1.4 ImageNet

ImageNet is a data set of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. Starting in 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held. A subset of ImageNet with roughly 1000 images in each of 1000 categories is used in this challenge. Since the number of categories is rather large, it is conventional to report two error rates: top-1 and top-5, where the top-5 error rate is the fraction of test images for which the correct label is not among the five labels considered most probable by the model. Figure 6 shows some predictions made by our model on a few test images.

ILSVRC-2010 is the only version of ILSVRC for which the test set labels are available, so most of our experiments were performed on this data set. Table 5 compares the performance of different methods. Convolutional nets with dropout outperform other methods by a large margin. The architecture and implementation details are described in detail in Krizhevsky et al. (2012).
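As a small aside, the top-5 metric defined above is easy to compute from a matrix of predicted class probabilities. The sketch below is ours, with toy random inputs rather than actual model outputs.

```python
import numpy as np

def top_k_error(probs, labels, k=5):
    """Fraction of examples whose true label is not among the k classes
    the model considers most probable (the ImageNet top-5 metric for k=5).

    probs: (n_examples, n_classes) predicted probabilities; labels: (n_examples,).
    """
    topk = np.argsort(-probs, axis=1)[:, :k]
    hit = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

# Toy usage with random "predictions" over 1000 classes:
rng = np.random.default_rng(0)
probs = rng.random((8, 1000))
labels = rng.integers(0, 1000, size=8)
print(top_k_error(probs, labels, k=5))
```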

Figure 6: Some ImageNet test cases with the 4 most probable labels as predicted by our model. The length of the horizontal bars is proportional to the probability assigned to the labels by the model. Pink indicates ground truth.

Table 5: Results on the ILSVRC-2010 test set, reporting top-1 and top-5 error for Sparse Coding (Lin et al., 2010), SIFT + Fisher Vectors (Sanchez and Perronnin, 2011), and Conv Net + dropout (Krizhevsky et al., 2012).

Table 6: Results on the ILSVRC-2012 validation/test set (top-1 and top-5 validation error and top-5 test error) for an SVM on Fisher Vectors of Dense SIFT and Color Statistics, an average of classifiers over FVs of SIFT, LBP, GIST and CSIFT, Conv Net + dropout (Krizhevsky et al., 2012), and an average of 5 Conv Nets + dropout (Krizhevsky et al., 2012).

Our model based on convolutional nets and dropout won the ILSVRC-2012 competition. Since the labels for the test set are not available, we report our results on the test set for the final submission and include the validation set results for different variations of our model. Table 6 shows the results from the competition. While the best methods based on standard vision features achieve a top-5 error rate of about 26%, convolutional nets with dropout achieve a test error of about 16%, which is a staggering difference. Figure 6 shows some examples of predictions made by our model. We can see that the model makes very reasonable predictions, even when its best guess is not correct.

6.2 Results on TIMIT

Next, we applied dropout to a speech recognition task. We use the TIMIT data set, which consists of recordings from 680 speakers covering 8 major dialects of American English reading ten phonetically-rich sentences in a controlled noise-free environment. Dropout neural networks were trained on windows of 21 log-filter bank frames to predict the label of the central frame. No speaker dependent operations were performed. Appendix B.4 describes the data preprocessing and training details. Table 7 compares dropout neural nets with other models.

A 6-layer net gives a phone error rate of 23.4%. Dropout further improves it to 21.8%. We also trained dropout nets starting from pretrained weights. A 4-layer net pretrained with a stack of RBMs gets a phone error rate of 22.7%. With dropout, this reduces to 19.7%. Similarly, for an 8-layer net the error reduces from 20.5% to 19.7%.

Table 7: Phone error rate on the TIMIT core test set.
Method | Phone Error Rate %
NN (6 layers) (Mohamed et al., 2010) | 23.4
Dropout NN (6 layers) | 21.8
DBN-pretrained NN (4 layers) | 22.7
DBN-pretrained NN (6 layers) (Mohamed et al., 2010) | 22.4
DBN-pretrained NN (8 layers) (Mohamed et al., 2010) | 20.7
mcRBM-DBN-pretrained NN (5 layers) (Dahl et al., 2010) | 20.5
DBN-pretrained NN (4 layers) + dropout | 19.7
DBN-pretrained NN (8 layers) + dropout | 19.7

6.3 Results on a Text Data Set

To test the usefulness of dropout in the text domain, we used dropout networks to train a document classifier. We used a subset of the Reuters-RCV1 data set, which is a collection of over 800,000 newswire articles from Reuters. These articles cover a variety of topics. The task is to take a bag of words representation of a document and classify it into 50 disjoint topics. Appendix B.5 describes the setup in more detail. Our best neural net which did not use dropout obtained an error rate of 31.05%. Adding dropout reduced the error to 29.62%. We found that the improvement was much smaller compared to that for the vision and speech data sets.

6.4 Comparison with Bayesian Neural Networks

Dropout can be seen as a way of doing an equally-weighted averaging of exponentially many models with shared weights. On the other hand, Bayesian neural networks (Neal, 1996) are the proper way of doing model averaging over the space of neural network structures and parameters. In dropout, each model is weighted equally, whereas in a Bayesian neural network each model is weighted taking into account the prior and how well the model fits the data, which is the more correct approach. Bayesian neural nets are extremely useful for solving problems in domains where data is scarce such as medical diagnosis, genetics, drug discovery and other computational biology applications. However, Bayesian neural nets are slow to train and difficult to scale to very large network sizes. Besides, it is expensive to get predictions from many large nets at test time. On the other hand, dropout neural nets are much faster to train and use at test time. In this section, we report experiments that compare Bayesian neural nets with dropout neural nets on a small data set where Bayesian neural networks are known to perform well and obtain state-of-the-art results. The aim is to analyze how much dropout loses compared to Bayesian neural nets.

The data set that we use (Xiong et al., 2011) comes from the domain of genetics. The task is to predict the occurrence of alternative splicing based on RNA features. Alternative splicing is a significant cause of cellular diversity in mammalian tissues. Predicting the occurrence of alternative splicing in certain tissues under different conditions is important for understanding many human diseases.

Given the RNA features, the task is to predict the probability of three splicing related events that biologists care about. The evaluation metric is Code Quality, which is a measure of the negative KL divergence between the target and the predicted probability distributions (higher is better). Appendix B.6 includes a detailed description of the data set and this performance metric.

Table 8: Results on the Alternative Splicing Data Set.
Method | Code Quality (bits)
Neural Network (early stopping) (Xiong et al., 2011) | 440
Regression, PCA (Xiong et al., 2011) | 463
SVM, PCA (Xiong et al., 2011) | 487
Neural Network with dropout | 567
Bayesian Neural Network (Xiong et al., 2011) | 623

Table 8 summarizes the performance of different models on this data set. Xiong et al. (2011) used Bayesian neural nets for this task. As expected, we found that Bayesian neural nets perform better than dropout. However, we see that dropout improves significantly upon the performance of standard neural nets and outperforms all other methods. The challenge in this data set is to prevent overfitting since the size of the training set is small. One way to prevent overfitting is to reduce the input dimensionality using PCA. Thereafter, standard techniques such as SVMs or logistic regression can be used. However, with dropout we were able to prevent overfitting without the need to do dimensionality reduction. The dropout nets are very large (1000s of hidden units) compared to a few tens of units in the Bayesian network. This shows that dropout has a strong regularizing effect.

6.5 Comparison with Standard Regularizers

Several regularization methods have been proposed for preventing overfitting in neural networks. These include L2 weight decay (more generally Tikhonov regularization (Tikhonov, 1943)), lasso (Tibshirani, 1996), KL-sparsity and max-norm regularization. Dropout can be seen as another way of regularizing neural networks. In this section we compare dropout with some of these regularization methods using the MNIST data set. The same network architecture with ReLUs was trained using stochastic gradient descent with different regularizations. Table 9 shows the results. The values of different hyperparameters associated with each kind of regularization (decay constants, target sparsity, dropout rate, max-norm upper bound) were obtained using a validation set. We found that dropout combined with max-norm regularization gives the lowest generalization error.

7. Salient Features

The experiments described in the previous section provide strong evidence that dropout is a useful technique for improving neural networks. In this section, we closely examine how dropout affects a neural network. We analyze the effect of dropout on the quality of features produced. We see how dropout affects the sparsity of hidden unit activations. We also see how the advantages obtained from dropout vary with the probability of retaining units, the size of the network and the size of the training set. These observations give some insight into why dropout works so well.

Table 9: Comparison of different regularization methods on MNIST.
Method | Test Classification error %
L2 |
L2 + L1 applied towards the end of training | 1.60
L2 + KL-sparsity | 1.55
Max-norm | 1.35
Dropout + L2 |
Dropout + Max-norm | 1.05

7.1 Effect on Features

Figure 7: Features learned on MNIST with one hidden layer autoencoders having 256 rectified linear units. (a) Without dropout. (b) Dropout with p = 0.5.

In a standard neural network, the derivative received by each parameter tells it how it should change so the final loss function is reduced, given what all other units are doing. Therefore, units may change in a way that they fix up the mistakes of the other units. This may lead to complex co-adaptations. This in turn leads to overfitting because these co-adaptations do not generalize to unseen data. We hypothesize that for each hidden unit, dropout prevents co-adaptation by making the presence of other hidden units unreliable. Therefore, a hidden unit cannot rely on other specific units to correct its mistakes. It must perform well in a wide variety of different contexts provided by the other hidden units. To observe this effect directly, we look at the first level features learned by neural networks trained on visual tasks with and without dropout.

Figure 7a shows features learned by an autoencoder on MNIST with a single hidden layer of 256 rectified linear units without dropout. Figure 7b shows the features learned by an identical autoencoder which used dropout in the hidden layer with p = 0.5. Both autoencoders had similar test reconstruction errors. However, it is apparent that the features shown in Figure 7a have co-adapted in order to produce good reconstructions. Each hidden unit on its own does not seem to be detecting a meaningful feature. On the other hand, in Figure 7b, the hidden units seem to detect edges, strokes and spots in different parts of the image. This shows that dropout does break up co-adaptations, which is probably the main reason why it leads to lower generalization errors.

7.2 Effect on Sparsity

Figure 8: Effect of dropout on sparsity. ReLUs were used for both models. (a) Without dropout: the histogram of mean activations shows that most units have a mean activation of about 2.0, and the histogram of activations shows a huge mode away from zero; clearly, a large fraction of units have high activation. (b) Dropout with p = 0.5: the histogram of mean activations shows that most units have a smaller mean activation of about 0.7, and the histogram of activations shows a sharp peak at zero; very few units have high activation.

We found that as a side-effect of doing dropout, the activations of the hidden units become sparse, even when no sparsity inducing regularizers are present. Thus, dropout automatically leads to sparse representations. To observe this effect, we take the autoencoders trained in the previous section and look at the sparsity of hidden unit activations on a random mini-batch taken from the test set. Figure 8a and Figure 8b compare the sparsity for the two models. In a good sparse model, there should only be a few highly activated units for any data case. Moreover, the average activation of any unit across data cases should be low. To assess both of these qualities, we plot two histograms for each model. For each model, the histogram on the left shows the distribution of mean activations of hidden units across the minibatch. The histogram on the right shows the distribution of activations of the hidden units.

Comparing the histograms of activations we can see that fewer hidden units have high activations in Figure 8b compared to Figure 8a, as seen by the significant mass away from zero for the net that does not use dropout. The mean activations are also smaller for the dropout net. The overall mean activation of hidden units is close to 2.0 for the autoencoder without dropout but drops to around 0.7 when dropout is used.
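The two statistics plotted in Figure 8 (mean activation per unit and how many individual activations are essentially zero) can be summarized as follows. The sketch and the zero threshold are ours, not the paper's.

```python
import numpy as np

def sparsity_stats(activations, threshold=1e-3):
    """Summarize hidden-unit sparsity for a mini-batch of ReLU activations.

    activations: array of shape (n_cases, n_hidden_units).
    Returns the mean activation of each unit (averaged over the mini-batch)
    and the fraction of individual activations that are essentially zero.
    """
    mean_per_unit = activations.mean(axis=0)
    frac_near_zero = (activations < threshold).mean()
    return mean_per_unit, frac_near_zero

# In the paper's comparison one would pass the hidden activations of the
# autoencoders trained with and without dropout on a held-out mini-batch.
```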

7.3 Effect of Dropout Rate

Dropout has a tunable hyperparameter p (the probability of retaining a unit in the network). In this section, we explore the effect of varying this hyperparameter. The comparison is done in two situations:

1. The number of hidden units is held constant.
2. The number of hidden units is changed so that the expected number of hidden units that will be retained after dropout is held constant.

In the first case, we train the same network architecture with different amounts of dropout. We use the same architecture throughout. No input dropout was used. Figure 9a shows the test error obtained as a function of p. If the architecture is held constant, having a small p means very few units will turn on during training. It can be seen that this has led to underfitting since the training error is also high. We see that as p increases, the error goes down. It becomes flat when 0.4 ≤ p ≤ 0.8 and then increases as p becomes close to 1.

Figure 9: Effect of changing dropout rates on MNIST. (a) Keeping n fixed. (b) Keeping pn fixed.

Another interesting setting is the second case in which the quantity pn is held constant, where n is the number of hidden units in any particular layer. This means that networks that have small p will have a large number of hidden units. Therefore, after applying dropout, the expected number of units that are present will be the same across different architectures. However, the test networks will be of different sizes. In our experiments, we set pn = 256 for the first two hidden layers and pn = 512 for the last hidden layer. Figure 9b shows the test error obtained as a function of p. We notice that the magnitude of errors for small values of p has reduced by a lot compared to Figure 9a (for p = 0.1 it fell from 2.7% to 1.7%). Values of p that are close to 0.6 seem to perform best for this choice of pn, but our usual default value of 0.5 is close to optimal.
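A small sketch of the second regime, in which the expected number of retained units pn is held fixed, using the pn targets of 256, 256 and 512 stated above; the set of p values swept is illustrative.

```python
# Expected number of retained units per hidden layer (from the text above).
pn_targets = (256, 256, 512)

def layer_sizes_fixed_pn(p, pn_targets=pn_targets):
    """Hidden layer sizes for the pn-held-constant regime of Figure 9b."""
    return tuple(int(round(pn / p)) for pn in pn_targets)

for p in (0.1, 0.25, 0.5, 0.75, 1.0):
    print(p, layer_sizes_fixed_pn(p))
# e.g. p = 0.5 -> (512, 512, 1024); p = 1.0 -> (256, 256, 512)
```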

7.4 Effect of Data Set Size

One test of a good regularizer is that it should make it possible to get good generalization error from models with a large number of parameters trained on small data sets. This section explores the effect of changing the data set size when dropout is used with feed-forward networks. Huge neural networks trained in the standard way overfit massively on small data sets. To see if dropout can help, we run classification experiments on MNIST and vary the amount of data given to the network.

The results of these experiments are shown in Figure 10. The network was given data sets of size 100, 500, 1K, 5K, 10K and 50K chosen randomly from the MNIST training set. The same network architecture was used for all data sets. Dropout with p = 0.5 was performed at all the hidden layers and p = 0.8 at the input layer. It can be observed that for extremely small data sets (100, 500) dropout does not give any improvements. The model has enough parameters that it can overfit on the training data, even with all the noise coming from dropout. As the size of the data set is increased, the gain from doing dropout increases up to a point and then declines. This suggests that for any given architecture and dropout rate, there is a sweet spot corresponding to some amount of data that is large enough to not be memorized in spite of the noise but not so large that overfitting is not a problem anyway.

Figure 10: Effect of varying data set size.

7.5 Monte-Carlo Model Averaging vs. Weight Scaling

The efficient test time procedure that we propose is to do an approximate model combination by scaling down the weights of the trained neural network. An expensive but more correct way of averaging the models is to sample k neural nets using dropout for each test case and average their predictions. As k → ∞, this Monte-Carlo model average gets close to the true model average. It is interesting to see empirically how many samples k are needed to match the performance of the approximate averaging method. By computing the error for different values of k we can see how quickly the error rate of the finite-sample average approaches the error rate of the true model average.

Figure 11: Monte-Carlo model averaging vs. weight scaling.
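Concretely, the two prediction procedures compared in Figure 11 can be sketched as follows, reusing the hypothetical dropout_forward function from the sketch in Section 4; the softmax output layer and the default k = 50 are our assumptions, not the paper's exact setup.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predict_weight_scaling(x, weights, biases, p=0.5):
    """Approximate averaging: one forward pass with every weight scaled by p."""
    return softmax(dropout_forward(x, weights, biases, p=p, train=False))

def predict_monte_carlo(x, weights, biases, p=0.5, k=50):
    """Monte-Carlo averaging: average the predictions of k sampled thinned nets."""
    preds = [softmax(dropout_forward(x, weights, biases, p=p, train=True))
             for _ in range(k)]
    return np.mean(preds, axis=0)
```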

We again use the MNIST data set and do classification by averaging the predictions of k randomly sampled neural networks. Figure 11 shows the test error rate obtained for different values of k. This is compared with the error obtained using the weight scaling method (shown as a horizontal line). It can be seen that around k = 50, the Monte-Carlo method becomes as good as the approximate method. Thereafter, the Monte-Carlo method is slightly better than the approximate method but well within one standard deviation of it. This suggests that the weight scaling method is a fairly good approximation of the true model average.

8. Dropout Restricted Boltzmann Machines

Besides feed-forward neural networks, dropout can also be applied to Restricted Boltzmann Machines (RBM). In this section, we formally describe this model and show some results to illustrate its key properties.

8.1 Model Description

Consider an RBM with visible units v \in \{0, 1\}^D and hidden units h \in \{0, 1\}^F. It defines the following probability distribution

P(h, v; \theta) = \frac{1}{Z(\theta)} \exp(v^\top W h + b^\top h + a^\top v),

where \theta = \{W, a, b\} represents the model parameters and Z is the partition function.

Dropout RBMs are RBMs augmented with a vector of binary random variables r \in \{0, 1\}^F. Each random variable r_j takes the value 1 with probability p, independent of others. If r_j takes the value 1, the hidden unit h_j is retained, otherwise it is dropped from the model. The joint distribution defined by a Dropout RBM can be expressed as

P(r, h, v; p, \theta) = P(r; p) P(h, v \mid r; \theta),
P(r; p) = \prod_{j=1}^{F} p^{r_j} (1 - p)^{1 - r_j},
P(h, v \mid r; \theta) = \frac{1}{Z'(\theta, r)} \exp(v^\top W h + b^\top h + a^\top v) \prod_{j=1}^{F} g(h_j, r_j),
g(h_j, r_j) = 1(r_j = 1) + 1(r_j = 0) 1(h_j = 0).

Z'(\theta, r) is the normalization constant. g(h_j, r_j) imposes the constraint that if r_j = 0, h_j must be 0. The distribution over h, conditioned on v and r, is factorial

P(h \mid r, v) = \prod_{j=1}^{F} P(h_j \mid r_j, v),
P(h_j = 1 \mid r_j, v) = 1(r_j = 1) \sigma\Big(b_j + \sum_i W_{ij} v_i\Big).

Figure 12: Features learned on MNIST by 256 hidden unit RBMs. The features are ordered by L2 norm. (a) Without dropout. (b) Dropout with p = 0.5.

The distribution over v conditioned on h is the same as that of an RBM

P(v \mid h) = \prod_{i=1}^{D} P(v_i \mid h),
P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big).

Conditioned on r, the distribution over \{v, h\} is the same as the distribution that an RBM would impose, except that the units for which r_j = 0 are dropped from h. Therefore, the Dropout RBM model can be seen as a mixture of exponentially many RBMs with shared weights, each using a different subset of h.

8.2 Learning Dropout RBMs

Learning algorithms developed for RBMs such as Contrastive Divergence (Hinton et al., 2006) can be directly applied for learning Dropout RBMs. The only difference is that r is first sampled and only the hidden units that are retained are used for training. Similar to dropout neural networks, a different r is sampled for each training case in every minibatch. In our experiments, we use CD-1 for training dropout RBMs.

8.3 Effect on Features

Dropout in feed-forward networks improved the quality of features by reducing co-adaptations. This section explores whether this effect transfers to Dropout RBMs as well. Figure 12a shows features learned by a binary RBM with 256 hidden units. Figure 12b shows features learned by a dropout RBM with the same number of hidden units. Features learned by the dropout RBM appear qualitatively different in the sense that they seem to capture features that are coarser compared to the sharply defined stroke-like features in the standard RBM. There seem to be very few dead units in the dropout RBM relative to the standard RBM.
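As a concrete illustration of the procedure in Section 8.2, here is a hedged NumPy sketch of a single CD-1 update for a dropout RBM with binary units. The learning rate, the use of activation probabilities rather than binary samples in the update, and the absence of momentum or weight decay are our simplifications, not the paper's exact training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_dropout_rbm_step(v0, W, a, b, p=0.5, lr=0.01):
    """One CD-1 update for a dropout RBM.

    v0: mini-batch of visible vectors, shape (n_cases, D).
    W: weights (D, F); a: visible biases (D,); b: hidden biases (F,).
    A mask r ~ Bernoulli(p) is sampled per training case and the dropped
    hidden units are clamped to zero in both CD phases.
    """
    n = v0.shape[0]
    r = rng.binomial(1, p, size=(n, W.shape[1]))   # which hidden units survive

    # Positive phase: P(h_j = 1 | r_j, v) = 1(r_j = 1) * sigmoid(b_j + sum_i W_ij v_i)
    ph0 = r * sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one step of Gibbs sampling.
    pv1 = sigmoid(h0 @ W.T + a)                    # P(v_i = 1 | h)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = r * sigmoid(v1 @ W + b)

    # Parameter updates, averaged over the mini-batch.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```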

Figure 13: Effect of dropout on sparsity. (a) Without dropout: the activation histogram shows that a large number of units have activations away from zero. (b) Dropout with p = 0.5: a large number of units have activations close to zero and very few units have high activation.

8.4 Effect on Sparsity

Next, we investigate the effect of dropout RBM training on sparsity of the hidden unit activations. Figure 13a shows the histograms of hidden unit activations and their means on a test mini-batch after training an RBM. Figure 13b shows the same for dropout RBMs. The histograms clearly indicate that the dropout RBMs learn much sparser representations than standard RBMs even when no additional sparsity inducing regularizer is present.

9. Marginalizing Dropout

Dropout can be seen as a way of adding noise to the states of hidden units in a neural network. In this section, we explore the class of models that arise as a result of marginalizing this noise. These models can be seen as deterministic versions of dropout. In contrast to standard ("Monte-Carlo") dropout, these models do not need random bits and it is possible to get gradients for the marginalized loss functions. In this section, we briefly explore these models.

Deterministic algorithms have been proposed that try to learn models that are robust to feature deletion at test time (Globerson and Roweis, 2006). Marginalization in the context of denoising autoencoders has been explored previously (Chen et al., 2012). The marginalization of dropout noise in the context of linear regression was discussed in Srivastava (2013). Wang and Manning (2013) further explored the idea of marginalizing dropout to speed up training. van der Maaten et al. (2013) investigated different input noise distributions and the regularizers obtained by marginalizing them.
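To make the idea of marginalizing dropout concrete, consider the simplest case mentioned above: linear regression with dropout applied to the inputs. For a single case (x, y) with mask r ~ Bernoulli(p), the mean-variance decomposition of the squared error gives E_r[(y - w·(r∘x))^2] = (y - p w·x)^2 + p(1 - p) Σ_i w_i^2 x_i^2, so the marginalized objective is least squares on scaled inputs plus a data-dependent L2 penalty. The following numerical check of this identity is ours, not the paper's; the dimensions and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.5
d = 10
x = rng.normal(size=d)
w = rng.normal(size=d)
y = 1.5

# Monte-Carlo estimate of the expected squared error over dropout masks.
masks = rng.binomial(1, p, size=(200_000, d))
mc = np.mean((y - masks @ (w * x)) ** 2)

# Closed form: squared error on p-scaled inputs plus a data-dependent L2 term.
closed_form = (y - p * w @ x) ** 2 + p * (1 - p) * np.sum((w * x) ** 2)

print(mc, closed_form)   # agree up to Monte-Carlo error
```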


More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

General Vector Machine. Hong Zhao Department of Physics, Xiamen University

General Vector Machine. Hong Zhao Department of Physics, Xiamen University General Vector Machne Hong Zhao (zhaoh@xmu.edu.cn) Department of Physcs, Xamen Unversty The support vector machne (SVM) s an mportant class of learnng machnes for functon approach, pattern recognton, and

More information

Classification / Regression Support Vector Machines

Classification / Regression Support Vector Machines Classfcaton / Regresson Support Vector Machnes Jeff Howbert Introducton to Machne Learnng Wnter 04 Topcs SVM classfers for lnearly separable classes SVM classfers for non-lnearly separable classes SVM

More information

Face Detection with Deep Learning

Face Detection with Deep Learning Face Detecton wth Deep Learnng Yu Shen Yus122@ucsd.edu A13227146 Kuan-We Chen kuc010@ucsd.edu A99045121 Yzhou Hao y3hao@ucsd.edu A98017773 Mn Hsuan Wu mhwu@ucsd.edu A92424998 Abstract The project here

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source-mismatch

Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source-mismatch Deep learnng s a good steganalyss tool when embeddng key s reused for dfferent mages, even f there s a cover source-msmatch Lonel PIBRE 2,3, Jérôme PASQUET 2,3, Dno IENCO 2,3, Marc CHAUMONT 1,2,3 (1) Unversty

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Understanding the difficulty of training deep feedforward neural networks

Understanding the difficulty of training deep feedforward neural networks Understandng the dffculty of tranng deep feedforward neural networks Xaver Glorot Yoshua Bengo DIRO, Unversté de Montréal, Montréal, Québec, Canada Abstract Whereas before 2006 t appears that deep multlayer

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

A Statistical Model Selection Strategy Applied to Neural Networks

A Statistical Model Selection Strategy Applied to Neural Networks A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Comparing Image Representations for Training a Convolutional Neural Network to Classify Gender

Comparing Image Representations for Training a Convolutional Neural Network to Classify Gender 2013 Frst Internatonal Conference on Artfcal Intellgence, Modellng & Smulaton Comparng Image Representatons for Tranng a Convolutonal Neural Network to Classfy Gender Choon-Boon Ng, Yong-Haur Tay, Bok-Mn

More information

Research of Image Recognition Algorithm Based on Depth Learning

Research of Image Recognition Algorithm Based on Depth Learning 208 4th World Conference on Control, Electroncs and Computer Engneerng (WCCECE 208) Research of Image Recognton Algorthm Based on Depth Learnng Zhang Jan, J Xnhao Zhejang Busness College, Hangzhou, Chna,

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Learning a Class-Specific Dictionary for Facial Expression Recognition

Learning a Class-Specific Dictionary for Facial Expression Recognition BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 4 Sofa 016 Prnt ISSN: 1311-970; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-016-0067 Learnng a Class-Specfc Dctonary for

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

A Bilinear Model for Sparse Coding

A Bilinear Model for Sparse Coding A Blnear Model for Sparse Codng Davd B. Grmes and Rajesh P. N. Rao Department of Computer Scence and Engneerng Unversty of Washngton Seattle, WA 98195-2350, U.S.A. grmes,rao @cs.washngton.edu Abstract

More information

Large-Scale Feature Learning With Spike-and-Slab Sparse Coding

Large-Scale Feature Learning With Spike-and-Slab Sparse Coding Ian J. Goodfellow Aaron Courvlle Yoshua Bengo DIRO, Unversté de Montréal, Montréal, Québec, Canada goodfel.@ro.umontreal.ca Aaron.Courvlle@umontreal.ca Yoshua.Bengo@umontreal.ca Abstract We consder the

More information

Journal of Process Control

Journal of Process Control Journal of Process Control (0) 738 750 Contents lsts avalable at ScVerse ScenceDrect Journal of Process Control j ourna l ho me pag e: wwwelsevercom/locate/jprocont Decentralzed fault detecton and dagnoss

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY SSDH: Sem-supervsed Deep Hashng for Large Scale Image Retreval Jan Zhang, and Yuxn Peng arxv:607.08477v2 [cs.cv] 8 Jun 207 Abstract Hashng

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue

More information

Learning to Project in Multi-Objective Binary Linear Programming

Learning to Project in Multi-Objective Binary Linear Programming Learnng to Project n Mult-Objectve Bnary Lnear Programmng Alvaro Serra-Altamranda Department of Industral and Management System Engneerng, Unversty of South Florda, Tampa, FL, 33620 USA, amserra@mal.usf.edu,

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Artificial Intelligence (AI) methods are concerned with. Artificial Intelligence Techniques for Steam Generator Modelling

Artificial Intelligence (AI) methods are concerned with. Artificial Intelligence Techniques for Steam Generator Modelling Artfcal Intellgence Technques for Steam Generator Modellng Sarah Wrght and Tshldz Marwala Abstract Ths paper nvestgates the use of dfferent Artfcal Intellgence methods to predct the values of several contnuous

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

Selecting Shape Features Using Multi-class Relevance Vector Machine

Selecting Shape Features Using Multi-class Relevance Vector Machine Selectng Shape Features Usng Mult-class Relevance Vector Machne Hao Zhang Jtendra Malk Electrcal Engneerng and Computer Scences Unversty of Calforna at Berkeley Techncal Report No. UCB/EECS-5-6 http://www.eecs.berkeley.edu/pubs/techrpts/5/eecs-5-6.html

More information

Research Article A High-Order CFS Algorithm for Clustering Big Data

Research Article A High-Order CFS Algorithm for Clustering Big Data Moble Informaton Systems Volume 26, Artcle ID 435627, 8 pages http://dx.do.org/.55/26/435627 Research Artcle A Hgh-Order Algorthm for Clusterng Bg Data Fanyu Bu,,2 Zhku Chen, Peng L, Tong Tang, 3 andyngzhang

More information

Writer Identification using a Deep Neural Network

Writer Identification using a Deep Neural Network Wrter Identfcaton usng a Deep Neural Network Jun Chu and Sargur Srhar Department of Computer Scence and Engneerng Unversty at Buffalo, The State Unversty of New York Buffalo, NY 1469, USA {jchu6, srhar}@buffalo.edu

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES UbCC 2011, Volume 6, 5002981-x manuscrpts OPEN ACCES UbCC Journal ISSN 1992-8424 www.ubcc.org VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information