Effects of Model Complexity on Generalization Performance of Convolutional Neural Networks

Tae-Jun Kim 1, Dongsu Zhang 2, and Joon Shik Kim 3

1 Seoul National University, Seoul 151-742, Korea, E-mail: tjkim@bi.snu.ac.kr
2 Yangjae High School, Seoul, Korea, E-mail: 96lives@gmail.com
3 University of Seoul, Seoul 130-743, Korea, E-mail: jskim.ozmagi@gmail.com

Abstract. Convolutional neural networks are known to be effective in learning complex image classification tasks. However, designing the architecture or complexity of the network structure requires a more quantitative analysis. In this paper, we study the effect of model complexity on the generalization capability of convolutional neural networks on large-scale, real-life digit recognition data. We used the digit images of the MNIST dataset to train the neural networks and evaluated their performance on a test set of unobserved images. Using the LeNet software tool, we varied the number of hidden layers and the number of units in the layers to evaluate the effect of model complexity on the generalization capability of the convolutional neural networks. In our experimental settings, we observe robust generalization performance of the convolutional neural networks over a wide range of model complexities. We analyze and discuss how the convolution layer and the subsampling layer may contribute to the generalization performance.

Keywords: Convolutional neural networks, MNIST, LeNet, image classification, model complexity, Occam's razor, generalization performance

1. Introduction

Computers solve many problems well if programmed appropriately by human programmers. However, some artificial intelligence problems, such as image and speech analysis, are hard to program computers to solve. In such cases, machine learning offers new possibilities, since it allows a program to be built automatically from data, i.e. by repeatedly observing humans solving the problem.
An important issue in machine learning is how to control the complexity of the model: if the model is too simple, it cannot learn the data, whereas too complex models may overfit the data. Convolutional neural networks are especially interesting as a machine learning model since they can learn complex patterns in real-life data sets, such as images. However, the design of the network structure, i.e. the number of layers and the number of units in the layers, remains an art. In this paper we aim to understand the relationship between the complexity of the network model and its generalization performance by exploring the architecture space experimentally. Our vision is to build a
deep neural network that can learn to recognize human faces as more human faces are observed. In this first stage of our research, we experiment with the MNIST benchmark data sets.

The paper is organized as follows. In Section 2 we describe the architecture and learning method of the convolutional neural network. In Section 3 we describe the data set and our experimental designs. Section 4 reports on the experimental results and their analysis. Section 5 concludes the work.

2. Deep Convolutional Neural Networks

Neural networks are neurobiologically inspired computational models of learning and memory. A single neuron $j$ processes information in two stages: it first computes the net input $net_j$ from the incoming activations $x_i$ of the presynaptic neurons $i \in Pre$,

$$net_j = \sum_{i \in Pre} w_{ji} x_i,$$

and then transfers the net input through a nonlinear activation function $\varphi(\cdot)$, such as the sigmoid function

$$\varphi(net) = \frac{1}{1 + \exp(-net)}.$$

The neurons are typically organized in a layered structure, where a layer of neurons is connected from a previous layer of neurons and there are no connections between the neurons in the same layer. One of the popular neural architectures is the multilayer perceptron, consisting of two fully-connected feedforward layers of neurons. If $I$ and $H$ denote the numbers of input neurons and hidden neurons, respectively, the $k$-th output of the multilayer perceptron is expressed as

$$f_k(\mathbf{x}, \mathbf{w}) = \varphi\left( \sum_{h=1}^{H} w^{(2)}_{kh} \, \varphi\left( \sum_{i=1}^{I} w^{(1)}_{hi} x_i \right) \right),$$

where $\mathbf{x}$ is the input training pattern and $\mathbf{w} = (\mathbf{w}^{(1)}, \mathbf{w}^{(2)})$ is the weight vector that determines the neural network function. One of the most interesting features of neural networks is their learning capability: without programming, a neural network can automatically learn to solve problems from training examples, such as $D_N = \{(\mathbf{x}_n, \mathbf{y}_n) \mid n = 1, 2, \ldots, N\}$, where $\mathbf{x}_n$ is the $n$-th input pattern and $\mathbf{y}_n$ the associated target output pattern.
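As an illustration, the forward pass defined by the two equations above can be sketched in a few lines of NumPy; the layer sizes below are arbitrary example values, not the ones used in the experiments:

```python
import numpy as np

def sigmoid(net):
    # Activation function: phi(net) = 1 / (1 + exp(-net))
    return 1.0 / (1.0 + np.exp(-net))

def mlp_forward(x, w1, w2):
    # Two-layer multilayer perceptron:
    # f_k(x, w) = phi( sum_h w2[k, h] * phi( sum_i w1[h, i] * x[i] ) )
    hidden = sigmoid(w1 @ x)   # hidden-layer activations
    return sigmoid(w2 @ hidden)

# Example with I = 4 inputs, H = 3 hidden units, K = 2 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=4)
w1 = rng.normal(size=(3, 4))   # input-to-hidden weights w^(1)
w2 = rng.normal(size=(2, 3))   # hidden-to-output weights w^(2)
y = mlp_forward(x, w1, w2)     # output vector, each component in (0, 1)
```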
The well-known error back-propagation algorithm adapts the weight vectors iteratively, that is, by i) presenting an input pattern $\mathbf{x}_n$ to the input layer of the neural network, ii) computing the neural network output vector $f(\mathbf{x}_n, \mathbf{w})$, iii) computing the error $E_n(\mathbf{w})$ between the actual output $f(\mathbf{x}_n, \mathbf{w})$ and the target output $\mathbf{y}_n$,

$$E_n(\mathbf{w}) = \frac{1}{2} \left\| \mathbf{y}_n - f(\mathbf{x}_n, \mathbf{w}) \right\|^2,$$
iv) changing the weight values $w_{ji}$ by a small amount in the direction of reducing the error with respect to the weights,

$$w_{ji} \leftarrow w_{ji} - \eta \frac{\partial E_n}{\partial w_{ji}},$$

where $\eta$ is the learning rate, and v) propagating the errors computed sequentially and backwards from the output layers in the direction of the input layers. Due to their learning capability, neural networks have been successfully used for many applications in artificial intelligence, including pattern recognition, computer vision, natural language processing, and cognitive robotics. Neural networks have also been used as computational tools in scientific research to investigate how humans represent, process, and learn information.

Traditionally, two-layer multilayer perceptrons have been used for most applications. However, recent research shows that deep networks, i.e. learning models consisting of many layers of processing units, can outperform shallow networks. This is interesting since, theoretically, a multilayer perceptron with a single hidden layer can be proven to be a universal approximator and can thus learn any complex nonlinear function. Deep networks have proven especially useful when the datasets are complex, such as image and speech data. There are two different classes of deep networks. One is the deep belief network (DBN), consisting of multiple layers of restricted Boltzmann machines (RBMs) [1-4]. An RBM consists of two layers of neurons which are fully connected in a feedforward fashion. A DBN is traditionally trained with contrastive divergence, which is based on Gibbs sampling [2]. The other class of deep networks is the convolutional neural network (CNN). In contrast to a deep belief network, a CNN consists of partially connected network layers (Fig. 1). That is, a CNN consists of many stacks of a convolution layer and a sub-sampling layer. The convolution layer contains a number of feature maps, where each unit has a partial receptive field of the input image. That is, in a CNN, only partial, local patches of the inputs, not all of them, are connected to the next sub-sampling layer.
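Steps i)-v) above can be sketched for the two-layer perceptron of this section. This is a minimal illustrative gradient step, not the LeNet implementation used in the experiments; the layer sizes and the learning rate are arbitrary example values:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, y, w1, w2, eta=0.1):
    # i)-ii) Forward pass through the two-layer perceptron
    h = sigmoid(w1 @ x)
    f = sigmoid(w2 @ h)
    # iii) Error E_n(w) = 1/2 * ||y - f||^2; the delta terms below use
    # the sigmoid derivative phi'(net) = phi(net) * (1 - phi(net))
    delta_out = (f - y) * f * (1.0 - f)              # error signal at output layer
    # v) Back-propagate the error signal to the hidden layer
    delta_hid = (w2.T @ delta_out) * h * (1.0 - h)
    # iv) w_ji <- w_ji - eta * dE_n/dw_ji (in-place update)
    w2 -= eta * np.outer(delta_out, h)
    w1 -= eta * np.outer(delta_hid, x)
    return 0.5 * np.sum((y - f) ** 2)                # E_n before the update

# Repeated steps on one training pattern reduce the error E_n
rng = np.random.default_rng(0)
x = rng.normal(size=4)
target = np.array([0.2, 0.8])
w1 = rng.normal(size=(3, 4))
w2 = rng.normal(size=(2, 3))
errors = [backprop_step(x, target, w1, w2) for _ in range(100)]
```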
The sub-sampling layer consists of a number of feature maps, which are scaled local averages of the respective feature maps in the previous convolutional layer. Each unit in the sub-sampling layer has one trainable coefficient for the local average and one trainable bias. The error back-propagation algorithm is used to train the convolutional neural networks [5-9].

Fig. 1. A six-layer convolutional neural network.
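A minimal sketch of such a sub-sampling operation, assuming non-overlapping 2×2 patches and one trainable coefficient and one trainable bias per feature map, as described above:

```python
import numpy as np

def subsample(feature_map, coeff, bias):
    # Sub-sampling layer: scaled local average over non-overlapping
    # 2x2 patches, with one trainable coefficient (coeff) and one
    # trainable bias per feature map.
    H, W = feature_map.shape
    fm = feature_map[:H - H % 2, :W - W % 2]   # trim odd rows/cols
    pooled = (fm[0::2, 0::2] + fm[0::2, 1::2] +
              fm[1::2, 0::2] + fm[1::2, 1::2]) / 4.0
    return coeff * pooled + bias
```

For example, on a 4×4 feature map the output is a 2×2 map of scaled patch averages.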
3. MNIST Digit Data Sets and Experiment Design

The MNIST data sets consist of handwritten digit images [10]. This database was distributed by the National Institute of Standards and Technology (NIST) [11]. The digits are centered and normalized in an image of fixed size (28 × 28 pixels). For the experiments, we used a training set and a test set of images of single-digit numbers in ten categories (from 0 to 9). Some example images from the database are shown in Fig. 2.

Fig. 2. Examples of handwritten digit images from the MNIST database.

To study the effect of model complexity on the generalization performance, we designed the experiments by varying the number of layers and units. We compared the performances (the training and test errors) of CNN models of 4 layers (Fig. 3) and 6 layers (Fig. 1). We also varied the number of units in each layer for the 4-layer and 6-layer CNNs. We also plot the results in terms of the effective number of parameters (which depends on the number of layers and units) on the generalization performance.

Fig. 3. Architecture of the 4-layer CNN used in the experiments for comparison.
4. Experimental Results and Analysis

The experiments were performed using the Matlab code in [6]. We ran the four different kinds of experiments described in the previous section to analyze the effect of model complexity on the learning performance. The first is to analyze the training and test errors vs. the number of layers. The second is to see the training and test errors vs. the number of training iterations. The third is the change of training and test errors vs. the total number of adjustable parameters, which represents the effective model complexity. The last is the change of training and test errors vs. the total number of parameters using fully-connected (non-convolutional) neural networks.

Fig. 4 shows the experimental results for the 4-layer and 6-layer CNNs. The training error is measured by the mean squared error (MSE). The test error is evaluated by the percentage of false answers, i.e. the classification error rate. The training set consisted of 6,000 examples and the test set of 1,000 examples. The number of units was fixed to 8 for each layer and the number of iterations was 1,000 epochs. For this setting, we see that the deeper networks achieve better performance in both training and test errors.

Fig. 4. The training and test errors vs. the number of layers in the convolutional neural networks.

Fig. 5 shows the changes of performance for various numbers of training epochs. Since a large number of training iterations is equivalent to a more complex model than a small number of iterations, increasing the number of iterations has the same effect as using more complex models. In Fig. 5 we see the performance improves as more iterations are performed. We do not observe any overfitting behavior in this
setting of experiments. This shows an interesting property of the convolutional neural networks. We will come back to this issue in the discussion.

Fig. 5. Performance comparison for the number of epochs in training.

Fig. 6. Effects of the number of parameters of the convolutional neural networks (by changing the number of units as well as layers) on the training error (the mean squared error) and the test error (the classification error rate).

Fig. 6 compares the performance of the convolutional neural networks for a wide range of the number of parameters (360 to 3,160). The range was generated by changing the number of units and the number of layers. The results show that both the
test and training errors decrease as the number of parameters increases, i.e. the training error (the mean squared error) and the test error (the classification error rate) behave consistently. That is, the generalization performance over a wide range of model complexity is good, except for some specific parameter settings (e.g. around 2,860 parameters). This demonstrates the robustness of CNNs in generalization performance. However, this result should be interpreted carefully: the minimum test error we achieved in these experiments is above 3%, which is larger than reported optimal values such as 0.83% [4].

On the other hand, we observed an overfitting range in experiments with fully-connected neural networks (Fig. 7). In this experiment, we fixed the number of input nodes (784) and output nodes (10) and varied the number of hidden nodes (from 100 to 280). The results show that the training errors continuously decrease as the number of parameters increases, but the test errors increase around 119,100 parameters. This means that plain neural networks are much more vulnerable to overfitting than CNNs.

Fig. 7. Effects of the number of parameters of the neural networks (by changing the number of units as well as layers) on the training error (the mean squared error) and the test error (the classification error rate).

5. Conclusion

We studied the generalization behavior of convolutional neural networks for various combinations of model complexities. The model complexities were varied by the number of layers, the number of units, the number of epochs, and the number of effective parameters. On the MNIST digit data set of 6,000 images we found that the
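The parameter counts reported for the fully-connected networks (79,400 to 222,320) appear consistent with counting only the connection weights, 784·H + H·10 for H hidden nodes, with bias terms not included. A small sketch to verify this reading:

```python
def nn_weight_count(n_hidden, n_in=784, n_out=10):
    # Connection weights of a one-hidden-layer fully-connected network:
    # input-to-hidden plus hidden-to-output (bias terms not counted,
    # which appears to match the axis values of Fig. 7).
    return n_in * n_hidden + n_hidden * n_out

# Hidden-node settings spanning the experiment's range of 100 to 280
counts = [nn_weight_count(h) for h in (100, 150, 200, 250, 280)]
# counts == [79400, 119100, 158800, 198500, 222320]
```

Under this reading, the overfitting onset around 119,100 parameters corresponds to roughly 150 hidden nodes.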
CNNs show robust generalization performance over a wide range of model complexities. The results are very interesting since the machine learning theory of model complexity says that the model should overfit as the model complexity increases. Perhaps one reason for not overfitting is that our experimental settings, despite their variety, were not wide enough to detect the overfitting ranges. It is also hard to find overfitting ranges because of the sub-sampling layers. However, considering the nature of the digit recognition problem and the relatively large data set size (6,000 images for training and 1,000 images for test), the convolutional networks seem to be able to find invariance in the images. In addition, the capability of subsampling also seems to contribute to the generalization performance by building sparse models of the data.

Acknowledgement. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2010-0017734-Videome), and supported in part by the KEIT grant funded by the Korea government (MKE) (KEIT-10035348-mLife, KEIT-10044009).

References

1. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science, 313, 504-507 (2006)
2. Bengio, Y.: Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2, 1-127 (2009)
3. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554 (2006)
4. Deng, L., Yu, D., Platt, J.: Scalable stacking and learning for building deep architectures. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2133-2136. IEEE Press, New York (2012)
5. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Computation, 1, 541-551 (1989)
6. Palm, R.B.: Prediction as a candidate for learning deep hierarchical models of data. Technical Report, Technical University of Denmark (2012)
7.
Ciresan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Deep big multilayer perceptrons for digit recognition. In: Montavon, G. et al. (eds.) Neural Networks: Tricks of the Trade, 2nd edn., LNCS 7700, pp. 581-598. Springer, Heidelberg (2012)
8. Raiko, T., Valpola, H., LeCun, Y.: Deep learning made easier by linear transformations in perceptrons. In: 15th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 924-932 (2012)
9. Bouchain, D.: Character recognition using convolutional neural networks. Technical Report, Institute for Neural Information Processing (2006)
10. León, G.M., Moreno-Báez, A., Magallanes-Quintana, R., Valdez-Cepeda, R.D.: Assessment in subsets of MNIST handwritten digits and their effect in the recognition rate. Journal of Pattern Recognition Research, 2, 244-252 (2011)
11. LeCun, Y.: The MNIST database of handwritten digits. Available: http://yann.lecun.com/exdb/mnist/index.html
12. Wilson, K.G.: The renormalization group and critical phenomena. Reviews of Modern Physics, 55, 583-600 (1983)