2018 4th World Conference on Control, Electronics and Computer Engineering (WCCECE 2018)

Research of Image Recognition Algorithm Based on Deep Learning

Zhang Jian, Ji Xinhao
Zhejiang Business College, Hangzhou, China, 310000

Keywords: Deep Learning, Image Recognition Algorithm, Deep Neural Network, Convolutional Neural Network

Abstract: In recent years, deep learning has come into increasingly wide use in speech recognition, computer vision and bioinformatics. In image recognition in particular, convolutional neural networks based on deep learning show a technical superiority that surpasses previous machine vision methods. Traditional machine vision methods can no longer meet the practicality and security requirements of image recognition in the big data era, and deep learning has become a hotspot of image recognition technology. To this end, this paper explains the background and basic ideas of deep learning, and analyzes common deep learning models: deep belief networks, the algorithmic principles of convolutional neural networks, and image recognition efficiency.

1. Introduction

With the development of information technology, the amount of image information keeps growing. It is therefore very important to extract the feature information we need from this huge amount of image data and to identify images accurately. In 2006, the deep belief network (DBN) proposed by Hinton et al. became a research hotspot in the field of image recognition because of its extraordinary ability of mathematical representation. Deep learning builds its algorithms on deep neural networks, extracting image features hierarchically and acquiring the ability to learn features automatically through data training, which has greatly promoted the development of image recognition technology. In recent years, image recognition algorithm models based on deep learning have continued to emerge, mainly DBN (Deep Belief Network), CNN (Convolutional Neural Network), and RNN (Recurrent Neural Networks).
Practice has proved that the convolutional neural network is the most ideal structure among current deep learning structures, and it is widely used in image recognition. Below, the text will explain the basic idea of deep learning.

2. Deep learning applications in the field of image recognition

Since 2006, when G.E. Hinton et al. proposed deep learning, it has shown superior performance in many fields and has become a research hotspot of Internet big data and artificial intelligence. In 2012, Hinton's research team used a deep learning model based on convolutional neural networks to win the ImageNet Image Classification Contest, reducing the error to 15.35%, compared with the lowest error rate of 26.72% for traditional computer vision methods. Many researchers have since taken up deep learning, in particular carrying out a large number of studies on the convolutional neural network model. At the same time, deep learning has also brought tremendous business value. For example, some banks and enterprises have started to provide online account opening and face-scan payment services, so that people can handle all kinds of business without leaving home, avoiding long waits. Microsoft developed a deep neural network to replace the traditional GMM (Gaussian Mixture Model) algorithm, bringing a new technical framework to speech recognition and significantly reducing the error rate, and in 2012 it officially launched the first voice search system based on deep neural networks, becoming one of the first companies in the world to offer such voice services. With the advent of the big data era, major Internet companies around the world, such as Baidu and Google, are actively studying the application of
Copyright (2018) Francis Academic Press, UK
deep learning. For example, deep neural networks are used to reduce the computational complexity of feature counts and thereby enhance the performance of search advertising systems. In addition, deep learning also has many applications in NLP (Natural Language Processing), IR (Information Retrieval) and visual scene annotation.

3. Research of Image Recognition Algorithm Based on Deep Learning

3.1 Reconstruction of MNIST Digital Images Based on Deep Neural Networks

MNIST, a sub-database of the National Institute of Standards and Technology's large data set, is a handwritten digit library containing 10,000 test samples and 60,000 training samples of the digits 0 to 9 at a resolution of 28 * 28. These samples are not in a standard image format but are stored in idx files. The MNIST dataset basically requires no data preprocessing and can be used immediately, so many researchers have taken MNIST as the preferred database for studying techniques and pattern recognition methods. Figure 1 shows part of the MNIST data set converted to image format.

Figure 1. A partial sample diagram of the MNIST dataset converted to image format

Samples in the MNIST dataset are reconstructed with a four-layer deep belief network built from RBMs (Restricted Boltzmann Machines). First, the 300-dimensional effective features of the image are extracted, the parameters are adjusted, and the error between the reconstructed image and the image input to the network is reduced. This method can significantly reduce the dimensionality of the image, compress the data efficiently, and reduce the space required to store it.

3.2 MNIST Digital Image Recognition Based on Convolutional Neural Networks

3.2.1 Transform layer

Images have a fixed property: the statistics of one part of an image are the same as those of the other parts. In other words, a feature learned on one part can also be applied to another part. Therefore, we can use the same learned feature at any location of the image.
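This "same feature at every location" idea is exactly what a convolution kernel implements through weight sharing. A minimal NumPy sketch (illustrative only, not the paper's implementation; the toy image and kernel values are invented for the example):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide ONE shared kernel over every location of a 2-D image
    (valid convolution, stride 1) -- the weight sharing described above."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel weights are reused at every position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)     # toy 4x4 "image"
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])         # toy 2x2 difference filter
features = conv2d(image, kernel)
print(features.shape)  # (3, 3) -- every entry is -5.0 for this ramp image
```

A single 2x2 kernel here has 4 parameters no matter how large the image is, which is the parameter reduction the text attributes to weight sharing.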
The convolutional neural network uses this property of images to achieve weight sharing, thereby reducing the number of network parameters. We can regard the image as a plane and, while preserving its two-dimensional character, process the input image with two kinds of transformation: linear and non-linear. The non-linear operation refers to the so-called activation function; as shown in Figure 2, there are mainly three kinds of nonlinearity.

Figure 2. Nonlinear functions
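The three nonlinearities of Figure 2 can be written out directly; a short NumPy sketch (illustrative only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes input into (0, 1)

def tanh(x):
    return np.tanh(x)                  # squashes input into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)          # rectified linear unit: max(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # values strictly inside (0, 1)
print(tanh(x))     # values strictly inside (-1, 1)
print(relu(x))     # [0. 0. 2.]
```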
The left one is the sigmoid function. As the plot shows, it maps inputs into the range (0, 1) and is a common nonlinear transformation function in neural networks; it models nearly fully saturated neurons well. It is now less used in practice, mainly because when a neuron's activation value is near 0 or 1, the gradient in that region is almost 0, so during back-propagation the weights of the first few layers barely change; if the initial weights are too large, neurons quickly saturate and the network stops learning. In addition, sigmoid outputs are not zero-centered, which adversely affects the dynamics of gradient descent. The middle one is the hyperbolic tangent function, which limits values to the range (-1, 1), but it has the same saturation problem. The right one is the rectified linear function (ReLU). Compared with the sigmoid and hyperbolic tangent functions, it is easier to compute and accelerates the convergence of stochastic gradient descent, so it has become very popular in recent years. Its disadvantage is that when a large gradient passes through a ReLU neuron, the neuron may never activate again at any data point, so ReLU units are more fragile.

3.2.2 Pooling layer

The dimensionality of the convolutional features is too high and contains redundancy, which is unsuitable for classification, so it must first be reduced. Features at different locations of the image can be aggregated: for example, the average or maximum value of a feature over a region of the image is computed, which reduces the feature dimensionality and makes the network less prone to overfitting. This kind of aggregation is called pooling. As shown in Figure 3, pooling uses the local correlation principle of the image to subsample the feature map, reducing the amount of data to be processed while retaining useful structural information.

Figure 3. The two kinds of 2*2 pooling (maximum pooling and average pooling)

4. Analysis of Algorithms
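As groundwork for the analysis, the 2*2 pooling just described can be pinned down in code; a minimal NumPy sketch (an illustration with a made-up feature map, not the paper's code):

```python
import numpy as np

def pool2x2(feature_map, mode="max"):
    """Non-overlapping 2x2 pooling (maximum or average), as in Figure 3."""
    H, W = feature_map.shape
    # Group the map into 2x2 blocks (trimming any odd row/column).
    blocks = feature_map[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))     # keep the strongest response
    return blocks.mean(axis=(1, 3))        # average pooling

fm = np.array([[1., 2., 5., 6.],
               [3., 4., 7., 8.],
               [0., 0., 1., 1.],
               [0., 4., 1., 1.]])
print(pool2x2(fm, "max"))   # [[4. 8.] [4. 1.]]
print(pool2x2(fm, "avg"))   # [[2.5 6.5] [1. 1.]]
```

Either way, a 4x4 feature map shrinks to 2x2, which is the dimensionality reduction that makes the features usable for classification.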
In this paper, CNN, a convolutional neural network with a deep structure, is used to learn image features automatically and perform image recognition. By combining feature extraction and classifier training in deep learning, the accuracy of image recognition can be significantly improved. It should be pointed out that traditional image recognition algorithms need to preprocess images to extract good image features and then select an appropriate classifier for those features. Traditional algorithms carry a lot of uncertainty, require a wealth of experience, and are vulnerable to human factors, resulting in low recognition accuracy and a lack of stability. In addition, they require complex parameter adjustment, which significantly increases training time. Unlike traditional image recognition algorithms, the CNN can take the two-dimensional image directly as network input and recognize visual patterns from the original picture; it requires little preprocessing, is only slightly affected by human factors, and breaks through the limitations of hand-designed features. As an end-to-end learning network, the CNN achieves an error rate of 0.84% on the recognition task, and its recognition performance is comparable to or even better than
that of the traditional method, which proves the effectiveness of the CNN method. In the image recognition algorithm based on convolutional neural networks, the value of the gradient comes from the input samples. Here we discuss how to calculate the gradient.

W_j = W_j - \alpha \frac{\partial J(W, b)}{\partial W_j}

The most primitive method computes the gradient over the whole sample set, as in the formula above. If the sample set is small, the program runs normally, but if the number of samples is large, the running time becomes very long and the computation very slow, consuming a large amount of computing resources and easily exhausting hardware space. Randomly selecting a single sample to update the parameters is called SGD (Stochastic Gradient Descent); SGD leads to more serious oscillation of the cost function and results in greater error.

W_j = W_j - \alpha \frac{\partial J(W, b; x^{(i)}, y^{(i)})}{\partial W_j}

In this paper, we use mini-batch SGD to update the parameters. Computing the gradient over a random mini-batch of m samples, the parameters are updated as follows:

W_j = W_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J(W, b; x^{(i)}, y^{(i)})}{\partial W_j}

The algorithm steps are as follows, for i = 1 : MaxIters (maximum number of iterations):
(a) randomly sample a mini-batch;
(b) calculate the residuals;
(c) calculate the gradient using the BP algorithm;
(d) network parameter update: update the network parameters with the gradient calculated by the BP algorithm.

Softmax regression is an extension of logistic regression. Logistic regression is mainly used for two-class classification problems, while softmax regression can be used for multi-class classification tasks with mutually exclusive classes. The training set label can take k values, and the output is a k-dimensional vector representing the probabilities that the sample belongs to each of the k categories.

5. Deep Learning in the field of image recognition

With the increasing demand for image recognition and the development of deep learning, the application of deep learning will become ever more widespread and more intelligent in the field of image recognition. Now we look at the directions of development of deep learning in image recognition.

5.1 The model hierarchy is getting deeper and the model structure more complicated
Deep learning needs to model the characteristics of the image layer by layer. If the network model is not deep enough, the number of required computing units grows exponentially and the complexity of image recognition increases. By learning multi-layer image features, the features captured by the network model become more and more global, and restored images become more realistic. For example, AlexNet won the ImageNet Image Recognition Contest in 2012 using five convolutional layers, three pooling layers, and two fully
connected layers. The network model GoogLeNet used to win the ILSVRC championship in 2014 had 59 convolutional layers, 16 pooling layers and 2 fully connected layers, and the model Microsoft's ResNet used in 2015 has 152 convolutional layers. This shows that the depth and complexity of models keep increasing.

5.2 The scale of training data is increasing rapidly

As deep learning models become more and more complex, more and more features are needed, which requires larger training data to improve classification or prediction accuracy in image recognition. At present, the training data of deep learning algorithms are usually on the scale of hundreds of thousands to millions of samples; for large enterprises such as Google and Baidu, the training data scale has reached tens of millions or even hundreds of millions. So far, ImageNet has collected more than 20,000 categories and a total of about 14.2 million images, but this still cannot meet the rapidly increasing demand for training data.

5.3 The accuracy of model recognition is getting higher and higher

With the continuous development and improvement of deep learning models, the accuracy and efficiency of image recognition have improved markedly. The early R-CNN model took 13 seconds to process an image on a GPU and 53 seconds on a CPU, yielding a recognition accuracy of 53.7%. The YOLO model of 2016 achieves a recognition speed of 45 FPS without losing accuracy, greatly improving both recognition efficiency and detection precision.

6. Conclusions

To sum up, with the continuous development of deep learning, image recognition technology has ushered in leap-forward development. Deep learning has effectively promoted the development of image recognition technology and advanced its research and application by leaps and bounds. At present, major science and technology enterprises around the world are investing heavily in image recognition research as a main issue.
In the coming years, we can foresee that deep learning will enter a period of rapid development in theory, algorithms and applications, and will have a profound impact on image recognition and other related fields. This paper first described the basic idea of deep learning and analyzed the method of reconstructing the MNIST dataset with a deep belief network in order to obtain the most effective characterization of the image set. By constructing a 5-layer convolutional neural network to recognize MNIST images, the performance of image recognition algorithm models based on deep learning was compared. The deeper the network, the more accurately the essential features of images are recognized.

References

[1] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets [J]. Neural Computation, 2006.
[2] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition [J]. Proc. IEEE, 1998, 86: 2278-2324.
[3] LeCun Y, Bengio Y, Hinton G. Deep Learning [J]. Nature, 2015, 521: 436-444.
[4] Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks [C]. Advances in Neural Information Processing Systems 25, 2012: 1090-1098.