pp. 37-41, http://dx.doi.org/10.14257/astl.2016.129.08

Design of a Processing Structure for a CNN Algorithm Using Filter Buffers

Kwan-Ho Lee 1, Jun-Mo Jeong 2, Jong-Joon Park 3
1 Dept. of Electronics and Computer Engineering, Seokyeong University, 124 Seogyeong-ro, Seongbuk-gu, Seoul 02713, KOREA, kwanho2@skuniv.ac.kr
2 Dept. of Electronic Engineering, Seokyeong University, jjmo@skuniv.ac.kr
3 Dept. of Computer Science, Seokyeong University, jong@skuniv.ac.kr

Abstract. We propose a processing structure for the Convolutional Neural Network (CNN) algorithm that uses Filter Buffers. Smart mobile devices and the hardware used in Intelligent Advanced Driver Assistance Systems (ADAS) require small, low-power components, so such systems must access external memory through a limited number of Process Elements (PEs). We propose a method that improves processing performance by eliminating unnecessary external memory accesses; it is useful for the parallel processing of an artificial neural network. Simulation shows that the number of external memory accesses decreases by 20% and that processing performance increases accordingly.

Keywords: Convolutional Neural Network, External Memory Access, Process Element, Filter Buffer, ADAS

1 Introduction

Pattern recognition and computer vision techniques are currently an active research area across many fields of computer image processing. Many algorithms have been used for feature-point detection and recognition, in areas such as smart mobile devices and Intelligent Advanced Driver Assistance Systems. The main trend in image recognition today is machine learning with deep learning algorithms [1]. Real-time applications such as an Intelligent Advanced Driver Assistance System, which must recognize traffic signs and pedestrians, require the system to be trained with appropriate data.
The system must also recognize traffic signs and pedestrians while continuing to learn as it runs. For such repetitive recognition and learning in real time, however, the applied algorithm limits the achievable processing performance. We therefore propose a new method
ISSN: 2287-1233 ASTL, Copyright 2016 SERSC
of accessing memory using a parallel Convolutional Neural Network (CNN), an artificial neural network, to improve the performance of the system [2].

2 Memory Access Using Filter Buffers

2.1 Convolutional Neural Network (CNN)

Figure 1 shows the basic CNN algorithm [3]. Features are generated by repeating the Convolution and Pooling steps on an n*n input image. These steps are followed by the Fully Connected layers and a Softmax step, an activation function, which produces the final classification.

Fig. 1. The basic CNN algorithm

A Convolution step produces m feature maps using m kinds of k*k filters (each filter holds the weight values learned in deep learning). Using m feature maps allows more exact classification and recognition, and processing these steps in parallel increases performance. However, with a limited number of PEs (Process Elements), the k*k filters must be read from external memory repeatedly each time new input data is processed, which decreases performance [4].

2.2 The Suggested Method of Accessing Memory

The memory access method used in this paper, based on a Filter Buffer structure, increases processing performance by preventing re-reads of a previously loaded filter when the number of PEs is limited.
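As a rough illustration of the Convolution step described above, the following Python sketch produces m feature maps from an n*n image using m filters of size k*k. This is a minimal sketch with toy sizes of our own choosing (a 6*6 image, two 3*3 filters), not the paper's Verilog implementation:

```python
import numpy as np

def convolve(image, filters):
    """Produce one 'valid' feature map per filter (no padding, stride 1)."""
    n = image.shape[0]
    k = filters.shape[1]
    out = n - k + 1                              # output side length
    maps = np.zeros((filters.shape[0], out, out))
    for m, f in enumerate(filters):              # one feature map per k*k filter
        for y in range(out):
            for x in range(out):
                maps[m, y, x] = np.sum(image[y:y + k, x:x + k] * f)
    return maps

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6*6 "image"
filters = np.stack([np.eye(3), np.ones((3, 3))])  # m = 2 filters, k = 3
maps = convolve(image, filters)
print(maps.shape)  # (2, 4, 4): m feature maps of size (n-k+1)*(n-k+1)
```

Note that every window computation reuses the same k*k filter weights; this reuse is exactly what the Filter Buffer exploits when the weights would otherwise live in external memory.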
Figure 2 shows the number of data and weight accesses as the number of nodes increases. Here we assume 4 PEs and a 3*3 filter processing a real 640*480 (VGA) image. With a limited number of PEs, the number of weight accesses grows faster than the number of input-data accesses as the number of nodes increases.

Fig. 2. The number of memory accesses for data and weights as the number of nodes increases

Figure 3 compares external memory accesses in a CNN processing structure with and without a Filter Buffer. Preventing external memory accesses with a temporary buffer memory increases processing performance when the number of PEs is small and the number of nodes in the CNN is large.

Fig. 3. Access to external memory using Filter Buffers
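A simple way to see the effect is to count weight fetches. The following back-of-the-envelope model is our own illustration, not the paper's exact simulation: the filters are scheduled onto the PEs in rounds, and without a buffer a PE re-reads its filter's k*k weights for every output window it computes, while with a Filter Buffer the weights are read once per round and reused:

```python
from math import ceil

def weight_fetches(nodes, pes, k, image_w, image_h, buffered):
    """Rough count of weight words fetched from external memory.

    Illustrative model: `nodes` filters run on `pes` Process Elements
    in ceil(nodes / pes) rounds. Unbuffered, the k*k weights are
    fetched for every stride-1 'valid' output window; buffered, they
    are fetched once per round and held in the Filter Buffer.
    """
    windows = (image_w - k + 1) * (image_h - k + 1)
    rounds = ceil(nodes / pes)
    per_filter = k * k if buffered else k * k * windows
    return rounds * pes * per_filter

# The paper's setting: 4 PEs, 3*3 filters, a 640*480 (VGA) image.
for nodes in (4, 8, 16):
    no_buf = weight_fetches(nodes, 4, 3, 640, 480, buffered=False)
    buf = weight_fetches(nodes, 4, 3, 640, 480, buffered=True)
    print(nodes, no_buf, buf)
```

In this model the unbuffered weight traffic grows with both the node count and the image size, while the buffered traffic grows only with the node count, matching the trend the paper reports in Figures 2 and 3.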
2.3 Experiments and Results

We have shown that the suggested Filter Buffer structure applied to the CNN algorithm is efficient. We implemented a SIMD (Single Instruction Multiple Data) architecture in Verilog HDL and compared simulation results for a 640*480 image as the number of nodes, and hence the number of external memory accesses for the filters, increases. Input data from the German Traffic Sign Recognition Benchmark (GTSRB) was applied to the simulated structure. Figure 4 compares the number of external memory accesses of the Filter Buffer method and a conventional method.

Fig. 4. Comparison of the results

3 Conclusion

We have suggested a structure for processing the CNN algorithm with Filter Buffers. When the number of nodes increases on a finite number of PEs, a large number of external memory accesses is needed, mainly to load multiple filters. The proposed Filter Buffer structure reduces the number of external memory accesses and thereby improves processing performance by 20% over a conventional CNN processing structure. We expect a further increase in system performance, alongside continuing research on image recognition, from solving the external memory access problem for machine learning and deep learning algorithms.
Acknowledgment. This work was supported by the Industrial Core Technology Development Program (10049192, Development of a smart automotive ADAS SW-SoC for a self-driving car) funded by the Ministry of Trade, Industry & Energy, and was supported by Seokyeong University in 2014.

References

1. Deng, H., Stathopoulos, G., Suen, C. Y.: Applying Error-Correcting Output Coding to Enhance Convolutional Neural Network for Target Detection and Pattern Recognition. In: 2010 20th International Conference on Pattern Recognition (ICPR), pp. 4291-4294 (2010)
2. Dawwd, S. A.: The multi 2D systolic design and implementation of Convolutional Neural Networks. In: 2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS), pp. 221-224 (2013)
3. Convolutional neural network, Wikipedia, https://en.wikipedia.org/wiki/convolutional_neural_network
4. Gokhale, V., Jin, J., Dundar, A., Martini, B., Culurciello, E.: A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 696-701 (2014)
5. Ovtcharov, K., Ruwase, O., Kim, J. Y., Fowers, J., Strauss, K., Chung, E. S.: Accelerating Deep Convolutional Neural Networks Using Specialized Hardware. Microsoft Research (2015)