Design of a Processing Structure of CNN Algorithm using Filter Buffers

pp.37-41
http://dx.doi.org/10.14257/astl.2016.129.08

Kwan-Ho Lee 1, Jun-Mo Jeong 2, Jong-Joon Park 3

1 Dept. of Electronics and Computer Engineering, Seokyeong University, 124, Seogyeong-ro, Seongbuk-gu, Seoul 02713, KOREA, kwanho2@skuniv.ac.kr
2 Dept. of Electronic Engineering, Seokyeong University, jjmo@skuniv.ac.kr
3 Dept. of Computer Science, Seokyeong University, jong@skuniv.ac.kr

Abstract. We propose a structure for processing the Convolution Neural Network (CNN) algorithm that applies Filter Buffers. Small, low-power devices are used in smart mobile devices and in the hardware of Intelligent Advanced Driver Assistance Systems (ADAS). In these cases, the limited number of Process Elements (PEs) makes accesses to external memory necessary. We propose a method that increases processing performance by eliminating unnecessary accesses to external memory. The method is useful for parallel processing of an artificial neural network. Our results show that the ratio of external memory accesses decreased by 20% and that processing performance increased.

Keywords: Convolution Neural Network, External Memory Access, Process Element, Filter Buffer, ADAS

1 Introduction

Currently, there is much active research on pattern recognition and computer vision techniques, which are used in various areas of computer image processing. Many algorithms have been used for detecting characteristic points and for recognition, in areas such as smart mobile devices and Intelligent Advanced Driver Assistance Systems. These days, the main trend in image recognition is Machine Learning with Deep Learning algorithms [1]. In real-time processing areas such as Intelligent Advanced Driver Assistance Systems and the recognition of sign posts or pedestrians, we need to make the system learn with appropriate data.
We also need to make the system recognize sign posts and pedestrians and keep learning while it is running. Unfortunately, for repetitive recognition and learning in real time, the achievable processing is limited by the algorithm applied to the system. Therefore, we propose a new method of accessing memory for a parallel CNN (Convolution Neural Network), which is an artificial neural network, to improve the performance of the system [2].

ISSN: 2287-1233 ASTL
Copyright 2016 SERSC

2 Memory Access using Filter Buffers

2.1 Convolution Neural Network (CNN)

Figure 1 shows a basic CNN algorithm [3]. Features are generated by repeating the Convolution and Pooling steps for an n*n size image. After these steps, the Fully Connected steps and the Softmax step, which is an activation function, produce the final classification of the image.

Fig. 1. Interpreter execution model

A Convolution step produces m feature maps using m kinds of k*k size Filters (in Deep Learning, the coefficients of a Filter are its weight values). Using m feature maps allows more exact classification and recognition, and processing these steps in parallel increases performance. However, with a limited number of PEs (Process Elements), external memory must be read repeatedly to fetch the k*k size Filters each time new input data is processed, which decreases performance [4].

2.2 The suggested method of accessing memory

The memory access method used in this paper, which is based on a Filter Buffer structure, increases processing performance by preventing the same Filter from being re-read when the number of PEs is limited.
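
As a rough illustration of the setting described above, the following Python sketch computes m feature maps from m Filters of size k*k when only a limited number of PEs is available, so that the Filters have to be processed in groups. The function names, the toy 4*4 image, and the two example Filters are assumptions made for this sketch; they are not taken from the paper's Verilog HDL implementation.

```python
def convolve_valid(image, filt):
    """Slide one k*k Filter over an n*n image (no padding, stride 1)."""
    n, k = len(image), len(filt)
    out = n - k + 1
    return [
        [
            sum(image[r + i][c + j] * filt[i][j]
                for i in range(k) for j in range(k))
            for c in range(out)
        ]
        for r in range(out)
    ]


def feature_maps_with_limited_pes(image, filters, num_pes):
    """Produce one feature map per Filter, handling num_pes Filters per pass.

    Hypothetical scheduling: each PE is assumed to work on one Filter at a
    time, so m Filters need ceil(m / num_pes) passes over the same input.
    """
    maps = []
    passes = 0
    for start in range(0, len(filters), num_pes):
        group = filters[start:start + num_pes]  # Filters processed in parallel
        maps.extend(convolve_valid(image, f) for f in group)
        passes += 1
    return maps, passes


# Toy input: a 4*4 image holding the values 0..15 (illustrative only).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # copies the centre pixel
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]       # sums the 3*3 neighbourhood
filters = [identity, box, identity, box, identity, box]  # m = 6

maps, passes = feature_maps_with_limited_pes(image, filters, num_pes=4)
print(len(maps), passes)
```

With 6 Filters and 4 PEs the sketch needs two passes over the same input. Every additional pass is a point at which a design without a Filter Buffer would have to fetch Filter weights from external memory again.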

Figure 2 shows the number of accesses to data and weights according to the number of nodes. Here, we assumed 4 PEs and a 3*3 Filter size to process a real 640*480 (VGA) image. With a limited number of PEs, the number of memory accesses for weights increases more than that for input data as the number of nodes increases.

Fig. 2. The number of memory accesses for data and weights according to an increase in the number of nodes

Figure 3 compares memory access in a Convolution Neural Network processing structure with and without a Filter Buffer. Preventing external memory accesses by using a temporary buffer memory increases processing performance when the number of PEs is small and the number of nodes in the Convolution Neural Network is large.

Fig. 3. The access of external memory using Filter Buffers
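
The effect of a Filter Buffer on external memory traffic can be sketched with a small counting model in Python. This is a toy model under stated assumptions, not the simulated structure of the paper: the input is assumed to be processed in tiles, every Filter is assumed to be needed once per tile, and the buffer is assumed to be large enough to hold all Filters. The function name and all parameter values are hypothetical, and the resulting counts are not the paper's measurements.

```python
def external_weight_reads(num_tiles, num_filters, k, num_pes, use_filter_buffer):
    """Count weight words fetched from external memory (toy model)."""
    buffered = set()  # Filter indices held in the on-chip Filter Buffer
    reads = 0
    for _tile in range(num_tiles):
        # The PEs work through the Filters in groups of num_pes.
        for start in range(0, num_filters, num_pes):
            for f in range(start, min(start + num_pes, num_filters)):
                if use_filter_buffer and f in buffered:
                    continue  # weights come from the buffer, no external read
                reads += k * k  # fetch one k*k Filter from external memory
                if use_filter_buffer:
                    buffered.add(f)
    return reads


# Hypothetical configuration: 10 input tiles, 8 Filters of size 3*3, 4 PEs.
without_buffer = external_weight_reads(10, 8, 3, 4, use_filter_buffer=False)
with_buffer = external_weight_reads(10, 8, 3, 4, use_filter_buffer=True)
print(without_buffer, with_buffer)  # prints "720 72"
```

In this model the weight traffic without a buffer grows with both the number of tiles and the number of Filters, while the buffered version reads each Filter only once. A real design with a smaller buffer, or with input data traffic included, would show a smaller saving.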

2.3 Experiments and Results

Our experiments show that the suggested Filter Buffer structure applied to a CNN algorithm is efficient. We applied the structure to a SIMD (Single Instruction Multiple Data) architecture implemented in Verilog HDL. We compared simulation results for processing a 640*480 image as the number of nodes increases, together with the resulting external memory accesses for the Filters. Input data from the German Traffic Sign Recognition Benchmark (GTSRB) was applied to the simulated structure. Figure 4 compares the number of external memory accesses of the Filter Buffer method with that of a conventional method.

Fig. 4. Comparison of the result

3 Conclusion

We suggested a structure for processing the CNN algorithm with a Filter Buffer. When the number of nodes increases for a finite number of PEs, a large number of external memory accesses is needed, mainly because multiple Filters are used. Our Filter Buffer structure increases processing performance by 20% over a conventional structure using the CNN algorithm, by decreasing the number of external memory accesses. We expect that further research on image recognition will greatly increase system performance by solving the problems of external memory access in systems that use machine learning and deep learning algorithms.

Acknowledgment. This work was supported by the Industrial Core Technology Development Program (10049192, Development of a smart automotive ADAS SW-SoC for a self-driving car) funded by the Ministry of Trade, Industry & Energy and was supported by Seokyeong University in 2014.

References

1. Deng, H., Stathopoulos, G., Suen, C. Y.: Applying Error-Correcting Output Coding to Enhance Convolutional Neural Network for Target Detection and Pattern Recognition. Pattern Recognition (ICPR), 2010 20th International Conference on, pp. 4291-4294 (2010)
2. Dawwd, S. A.: The multi 2D systolic design and implementation of Convolutional Neural Networks. Electronics, Circuits, and Systems (ICECS), 2013 IEEE 20th International Conference on, pp. 221-224 (2013)
3. HTML Standard, https://en.wikipedia.org/wiki/convolutional_neural_network
4. Gokhale, V., Jin, J., Dundar, A., Martini, B., Culurciello, E.: A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks. Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on, pp. 696-701 (2014)
5. Ovtcharov, K., Ruwase, O., Kim, J. Y., Fowers, J., Strauss, K., Chung, E. S.: Accelerating Deep Convolutional Neural Networks Using Specialized Hardware. Microsoft Research (2015)