International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

REAL-TIME OBJECT DETECTION WITH CONVOLUTION NEURAL NETWORK USING KERAS

Asmita Goswami [1], Lokesh Soni [2]
Department of Information Technology [1], Jaipur Engineering College and Research Center, Jaipur [2]
{asmita.jaipur@gmail.com, sonilokesh24@gmail.com}

ABSTRACT
This paper shows how convolutional networks can be used to classify, localize and detect objects accurately. We applied increasingly complex neural networks to simple images, progressing to the detection of multiple objects of varying shape and colour. For localization, the network learns to predict object bounding boxes by regression with a mean squared error (MSE) loss. To increase detection confidence, bounding boxes are then accumulated rather than suppressed. We used Adadelta as the optimizer; it is essentially standard stochastic gradient descent with an adaptive learning rate. We show that a single shared network can learn different tasks simultaneously.

1) Introduction
In recent years, there has been tremendous progress in machine learning on difficult problems such as visual recognition. Deep convolutional neural networks [1, 2] achieve strong performance on hard visual recognition tasks, matching or exceeding human performance in certain domains. Images are easy to generate and handle, and easy for humans to understand, but difficult for computers to interpret. Image analysis has always played a key role in the history of deep neural networks.

Convolutional neural networks [12] are the state-of-the-art technique for identifying objects in images (image recognition). Before convolutional neural networks emerged, it was difficult to implement object recognition with computer algorithms, even though it comes naturally to humans. Several developments have enabled much deeper models with more layers: improved CNN designs, large annotated datasets, cheap computing power, and techniques such as inception modules and skip connections. These models now rival human accuracy in object identification. In terms of detection speed, however, even the best algorithms still suffer from a heavy computational cost.

2) Detection Approach

a) Single Object Detection
The network is a very simple feedforward network with one hidden layer (no convolutions). It takes the flattened image (i.e. 8 x 8 = 64 values) as input. It predicts the parameters of the bounding box, i.e. the coordinates x and y of the lower-left corner, the width w and the height h. During training, the predicted bounding boxes are simply regressed onto the expected bounding boxes via mean squared error (MSE). Adadelta is used as the optimizer; it is essentially standard stochastic gradient descent with an adaptive learning rate, which saves a lot of time that would otherwise go into hyperparameter optimization. The network is implemented in Keras as follows:

    from keras import Input, Sequential
    from keras.layers import Activation, Dense, Dropout
    from keras.optimizers import Adadelta

    model = Sequential([
        Input(shape=(64,)),   # flattened 8 x 8 image
        Dense(200),           # single hidden layer, no convolutions
        Activation('relu'),
        Dropout(0.2),
        Dense(4),             # bounding box (x, y, w, h)
    ])
    # Adadelta was proposed with a learning rate of 1.0; recent Keras versions
    # default to 0.001, so the value is set explicitly here.
    model.compile(optimizer=Adadelta(learning_rate=1.0), loss='mse')

The network was trained on 40,000 random images for 50 epochs (about 1 minute on a laptop CPU) and achieved almost perfect results. Figure 2 shows the predicted bounding boxes on test images that were held out during training.
Figure 2: Predicted Bounding Boxes [2]

The number plotted above each bounding box is the Intersection Over Union (IOU), which measures the overlap between the predicted and the true bounding box. It is calculated by dividing the area of the intersection (pink in Figure 3) by the area of the union (blue in Figure 3). The IOU ranges from 0 (no overlap) to 1 (perfect overlap).
Figure 3: Intersection over union [3]

b) Multiple Objects Detection
If the single object detection method is simply used to predict several bounding boxes, the predicted boxes end up in the centre, between the objects. To prevent this, each predicted bounding box is assigned to one rectangle during training, so that the predictors learn to specialize on certain locations or shapes of rectangles. To do this, the target vectors are processed after every epoch. For each training image, the mean squared error between the prediction and the target is calculated in two ways:
A) for the current order of bounding boxes in the target vector (i.e. x1, y1, w1, h1, x2, y2, w2, h2);
B) with the bounding boxes in the target vector flipped (i.e. x2, y2, w2, h2, x1, y1, w1, h1).
If the MSE of A is lower than that of B, the target vector is left as is; if the MSE of B is lower, the vector is flipped. The algorithm is taken from the shape-detection notebook [3]. Figure 4 visualizes the flipping process.
Figure 4: Flipping Process [4]

Each row in Figure 4 is a sample from the training set, and the training epochs run from left to right. Black indicates that the target vector was flipped after that epoch; white indicates no flip. Most flips occur at the beginning of training, when the predictors have not yet specialized. The network achieves a mean IOU of 0.5 on the training data.

c) Classifying Objects
Next, triangles are added and the network also classifies whether each object is a rectangle or a triangle. The same network as above is used, and just one value per bounding box is added to the target vector: 0 if the object is a rectangle and 1 if it is a triangle (i.e. binary classification). The results are shown in Figure 6. A red box means the object is predicted to be a rectangle, and a yellow box means it is predicted to be a triangle.
Figure 6: Classification of objects [6]

3) Experiments
In this section, we benchmark our method on images that combine shapes, colours and convolutional neural networks. To bootstrap the images, we used the pycairo library [4], which can draw simple shapes and write the resulting RGB images to numpy arrays.
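The paper generates its images with pycairo. As a dependency-free stand-in, the single-rectangle images of Section 2a can be produced with NumPy alone. This is a substitution for illustration, not the authors' code. The helper name `make_rectangle_dataset` and the side-length range are assumptions. Treating the first array axis as x and the second as y is also an assumption.

```python
import numpy as np

IMG_SIZE = 8               # 8 x 8 images, as in Section 2a
MIN_SIDE, MAX_SIDE = 1, 4  # assumed range of rectangle side lengths (hypothetical)


def make_rectangle_dataset(num_images, seed=0):
    """Generate grayscale images that each contain one filled rectangle.

    Returns images of shape (num_images, 8, 8) and boxes of shape
    (num_images, 4) holding (x, y, w, h) in pixels. The first array axis
    is treated as x and the second as y (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    images = np.zeros((num_images, IMG_SIZE, IMG_SIZE), dtype=np.float32)
    boxes = np.zeros((num_images, 4), dtype=np.float32)
    for i in range(num_images):
        w, h = rng.integers(MIN_SIDE, MAX_SIDE + 1, size=2)
        x = rng.integers(0, IMG_SIZE - w + 1)  # keep the rectangle inside the image
        y = rng.integers(0, IMG_SIZE - h + 1)
        images[i, x:x + w, y:y + h] = 1.0
        boxes[i] = (x, y, w, h)
    return images, boxes


images, boxes = make_rectangle_dataset(1000)
X = images.reshape(len(images), -1)  # flatten to 64 input values per image
Y = boxes / IMG_SIZE                 # scale box parameters to [0, 1]
```

Arrays shaped like `X` and `Y` could then be passed to the Keras model of Section 2a, for example with `model.fit(X, Y, epochs=50)`.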
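The per-epoch flipping step of Section 2b (two boxes per image) can be sketched in NumPy as follows. The function name `flip_targets` is hypothetical. The predictions are assumed to come from running the model on the training images after each epoch.

```python
import numpy as np


def flip_targets(predictions, targets):
    """Per-epoch flipping step for two boxes per image.

    predictions, targets: arrays of shape (N, 8) laid out as
    (x1, y1, w1, h1, x2, y2, w2, h2). Returns the updated targets and a
    boolean mask of the rows that were flipped.
    """
    # Order B: the two boxes swapped, i.e. (x2, y2, w2, h2, x1, y1, w1, h1).
    swapped = np.concatenate([targets[:, 4:], targets[:, :4]], axis=1)
    mse_a = np.mean((predictions - targets) ** 2, axis=1)  # current order (A)
    mse_b = np.mean((predictions - swapped) ** 2, axis=1)  # flipped order (B)
    flip = mse_b < mse_a  # keep whichever order has the lower error
    return np.where(flip[:, None], swapped, targets), flip


predictions = np.array([[0., 0., 1., 1., 5., 5., 2., 2.]])
targets = np.array([[5., 5., 2., 2., 0., 0., 1., 1.]])
new_targets, flipped = flip_targets(predictions, targets)  # target gets swapped
```

In a training loop, one epoch of `model.fit` would alternate with a call such as `Y, flipped = flip_targets(model.predict(X), Y)`. The `flipped` mask corresponds to the black cells in Figure 4.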
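For more than two boxes, the third network modification described later in this section replaces the flip with a greedy matching. It computes the MSE for every (predicted, expected) pair, assigns the pair with the smallest error, and repeats on the boxes that are still unassigned. Below is a per-image sketch of that procedure; `greedy_assign` is a hypothetical name.

```python
import numpy as np


def greedy_assign(pred_boxes, true_boxes):
    """Greedily match predicted boxes to expected boxes by lowest MSE.

    pred_boxes, true_boxes: arrays of shape (k, d) for one image, with d
    values per box. Returns true_boxes reordered so that row i is the
    expected box assigned to predictor i.
    """
    k = len(pred_boxes)
    # errors[i, j] = MSE between predicted box i and expected box j
    errors = np.mean((pred_boxes[:, None, :] - true_boxes[None, :, :]) ** 2, axis=2)
    order = np.empty(k, dtype=int)
    free_preds, free_trues = set(range(k)), set(range(k))
    while free_preds:
        # Smallest remaining error among pairs whose boxes are both unassigned.
        i, j = min(((p, t) for p in free_preds for t in free_trues),
                   key=lambda pair: errors[pair])
        order[i] = j
        free_preds.remove(i)
        free_trues.remove(j)
    return true_boxes[order]


pred = np.array([[0., 0., 1., 1.], [5., 5., 2., 2.], [3., 0., 1., 1.]])
expected = np.array([[5., 5., 2., 2.], [3., 0., 1., 1.], [0., 0., 1., 1.]])
matched = greedy_assign(pred, expected)  # rows now line up with pred
```

Greedy matching is not guaranteed to minimize the total error. If an optimal assignment is wanted, the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) can be applied to the same `errors` matrix.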
We also made some modifications to the network itself, but let us first look at the results. When the network is trained with flipping enabled, the following results are obtained (again on held-out test images):
Figure 5: Predicted bounding boxes after flipping [5]
Figure 7: Experimental Results [7]
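The second modification listed below describes a target vector with 10 values per object. The first 4 values are the bounding box, the next 3 are a one-hot shape vector and the last 3 are a one-hot colour vector. One possible encoding is sketched here. The ordering of values within the vector and the helper names `encode_object` and `encode_image` are assumptions, not the authors' code.

```python
import numpy as np

SHAPES = ('rectangle', 'triangle', 'circle')
COLOURS = ('red', 'green', 'blue')


def encode_object(box, shape, colour):
    """10 target values for one object: (x, y, w, h), one-hot shape, one-hot colour."""
    shape_onehot = np.eye(len(SHAPES))[SHAPES.index(shape)]
    colour_onehot = np.eye(len(COLOURS))[COLOURS.index(colour)]
    return np.concatenate([np.asarray(box, dtype=float), shape_onehot, colour_onehot])


def encode_image(objects):
    """Concatenate the per-object targets of one image into a single vector."""
    return np.concatenate([encode_object(*obj) for obj in objects])


target = encode_image([
    ((0.1, 0.2, 0.3, 0.3), 'triangle', 'blue'),
    ((0.5, 0.5, 0.2, 0.4), 'circle', 'red'),
])
# target has 20 values; values 4..9 (first object's classes) are 0, 1, 0, 0, 0, 1
```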
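The results in this section are reported as mean IOU. The IOU defined in Section 2a (Figure 3) is IOU(A, B) = area(A ∩ B) / area(A ∪ B), where area(A ∪ B) = area(A) + area(B) − area(A ∩ B). A minimal sketch for boxes given as (x, y, w, h) follows; the names `iou` and `mean_iou` are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(predicted, expected):
    """Average IOU over paired predicted and expected boxes."""
    scores = [iou(p, e) for p, e in zip(predicted, expected)]
    return sum(scores) / len(scores)


print(iou((0, 0, 2, 2), (1, 1, 2, 2)))  # 1 / 7, about 0.143
```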

The mean IOU on the test dataset is around 0.4, which is reasonable for recognizing three objects at once. The predicted shapes and colours (written above the bounding boxes) are almost perfect (test accuracy of 95%). The network has also learned to assign the predictors to different objects, as intended with the flipping trick. Compared with the simple experiments above, we made three modifications:
1) We used a convolutional neural network (CNN) [4] instead of a feedforward network. CNNs scan the image with learnable filters and extract increasingly abstract features at each layer. Filters in early layers may, for example, detect edges or colour gradients, while later layers may respond to complex shapes [5]. For the results above, a network with four convolutional and two pooling layers was trained for about 30 to 40 minutes. A deeper, better optimized or longer-trained network could probably achieve better results.
2) Instead of a single (binary) value for classification, we used one-hot vectors (0 everywhere, 1 at the index of the class). Specifically, we used one vector per object to classify the shape (rectangle, triangle or circle) and another to classify the colour (red, green or blue). We also added some random variation to the colours in the input images to test whether the network can handle it. In total, the target vector contains 10 values for each object: 4 for the bounding box, 3 for the shape classification and 3 for the colour classification.
3) We adapted the flipping algorithm to work with multiple bounding boxes. After each epoch, the algorithm calculates the mean squared error for all combinations of one predicted and one expected bounding box.
It then takes the minimum among those values and assigns the corresponding predicted and expected bounding boxes to each other. Next, it takes the smallest value among the boxes that are not yet assigned, and so on.

4) Conclusions
Real-time object detection with convolutional neural networks is an important basic research problem in computer vision. It requires a system to do much more than task-specific algorithms, such as object recognition and object detection algorithms. The combination of recent technical innovations in deep learning makes it possible to redesign the feature extraction part of the Faster R-CNN framework to maximize computational efficiency. As a simple example, we have presented an algorithm that can predict only a fixed number of bounding boxes per image.

5) References
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[2] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093, 2014.
[3] https://github.com/jrieke/shape-detection/blob/master/two-rectangles-or-triangles.ipynb
[4] https://cairographics.org/pycairo/
[5] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093, 2014.
[6] Stanford's CS231n; Michael Nielsen's book; Deep Learning, book by I. Goodfellow, Y. Bengio, and A. Courville.
[7] I. Misra, R. Girshick, R. Fergus, M. Hebert, A. Gupta, and L. van der Maaten. Learning by asking questions. arXiv preprint, 2017.

[8] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arXiv:1506.02640, 2015.
[9] K.-H. Kim, S. Hong, B. Roh, Y. Cheon, and M. Park. Deep but lightweight neural networks for real-time object detection. arXiv:1608.08021v3, 2016.
[10] M. Malinowski, M. Rohrbach, and M. Fritz. A neural-based approach to answering questions about images.
[11] W. Koehrsen. Object recognition with Google's convolutional neural networks.
[12] B. Baker, O. Gupta, N. Naik, and R. Raskar. Designing neural network architectures using reinforcement learning. arXiv:1611.02167v2, 30 Nov 2016.