Volume 6, Issue 12, December 2018 International Journal of Advance Research in Computer Science and Management Studies

Similar documents
Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Yiqi Yan. May 10, 2017

YOLO9000: Better, Faster, Stronger

Lecture 5: Object Detection

Object Detection Based on Deep Learning

Unified, real-time object detection

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Automatic detection of books based on Faster R-CNN

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Object detection with CNNs

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Final Report: Smart Trash Net: Waste Localization and Classification

Efficient Segmentation-Aided Text Detection For Intelligent Robots

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

Spatial Localization and Detection. Lecture 8-1

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Object Detection using Single Shot Detector for a Self-Driving Car

Real-time Object Detection CS 229 Course Project

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Yield Estimation using faster R-CNN

Delivering Deep Learning to Mobile Devices via Offloading

Fish Species Likelihood Prediction. Kumari Deepshikha (1611MC03) Sequeira Ryan Thomas (1611CS13)

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Visual features detection based on deep neural network in autonomous driving tasks

Hand Detection For Grab-and-Go Groceries

Feature-Fused SSD: Fast Detection for Small Objects

Andrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore),

DEEP NEURAL NETWORKS FOR OBJECT DETECTION

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

Modern Convolutional Object Detectors

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 30 Apr 2018

Instance-aware Semantic Segmentation via Multi-task Network Cascades

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

arxiv: v1 [cs.cv] 15 Oct 2018

Gender Classification Technique Based on Facial Features using Neural Network

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Object Detection in Sports Videos

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

Deep Learning for Object detection & localization

Design of a Processing Structure of CNN Algorithm using Filter Buffers

Classifying a specific image region using convolutional nets with an ROI mask as input

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

Creating Affordable and Reliable Autonomous Vehicle Systems

Facial Keypoint Detection

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

Abandoned Luggage Detection

Object Detection. Part1. Presenter: Dae-Yong

Object Detection with YOLO on Artwork Dataset

A Comparison of CNN-based Face and Head Detectors for Real-Time Video Surveillance Applications

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

YOLO: You Only Look Once Unified Real-Time Object Detection. Presenter: Liyang Zhong Quan Zou

arxiv: v1 [cs.cv] 18 Jun 2017

Object Detection on Self-Driving Cars in China. Lingyun Li

Amodal and Panoptic Segmentation. Stephanie Liu, Andrew Zhou

arxiv: v1 [cs.cv] 26 May 2017

Classification of objects from Video Data (Group 30)

Deep Learning Based Real-time Object Recognition System with Image Web Crawler

arxiv: v1 [cs.ro] 28 Dec 2018

Traffic Multiple Target Detection on YOLOv2

YOLO 9000 TAEWAN KIM

Vehicle Classification on Low-resolution and Occluded images: A low-cost labeled dataset for augmentation

Towards Automatic Identification of Elephants in the Wild

Content-Based Image Recovery

SEMANTIC SEGMENTATION AVIRAM BAR HAIM & IRIS TAL

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

Learning Semantic Video Captioning using Data Generated with Grand Theft Auto

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,

CNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION. Jawadul H. Bappy and Amit K. Roy-Chowdhury

Regionlet Object Detector with Hand-crafted and CNN Feature

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

Using Faster-RCNN to Improve Shape Detection in LIDAR

CS230: Lecture 3 Various Deep Learning Topics

Learning to Segment Object Candidates

Driving dataset in the wild: Various driving scenes

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

Todo before next class

Cascade Region Regression for Robust Object Detection

CAP 6412 Advanced Computer Vision

Detection Method of Insulator Based on Single Shot MultiBox Detector

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

Joint Object Detection and Viewpoint Estimation using CNN features

An Efficient Approach for Color Pattern Matching Using Image Mining

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

WITH the recent advance [1] of deep convolutional neural

Deep Residual Learning

arxiv: v1 [cs.cv] 9 Dec 2018

arxiv: v1 [cs.cv] 13 Mar 2019

NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space

Classification and Information Extraction for Complex and Nested Tabular Structures in Images

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

Transcription:

ISSN: 2321-7782 (Online) e-isjn: A4372-3114 Impact Factor: 7.327 Volume 6, Issue 12, December 2018 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Object Detection and Classification in Globally Inclusive Images Using Yolo Swetha M S 1 Assistant Professor, Department of IS&E, BMS Institute of Technology & Management, Yelahanka, Bangalore -560064, Karnataka India. Vineetha M 2 Student, Department of IS&E, BMS Institute of Technology & Management, Yelahanka, Bangalore -560064, Karnataka India. Abstract: Object detection is the process of object recognition and classification. There are several Training sets available online for training an object detection model. But the models aren't trained to detect the same object from different geographical regions. This paper proposes a model that can detect globally inclusive images. The given model is created using YOLO Neural networks and it is initially trained using the Inclusive Images dataset. Furthermore, it is trained using a newly curated, more representative dataset. A machine learning model should be efficient and accurate even when it learns from imperfect data sources. This model shows an improvement in the accuracy when detecting geographically diverse images. Keywords: CNN, R-CNN, YOLO, SSD, Neural Networks, Object Detection, Bounding boxes. I. INTRODUCTION In recent years, with the rapid development of Machine learning and Artificial Intelligence a number of research areas with respect to these concepts have increased immensely. Along with this there is a continuous improvement of convolution neural networks and with this Computer vision algorithms and architecture has developed immensely. Vision starts with the eyes but all the computation happens in the brain. Thus, the principal target of computer vision and AI is detecting the object and their relation, action and intention. The biggest challenge, however, is that it needs training big data. In general, Artificial Intelligence technologies have have to learn thoroughly from the training set. That is, in order to recognize or detect an object from a video stream, image data should be used as training data. Furthermore, it requires very well-refined and well-represented dataset, for good training. The dataset it s compliant to be COCO, Pascal VOC and Image Net and also recently Google Images V4. The ultimate goal is that, through experimentation to strengthen the understanding of the model, and through the analysis of the results, learn the importance of targeted and inclusive datasets for deep learning. In addition, to this optimize the model for efficient utilization when integrated with the necessary system or application. Therefore, this paper proposes a system that efficiently detects objects belonging to different geographical regions. II. RELATED WORK Convolution Neural Networks [1] uses the concepts of deep learning and becomes the golden standard for image classification. But CNN can recognize an object in a single image. This implies that CNN can t detect different objects in a single image. Object detection not only includes recognizing whether specific object is present or not, but also determines the exact position of the target region. Furthermore, the objects should be separated by bounding boxes. IJARCSMS (www.ijarcsms.com), All Rights Reserved 6 P a g e

On the other hand, R-CNN uses selective search method to extract 2000 regions (region proposals) from the image. The problem with this algorithm is it cannot be implemented in real time operations as it is extremely slow. The R-CNN algorithm is typically selective search to get objects of interest in the scene. The Fast R-CNN [2] approach varies from the R-CNN as it applies this method of selective search on the entire image instead of selecting one region at a time. Faster R-CNN [3] method came up was object detection algorithm that eliminates the selective search algorithm to perform the operation. Single Shot MultiBox Detector detects objects by applying feature maps based on CNN. The feature points are formed in a grid on the image and the bounding box which centred on each of the feature grid cell determines the presence of an objects. SSD [4] is faster than Faster R-CNN, but its accuracy is lower because limited number of the anchors. You Only Look Once (YOLO) [5] outperformed all the customary methodologies. YOLO engineering contains a single neural network which predicts bounding boxes and class probabilities particularly from full pictures in a single assessment. Since the whole detection pipeline is a single network and it will in general be improved and it can be used in real-time object detection. Just after this the second version of YOLO was released called YOLO9000 [6]. Although this trained the model for 9000 object categories it still was not sufficient to detect globally inclusive images. Therefore in order to get good performance and accurate results (especially related to globally inclusive images) with YOLO, it should be trained with the necessary targeted datasets. This paper proposes training the YOLO model with the Google s Inclusive Image dataset and newly curated more representative dataset. Fig 1 Architecture of YOLO III. METHODOLOGY The model is trained using the datasets Inclusive Images by Google and the newly curated inclusive images. The model itself is built using YOLO neural networks and it is trained using the Nvidia GTX 1080ti GPU. Rudimentary Algorithm for Object detection: STEP 1: A model or algorithm is utilized to produce regions of interest or region recommendations. These region recommendations are a substantial set of bounding boxes traversing the full picture (that is, an object localization segment). STEP 2: Secondly, visual features are separated for every one of the bounding boxes, they are assessed and it is resolved whether and which objects are available in the proposition dependent on visual features (i.e. an object order segment). STEP 3: In the last post-processing step, covering boxes are joined into a single bounding Box and the image are detected and it looks as given in fig 2. 2018, IJARCSMS All Rights Reserved ISSN: 2321-7782 (Online) Impact Factor: 7.327 e-isjn: A4372-3114 7 P a g e

A. Collection of data Fig. 2 Objects detected The data to input in this model are images. As mentioned previously the datasets being used are: Inclusive Images (subset of Google v4): This subset contains 1,743,042 training images with annotated bounding box around every object in the image. I is divided into the training and tuning set. Since there will be a great difference between the train and the test set, tuning plays a very important role. (Fig. 3) Newly curated Inclusive images: This dataset was developed by parsing the web for related images. After this was done the images were annotated. In the first step the dataset is annotated manually using Labelbox. And in the second step the annotations for the remaining samples is derived using a model which was trained with the first stage annotations. This is done after the following steps. Fig. 3 Examples of the Google Inclusive Image dataset 2018, IJARCSMS All Rights Reserved ISSN: 2321-7782 (Online) Impact Factor: 7.327 e-isjn: A4372-3114 8 P a g e

Fig. 4: Image Processing B. Image Processing [6] (Fig. 4) YOLO has 24 convolutional layers followed by 2 fully connected layers. Image acquisition: Two ways for creating a digital picture are with a digital camera or a Flatbed scanner. Most of these images need to go through the following stages. Pre-processing: This means to specifically evacuate the repetition present in the image without affecting the interest region of the image. This includes the accompanying two phases-image Resizing and Filtering. Vulnerabilities are brought into the picture in several ways. This results in data loss and even a noisy image. Filtration the process of removing this. Segmentation: This is the process of isolating the various interest regions in an image using techniques such as edge detection. Feature Extraction: Feature is the important portion of the image; hence it is essential to recognize them dependably. This is done in the first few convolutions as shown in the fig. Representation: This step involves changing the information to a frame appropriate for computer processing. Training the model: The model is trained with the datasets as mentioned above. YOLO predicts several bounding boxes for each grid in the image. During training we require only one bounding box predictor for each of the object. Therefore, one predictor will be assigned to be responsible for predicting an object based on the prediction that has the highest current IOU with the ground truth. Each predictor will be better at predicting certain sizes, aspect ratios, and basically the overall recall. During training of the model the objective is to optimize the loss function: Fig. 5: The loss function 2018, IJARCSMS All Rights Reserved ISSN: 2321-7782 (Online) Impact Factor: 7.327 e-isjn: A4372-3114 9 P a g e

Fig.6: The bounding box prediction stages IV. RESULTS The developed is way more accurate in 20% more number of trials compared to the YOLO model which is not trained using the Inclusive Images when the objects to be detected are from different geographical as regions( belonging to the regions shown in Fig.7). Fig 7: The geographical regions covered TP, FP, and FN respectively represent the number of samples that efficiently recognized the objects, the number of samples that identified another object as interest object, and the number of samples that identified the interest object is another object. 2018, IJARCSMS All Rights Reserved ISSN: 2321-7782 (Online) Impact Factor: 7.327 e-isjn: A4372-3114 10 P a g e

Pre-Trained YOLO YOLO trained with inclusive datasets Precision 96.82% 97.63% Recall 87.83% 89.28% IOU 74.37% 76.67% Fig. 9: Comparison of the two models V. CONCLUSION A dataset can greatly affect the efficiency of the training and the performance of a deep learning mode. The represented in this paper adopts improved image processing methods and effective targeted datasets in order to effectively detect globally inclusive objects i.e., the same object but with few differences in appearance because it s from different geographical region. This model can further be improved by creating a model that can detect inclusive images even when sufficient data is not available to train it. References 1. Jawadul H. Bappy and Amit K. Roy-Chowdhury, CNN based region proposals for efficient object detection, 2016 IEEE International Conference on Image Processing (ICIP), IEEE, 2016, pp.3658-3662. 2. Ross Girshick, Fast R-CNN, IEEE International Conference on Computer Vision, 2015, pp. 1440-1448. 3. Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 39, Issue: 6),2017, pp. 1137-1149. 4. Chengcheng Ning, Huajun Zhou, Yan Song, linhui Tang, INCEPTION SINGLE SHOT MULTIBOX DETECTOR FOR OBJECT DETECTION, Proceedings of the IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2017, pp. 549-554. 5. Joseph Redmon, Ali Farhadi, YOLO9000: Better, Faster, Stronger,IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6517-6525 6. Hyeok-June Jeong, Kyeong-Sik Park, Young-Guk Ha Image Preprocessing for Efficient Training of YOLO Deep Learning Networks, IEEE International Conference on Big Data and Smart Computing, 2018, pp.635-637 7. Chanran Kim, Younkyoung Lee, Jong-II park, Diminishing unwanted Objects Based on Object Detection using Deep Learning and Image Inpainting, IEEE, 2018. 8. Swetha M S and Vallae Haritha, Object Detection using Single Shot Detector for a Self-Driving Car, in International Journal for Innovative Research in Science & Technology (IJIRST) VOLUME 5, ISSUE 7 December 2018 ISSN (online): 2349-6010 pp. 9. Swetha M S and Veena M Shellikeri2, Survey of Object Detection using Deep Neural Networks, in International Journal of Advanced Research in Computer and Communication Engineering Vol. 7, Issue 11, November 2018 ISSN (Online) 2278-1021 ISSN (Print) 2319-5940 pp.19-24 10. Swetha M S and Srishti Suman, Survey on Faster Region Convolution Neural Network for Object Detection, in International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: 2454-4248 Volume: 4 Issue: 11 November 2018 pp. 79 85 11. Swetha M Dr.Thungamani M and,ankita Mishra, Enhancement of Performance Analysis in Anonymity MANET through Trust-Aware Routing Protocol. In the proceedings of, Volume 5, Issue 5, May 2017. 2018, IJARCSMS All Rights Reserved ISSN: 2321-7782 (Online) Impact Factor: 7.327 e-isjn: A4372-3114 11 P a g e