Final Report: Smart Trash Net: Waste Localization and Classification

Similar documents
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Spatial Localization and Detection. Lecture 8-1

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Automatic detection of books based on Faster R-CNN

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Yiqi Yan. May 10, 2017

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

Object detection with CNNs

Visual features detection based on deep neural network in autonomous driving tasks

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object Detection on Self-Driving Cars in China. Lingyun Li

Object Detection Based on Deep Learning

Deep Learning for Object detection & localization

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Lecture 5: Object Detection

YOLO9000: Better, Faster, Stronger

Vehicle Classification on Low-resolution and Occluded images: A low-cost labeled dataset for augmentation

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization

Real-time Object Detection CS 229 Course Project

Kaggle Data Science Bowl 2017 Technical Report

A CLOSER LOOK: SMALL OBJECT DETECTION IN FASTER R-CNN. Christian Eggert, Stephan Brehm, Anton Winschel, Dan Zecha, Rainer Lienhart

Deep Residual Learning

arxiv: v1 [cs.cv] 4 Jun 2015

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Instance-aware Semantic Segmentation via Multi-task Network Cascades

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Classification of objects from Video Data (Group 30)

OBJECT DETECTION HYUNG IL KOO

A Novel Representation and Pipeline for Object Detection

Project 3 Q&A. Jonathan Krause

Unsupervised domain adaptation of deep object detectors

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Yield Estimation using faster R-CNN

Volume 6, Issue 12, December 2018 International Journal of Advance Research in Computer Science and Management Studies

Modern Convolutional Object Detectors

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,

An Exploration of Computer Vision Techniques for Bird Species Classification

Return of the Devil in the Details: Delving Deep into Convolutional Nets

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Todo before next class

Classifying a specific image region using convolutional nets with an ROI mask as input

Regionlet Object Detector with Hand-crafted and CNN Feature

Rich feature hierarchies for accurate object detection and semantic segmentation

R-FCN: Object Detection with Really - Friggin Convolutional Networks

arxiv: v1 [cs.cv] 26 Jul 2018

Cascade Region Regression for Robust Object Detection

Hand Detection For Grab-and-Go Groceries

Computer Vision Lecture 16

DeepProposals: Hunting Objects and Actions by Cascading Deep Convolutional Layers

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

FCHD: A fast and accurate head detector

Computer Vision Lecture 16

Study of Residual Networks for Image Recognition

arxiv: v1 [cs.cv] 15 Oct 2018

Gradient of the lower bound

Skin Lesion Attribute Detection for ISIC Using Mask-RCNN

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

arxiv: v1 [cs.cv] 26 May 2017

Rich feature hierarchies for accurate object detection and semantic segmentation

Fish Species Likelihood Prediction. Kumari Deepshikha (1611MC03) Sequeira Ryan Thomas (1611CS13)

Two-Stream Convolutional Networks for Action Recognition in Videos

A Study of Vehicle Detector Generalization on U.S. Highway

Traffic Multiple Target Detection on YOLOv2

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

Joint Object Detection and Viewpoint Estimation using CNN features

Learning Deep Features for Visual Recognition

Optimizing Object Detection:

arxiv: v1 [cs.cv] 31 Mar 2016

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Unified, real-time object detection

Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model

CAP 6412 Advanced Computer Vision

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

CNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION. Jawadul H. Bappy and Amit K. Roy-Chowdhury

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

Rich feature hierarchies for accurate object detection and semant

Bounding Out-of-Sample Objects A weakly-supervised approach

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

CS230: Lecture 3 Various Deep Learning Topics

Object Detection and Its Implementation on Android Devices

Zero-shot learning with Fast RCNN and Mask RCNN through semantic attribute Mapping

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

Mimicking Very Efficient Network for Object Detection

Content-Based Image Recovery

Fully Convolutional Networks for Semantic Segmentation

arxiv: v3 [cs.cv] 2 Jun 2017

PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

A Comparison of CNN-based Face and Head Detectors for Real-Time Video Surveillance Applications

Photo OCR ( )

Object Detection in Sports Videos

Transcription:

Final Report: Smart Trash Net: Waste Localization and Classification Oluwasanya Awe oawe@stanford.edu Robel Mengistu robel@stanford.edu December 15, 2017 Vikram Sreedhar vsreed@stanford.edu Abstract Given an image of a jumbled waste, we seek to categorize the different pieces of the waste into three categories: landfill, recycling, paper. This project utilizes Faster R-CNN to get region proposals and classify objects [6]. In this report, we will give an overview of our project and what we have done so far in terms of solving this problem. First, we define our waste sorting problem and present current research and solutions to similar object detection problems. We then give an outline of our proposed architecture and model for approaching our specific task, which entails using a fine-tuned Faster R-CNN. We will also describe the nature and generation of our dataset, as well as the results we achieved from our experiments. Lastly, we outline our next steps in terms of optimizing and improving upon our solution. 1 Introduction Americans produce more than 250 million tons of waste every year. According to Environmental Protection Agency, 75% of this waste is recyclable. However, currently, only 30% of it is recycled [1]. We want to increase this recycling rate by automating waste sorting. Given an image of jumbled trash/waste which contains two or more different pieces of waste of different types, we want to localize the image and classify the different forms of waste into three categories: recyclable, paper, and landfill. 1.1 Definition Our automated waste classifier takes in a 768 1024 image containing 2 or more pieces (objects) of waste on a white background. These pieces can be overlapping or non-overlaping and of different size on the white background. We will fine-tune a Faster R-CNN model pre-trained on PASCAL VOC dataset [2] [6]. Our fine-tuned model will produce anchors (region proposals) and classify objects into three classes: landfill, recycling, and paper. Our image dataset is generated by composing (stitching together) images in TrashNet dataset [5]. We will discuss this in detail later in this paper. 2 Related Work Waste Classification. Prior research on waste sorting was done by a previous CS 229 project group, Mindy Yang and Gary Thung [5], where they used a support vector machine (SVM), with scale-invariant feature transform (SIFT) features, and a convolutional neural network (CNN) to classify images of a single object of waste and classify them into six different categories: metal, paper, glass, plastic, trash, and cardboard. They achieved a 63% classification accuracy for the trained SVM and 22% classification accuracy for the CNN. However, their implementation involved classifying a single object image (single piece of trash) as opposed to a jumbled mix of waste. Deep Networks for region proposals and object detection. Faster R-CNN is a state-ofthe-art object detection network that include nearly cost-free Region Proposal Networks (RPNs) that share convolutional layers with state-of-the-art object detection networks (Fast R-CNN) [3] [6]. There is also Mask R-CNN that extends Faster R-CNN by adding a branch for predicting 1

segmentation masks on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression [4]. Since our problem does not require segmentation, we will use Faster R-CNN as a base model. 3 Methods and Architecture 3.1 Faster R-CNN Faster R-CNN has a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thereby allowing nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. Figure 1: Faster RCNN Architecture based off [6] 4 Image Dataset Faster R-CNN requires a large image dataset of 10,000+ images for effective training. We could not find a jumbled waste images dataset of this size anywhere. Hence, we decided to augment Mindy Yang and Gary Thung s dataset [5] 2,500 images of single pieces of waste which they used for their trash classification project. We were able to obtain their approval and dataset upon request. Figure 2: Sample from the dataset showing a combination of waste items. 4.1 Data Preprocessing We wrote a python script that removed the background of each image in order to isolate a bounding box around each piece of waste. Next, we sorted the edited images based on the classes we were looking at, which were landfill, recycling, and paper. We then would use a 768 1024 plain white background and merge 2-6 pieces of trash onto this background at random locations on the white 2

space. Thus, we created new images with corresponding labels and the locations of each trash s bounding box within the white background. We were able to generate our 10,000 images of piles of trash and could always modify our code in order to merge even more pieces of trash onto one image. We split the (train, validation, test) data into roughly (6000, 2000, 2000) images respectively. 5 Experiments/Results/Discussion 5.1 Hyperparameters We worked on the following hyperparameters: 1. RPN Batch Size 2. Batch Size (Number of Regions of Interest) 3. Learning Rate 4. Model: RPN/SS + Fast R-CNN 5.2 Experiments For our baseline, we fine-tuned a pre-trained Faster R-CNN model by modifying the last fully connected layers clsscore and bboxpred. Hence, we utilized pre-trained lower level features. We trained on 2000 images containing only two objects each with a fairly even representation of every class across the different examples. Similar to the standard Faster R-CNN [6], we used mean average precision (map) to evaluate the performance of our model. The map equals to the integral over the precision-recall curve p(r) 1 0 p(r)dr. Where the precision-recall curve is determined by calculating the IoU (intersection of union) between the predicted and ground truth bounding boxes. We also use the same multi-task loss function described in Faster R-CNN paper [6]: L({p i }, {t i }) = 1 L cls (p i, p i ) + λ p i L reg (t i, t i ). N cls N reg i This is a sum of two normalized losses over anchors with indices, i, such that p i is the model s probability of an anchor representing an object, p i {0, 1} is the ground truth on whether or not the anchor is actually an object. t i is the coordinates of the model s proposed bounding box while t i is the ground truth of the coordinates. [6] give more detail on the loss functions used. 5.3 Results We tuned hyperparameter values for RPN Batch Size, Batch Size, and Learning Rate. Class Learning Rate (0.0012) Batch RPN Size (128) Batch Size (128) Landfill 0.514 0.725 0.722 Paper 0.433 0.596 0.596 Recycling 0.585 0.767 0.767 i Table 1: Validation AP scores from best hyperparameters tested After running our Faster R-CNN on our dataset using our proposed data split, we achieved a mean Average Precision (map) of 0.683 overall on classification of the trash images, in which we got the following AP values for the different classes: 3

Figure 3: Average map scores over the 3 classes for different learning rates Figure 4: Accuracy over the 3 classes for different RPN batch sizes Figure 5: Accuracy over the 3 classes for different batch sizes Figure 6: Sample model output without preprocessing Figure 7: Sample output after preprocessing 5.3.1 Categorizing Photos of Trash Given that we fine-tuned our model on an artificial dataset, we wanted to see if our model would still be able to identify waste items from arbitrary photos pooled from Stanford students. 5.4 Discussion We had hoped to get a higher accuracy for each category, however this is due to our baseline using a larger dataset from PASCAL VOC rather than our custom dataset. We also realized training our model took a lot of time and if we had more CPUs working in parallel, we would have been able to try out more hyperparameter optimizations. In general, we noticed that the paper category had the worst performance from our model. 4

Class AP of Optimal Hyperparameters Landfill 0.699 Paper 0.607 Recycling 0.744 map 0.683 We thought this could have been a bias issue with the number of examples trained on but there was a roughly even representation in our models. The only other plausible cause for this would be the fact that we used a white background which tends to have a similar color with members of the paper class. In terms of processing raw photos, in most cases it was only after preprocessing the image to look like the trained dataset were we able to get our model to return reasonable values. 5.5 Future Work Some of the key areas we hope to work on are: 1. Training from Scratch: Given that we can theoretically generate an infinite number of images with our original dataset, we have sufficient data to train the model from scratch. Time and resource permitting, we will train the model from scratch and use that instead of the pre-trained model. 2. Architecture: We have only used ZF Net [8] that has 5 convolutional layers and 3 fullyconnected layers, but will look into using the alternative pre-trained model based on VGG-16 model [7] that has 13 convolutional layers and 3 fully-connected layers. 3. Test on Real Images: We hope to test our model on real images of piles of trash in order to determine how our model does on the actual problem. 6 Contributions 1. Oluwasanya Awe: Worked on model set-up, fine-tuning, and milestone/final report writeup. Researched and refined the problem. 2. Robel Mengistu: Worked on model and gpu set-up, and milestone/final report writeup. Researched and refined the problem. 3. Vikram Sreedhar: Worked on dataset generation/pre-processing and milestone/final report writeup. Researched and refined the problem. References [1] Texas Natural Resource Conservation Commission et al. Municipal solid waste plan for Texas. Texas Natural Resource Conservation Commission, 1995. [2] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303 338, 2010. [3] Ross B. Girshick. Fast r-cnn. 2015 IEEE International Conference on Computer Vision (ICCV), pages 1440 1448, 2015. [4] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. Mask R-CNN. CoRR, abs/1703.06870, 2017. [5] Gary Thung Mindy Yang. Classification of trash for recyclability status. CS229 Project Report 2016, 2016. 5

[6] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:1137 1149, 2015. [7] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arxiv preprint arxiv:1409.1556, 2014. [8] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European conference on computer vision, pages 818 833. Springer, 2014. 6