TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

Similar documents
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Rich feature hierarchies for accurate object detection and semantic segmentation

Lecture 5: Object Detection

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deformable Part Models

Object Detection Based on Deep Learning

THE SPEED-LIMIT SIGN DETECTION AND RECOGNITION SYSTEM

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials

Object detection with CNNs

Object Tracking Algorithm based on Combination of Edge and Color Information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

GLASS OBJECT SEGMENTATION BY LABEL TRANSFER ON JOINT DEPTH AND APPEARANCE MANIFOLDS. Tao Wang, Xuming He, Nick Barnes

Spatial Localization and Detection. Lecture 8-1

Real-time Object Detection CS 229 Course Project

Semantic Segmentation

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

A Deep Learning Framework for Authorship Classification of Paintings

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning

Rich feature hierarchies for accurate object detection and semantic segmentation

Unified, real-time object detection

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

Segmentation as Selective Search for Object Recognition in ILSVRC2011

Faceted Navigation for Browsing Large Video Collection

Towards Large-Scale Semantic Representations for Actionable Exploitation. Prof. Trevor Darrell UC Berkeley

Highlight detection with application to sweet pepper localization

Combining Selective Search Segmentation and Random Forest for Image Classification

Part Localization by Exploiting Deep Convolutional Networks

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro

Multi-Glance Attention Models For Image Classification

Ensemble of Bayesian Filters for Loop Closure Detection

CAP 6412 Advanced Computer Vision

Selective Search for Object Recognition

Learning with Side Information through Modality Hallucination

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Object Detection in Sports Videos

Computer Vision Lecture 16

Regionlet Object Detector with Hand-crafted and CNN Feature

Road Surface Traffic Sign Detection with Hybrid Region Proposal and Fast R-CNN

Su et al. Shape Descriptors - III

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b

Using RGB, Depth, and Thermal Data for Improved Hand Detection

Tri-modal Human Body Segmentation

Deep Neural Networks:

YOLO9000: Better, Faster, Stronger

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

arxiv: v1 [cs.cv] 6 Jul 2016

Beyond Bags of Features

Skin and Face Detection

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

The Kinect Sensor. Luís Carriço FCUL 2014/15

Rich feature hierarchies for accurate object detection and semant

An Exploration of Computer Vision Techniques for Bird Species Classification

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Visual features detection based on deep neural network in autonomous driving tasks

Pedestrian Detection with Deep Convolutional Neural Network

Project 3 Q&A. Jonathan Krause

Content-Based Image Recovery

Detection of a Single Hand Shape in the Foreground of Still Images

Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi

Segmenting Objects in Weakly Labeled Videos

Yiqi Yan. May 10, 2017

An ICA based Approach for Complex Color Scene Text Binarization

Deep Face Recognition. Nathan Sun

Structured Prediction using Convolutional Neural Networks

Short Survey on Static Hand Gesture Recognition

Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains

Hand Posture Recognition Using Adaboost with SIFT for Human Robot Interaction

Texture Segmentation by Windowed Projection

Subspace Alignment Based Domain Adaptation for RCNN Detector

Saliency-guided Selective Magnification for Company Logo Detection

arxiv: v1 [cs.cv] 5 Oct 2015

RoI Pooling Based Fast Multi-Domain Convolutional Neural Networks for Visual Tracking

arxiv: v1 [cs.cv] 26 Jun 2017

CNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION. Jawadul H. Bappy and Amit K. Roy-Chowdhury

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Efficient Segmentation-Aided Text Detection For Intelligent Robots

arxiv: v1 [cs.cv] 20 Dec 2016

arxiv: v1 [cs.cv] 31 Mar 2016

3D model classification using convolutional neural network

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering

Automatic detection of books based on Faster R-CNN

Learning to Recognize Faces in Realistic Conditions

Cultural Event Recognition by Subregion Classification with Convolutional Neural Network

SELF SUPERVISED DEEP REPRESENTATION LEARNING FOR FINE-GRAINED BODY PART RECOGNITION

Detecting motion by means of 2D and 3D information

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries

Frequent Inner-Class Approach: A Semi-supervised Learning Technique for One-shot Learning

Generic Face Alignment Using an Improved Active Shape Model

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

arxiv: v1 [cs.cv] 6 Jul 2015

CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep

A Modified Mean Shift Algorithm for Visual Object Tracking

Transcription:

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept. of Computer Science and Information Engineering, National Taiwan University, Taiwan E-mail: berln579@gmail.com, fuh@csie.ntu.edu.tw ABSTRACT In this paper, we aim to develop a transparent object detection algorithm which can detect the location of transparent objects in color image. Due to the special characteristics of transparent objects, very few vision methods exist to identify them. To achieve our goal, we try to use deep learning based method to recognize the transparent object in color image. Experimental results show that our algorithm can achieve good transparent object retrieval results. Keywords Transparent Object Detection; Convolutional Neural Network; Deep Learning 1. INTRODUCTION Transparent objects are very common in our environment, from our home and restaurants to laboratories. However, transparent materials are difficult to detect due to the appearances of transparent objects change over different backgrounds, their edges are implicit and contain strong highlights. In this paper, we explore algorithms for better transparent object detection. As for the outline of this paper, Section II describes the previous work have been done and briefly introduces of method used in this paper. Section III describes the proposed algorithm on better transparent object recognition. Section IV then proposes our improvement for better region proposal for detection. Finally, in section V, we evaluate the proposed approach and investigate performance in various scenes containing transparent objects. 2. RELATED WORK Transparent objects are hard to detect, so the research are not popular until 2003. Osadchy et al. [7] used specular highlights as a positive source of information to recognize shiny objects. But the process required a bright light source. McHenry et al. [8] proposed several features and characteristic of transparent object such as color similarity, blurring, overlay consistency, texture distortion and highlights. The method is effective to distinguish transparent objects. Although it successfully segment transparent object, the algorithms only adapted with non-cluttered scene without pose estimation. Fritz et al. [1] use an additive model of latent features to learned transparent local patch appearance. It successfully detects transparency in varying backgrounds too. Phillips et al. [9] provide a new idea to detect semitransparent objects by utilizing inverse perspective mapping. This method needs to capture more than one view and assumes that object is on a plane. The largest error of pose estimate was about 10.4 mm. There still have much more improvement for pose estimation. For pose estimation, Lysenkov et al. [10] detect transparent object by using Kinect sensor, while unknown depth information (shown as black area in depth image) is considered as transparent object. It proposed an algorithm to calculate poses of transparent objects. The improvement in [11] makes their algorithm be able to deal with overlapped instances and cluttered transparent objects. In recent years, deep neural network has been very popular recently and has been shown powerful in object recognition tasks. So we want to try using this model to find how to recognize transparent objects. [1] has already shown that there exists some common visual words which belong to transparent objects, thus suggested that we might be able to use caffe [2], a popular tool for deep neural network for recognizing transparent objects. However, to detect the location of transparent objects, merely using object classification module is not enough. To identify the locations of transparent objects in the image, we take

advantage of the technique called Regions with Convolutional Neural Network (R-CNN) [3], Fig.1 shows the results illustrating what R-CNN can do. Fig. 3: An illustration of selective search. Fig. 1: An example of results of R-CNN. 3. IMPLEMENTATION As a brief introduction, R-CNN is a state-of-the-art detector that classifies region proposals by a fine tuned Caffe model. The main idea of R-CNN is to do structured learning with lots of training examples so that in testing stage, we can assume that image is x, and the region is y, and score each hypothesis y to select the highest one as the output. With the training images, the convolutional neural network can learn what a correct region shown in Fig. 2. In this paper, training data used is the same as in [3]. The CNN is pre-trained by the dataset of ILSVRC 2012. And since beaker is a class of ILSVRC dataset, we use it to detect transparent object. Fig. 2: An training example of R-CNN. The first problem in the testing stage is to select the candidate regions for scoring, the techniques used in R- CNN is selective search [4]. [4] addresses the problem of generating possible object locations for use in object recognition. They introduce selective search which combines the strength of both an exhaustive search and segmentation. Like segmentation, they use the image structure to guide the sampling process. Like exhaustive search, the paper aim to capture all possible object locations. By using selective search, we don t need to exhaustively enumerate all possible regions for scoring. Recognizing transparent objects is quite specific. However, training a convolutional neural network needs a lot of training images. For example, in [5] they use 1.2 million training images. For this problem, we can reference the technique used in [3]. They use R-CNN to recognize the images in PASCAL VOC, but they use ImageNet to pre-train the CNN and fine-tune the network by data of PASCAL VOC. And they found that the performance is better than training the network only with data in ImageNet. As a result, we would like to use the same training method and assess the result. As for the neural network part, we would like to use the structure in [5], as shown in Fig 4. By adding SVMs in the final layer, we can get the R-CNN structure. Fig. 4: The structure of used neural network. 4. IMPROVEMENT FOR SELECTIVE SEARCH Selective search generates many candidate regions, but there are lots of candidates that are not transparent. These candidates might lower the speed dramatically. As a result, we want to use visual cues of transparent object to rule out regions that does not contain highlight. Here we use highlight and the color of transparent object. First, we use highlight because every transparent object contains highlight in a scene with light source. Second, one of the important features of transparent object is that the color tends to be similar on both side of the edge if the background of transparent object is not cluttered. Because the object is transparent, it presents the color of background, which is similar to the color around object. As for the highlight, in general, transparent object reflects light such that it produces multiple highlights on its surface. Various empirical models can be used to count the local highlight points on a surface, such as Phong model [6], which is commonly used in 3D computer graphics. To find the highlight, a way is to build hypothesized 3D shape and search through a large set of candidate image highlights [7]. By binary threshold, one can detect highlight regions of transparent

object, but threshold value is hard to be determined. We then used the method described in [8] as it is an efficient but useful method for searching highlight. The first step is to threshold image by value from 0 to 255, and estimate number of perimeter pixels for each image after threshold. The slope of the curve would significantly decrease at the value close to 255. To find the critical value of threshold, we produce first order polynomial to fit the straight line of perimeter curve as: 5. EXPERIMENT 5.1. Selective search To test if selective search can generate regions containing transparent object, we use the test data used in [1] as input. There are 14 images in total. Three of them are shown in Fig. 7. P = at + b (1) Variable T represents the threshold value. We fit the perimeter from threshold 255 to 0, iteratively. For each fit, we estimate the mean square error and plot another curve of error. Finally, we compare the slope of error curve from threshold 255 to 0. When there is a significant increase of slope, the value is the proper value of threshold. As shown in the Fig. 5, only highlights remain in image after threshold. Fig. 5: Highlight extraction of image. Next, to check if the colors on both sides are similar, we calculate the histogram of hue in HSV color space on both sides. After retrieving transparent candidate as shown in Fig 6(b), we can calculate the pixels around the candidate, the result is shown in Fig 6(c). Then we can calculate the hue histogram of pixels inside transparent candidate and pixels around the candidate. Fig 6(d) shows the histogram of the pixels inside candidate and pixels around candidate. To compare the similarity of the two histograms, we view these two histograms as vectors. By comparing the Euclidean distance, we are able to compare the similarity, if the distance is short, the colors on both sides are similar. Fig. 7: The images used for testing. As shown in Fig. 8, some proposals (in red rectangle) are good, but there are still many regions that are not transparent. As a result, we develop some algorithms Fig. 6: The result of computing color similarity.

described in section 4 to prune some regions that are not transparent.

Fig. 8: The result of selective search on one image. 5.2. Transparent Object Detection To test if our algorithm can detect transparent objects in color image, we use the 14 images as input and one of the results is shown in Fig. 10. As can be seen in Fig. 9, the red rectangle contains a transparent object and the label is beaker. Although there is a blue rectangle recognized, the label is axe, so it is not related to transparent object. The result shows that R-CNN can be used to recognize transparent objects in color image.

Fig. 9: The testing result. Other results are shown in Fig. 10. In Fig. 10, we only show rectangles contains beaker. As can be seen, in most of the cases, transparent object can be detected in red rectangle.

Fig. 10: The testing results. 7. CONCLUSION In this paper, we use the R-CNN to detect transparent objects in color image. To improve the efficiency of region proposal algorithm, we use some characteristics of transparent objects to rule out some region proposals. Moreover, the result shows that this algorithm can be used for transparent object detection. REFERENCES [1] M. Fritz, G. Bradski, S. Karayev, T. Darrell, and M. J. Black, "An additive latent feature model for transparent

object recognition," in Advances in Neural Information Processing Systems, 2009, pp. 558-566. [2] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, et al., "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, 2014, pp. 675-678. [3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, 2014, pp. 580-587. [4] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, "Selective search for object recognition," International journal of computer vision, vol. 104, pp. 154-171, 2013. [5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097-1105. [6] B. Tuong-Phong, "Illumination for computer-generated images," University of Utah, pp. 29-51, 1973. [7] M. Osadchy, D. Jacobs, and R. Ramamoorthi, "Using specularities for recognition," in Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, 2003, pp. 1512-1519. [8] K. McHenry, J. Ponce, and D. Forsyth, "Finding glass," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, 2005, pp. 973-979. [9] C. J. Phillips, K. G. Derpanis, and K. Daniilidis, "A novel stereoscopic cue for figure-ground segregation of semitransparent objects," in Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, 2011, pp. 1100-1107. [10] I. Lysenkov, V. Eruhimov, and G. Bradski, "Recognition and pose estimation of rigid transparent objects with a kinect sensor," Robotics, p. 273, 2013. [11] I. Lysenkov and V. Rabaud, "Pose estimation of rigid transparent objects in transparent clutter," in Robotics and Automation (ICRA), 2013 IEEE International Conference on, 2013, pp. 162-169.