Brand-Aware Fashion Clothing Search using CNN Feature Encoding and Re-ranking

Dipu Manandhar, Kim Hui Yap, Muhammet Bastan, Zhao Heng
School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore

Abstract

Brand plays a significant role in fashion clothing. Consumers are brand conscious during clothing search and purchase. Existing visual fashion search methods [1], [8], [17]-[19] often do not explicitly consider brand information such as logos. Brand logos in clothing images are quite small and often suffer various deformations, and hence pose a significant challenge for branded clothing search. In view of this, this paper presents a new Brand-Aware Fashion Search (BAFS) framework that explores the brand information during the visual search. We construct a new brand fashion dataset which consists of 10K images of branded clothing with trademark logos. The proposed framework first jointly detects the brand logo and clothing items in the images. Next, to extract the rich visual information from the clothing images, we propose a new deep feature encoding known as Principal Component Maximum Activation of Convolutions (PMAC) that leverages hierarchies of CNN activations. The PMAC feature aims to capture both low-level visual information and high-level global abstraction from the images. The proposed method further uses a brand-aware re-ranking technique to improve the search. Experiments conducted on the brand fashion dataset show that the proposed framework achieves superior performance to the other compared methods.

I. INTRODUCTION

In recent years, e-commerce and online shopping for clothing [1], [2] have been growing on commercial platforms such as eBay, Amazon and Taobao. In clothing, the brand is an integral part of fashion. Brands often reflect customers' self-image and personality. Customers are often brand conscious, and hence brand information plays an important role in shopping.

Consider a case where a user would like to search for a branded clothing item. Fig. 1 shows clothing retrieval examples using off-the-shelf CNN features extracted from VGG16-Net [3]. The images in the first column are the query images, and the corresponding retrieved images are shown in the respective rows. For instance, in the first row, the customer wants to search for a black Adidas hoodie. Although the search has retrieved visually similar images, the results are not brand aware: hoodies from other brands such as Puma and Ferrari are also retrieved in the top ranks. In this work, we explore clothing retrieval using a deep learning framework which leverages brand information during the visual search. Clothing retrieval has a wide range of commercial applications [4]-[8]. The advantage of the proposed BAFS is that it considers the user's brand preference and incorporates it into the search process during online shopping. In addition, the proposed method will also help fashion companies to promote their brands [9].

Fig. 1. Example of branded clothing search. Relevant and irrelevant retrieved images are shown with green and red borders respectively.

Clothing retrieval has emerged as a popular topic in recent years. Earlier works [6], [7], [10], [11] used handcrafted features to describe clothing attributes and perform the search. More recently, convolutional neural networks (CNN) and deep learning [12] have proven to be powerful techniques for various vision-related tasks such as image classification [13] and detection [14], [15]. Fashion retrieval using deep learning is developed in [5], [8], [16]-[19]. Features from fine-tuned CNN architectures have been used for clothing recognition and retrieval in [16]-[18]. Kiapour et al. [8] and Wang et al. [5] have explored the problem of matching street images to online shop images using pretrained network features. Recently, Liu et al. [19] proposed FashionNet, which jointly uses features pooled from clothing landmarks and a fully connected layer to perform fashion retrieval.

Existing visual search methods [2], [8], [16]-[18], [20] do not encode brand information during the search. Moreover, these methods mainly rely on features extracted from the fully connected layers of CNN networks, which provide global representations of images. Thus, local information and small objects are not well encoded in these global image signatures. Hence, existing methods cannot effectively integrate brand information, which plays a key role in clothing retrieval. In view of this, we propose a new brand-aware fashion search using robust features from CNN.

The main contribution of this work is threefold. First, we construct a new brand fashion clothing dataset in which images belonging to different brands are characterized by different trademark logos. We explore the importance of brand information in clothing retrieval. Second, we propose a new PMAC feature which captures the hierarchical visual information learned by different convolutional layers. We show that PMAC features are more effective for instance retrieval than traditional FC layer features or single-layer features. Third, we propose a clothing re-ranking engine based on the available clothing and brand information. The engine yields a clear performance improvement in fashion search. Overall, the proposed fashion search framework outperforms other state-of-the-art methods by 16% mAP.

II. OVERVIEW OF THE PROPOSED METHOD

The proposed fashion search framework is given in Fig. 2. Given a query image, the framework first detects the clothing and brand logo in the image. Next, using the detected clothing region, we extract the PMAC features from activations of different CNN layers. We compute the similarity score between the query and database images. The initial filtered images are then passed through a brand-aware re-ranking engine, and the system returns the relevant images to the user.

III. DATASET CONSTRUCTION

We constructed a new Brand Fashion dataset which contains images of various clothing categories with brand information centered on the trademark logo. We collect clothing images from 15 popular fashion brands such as Adidas, Puma and Nike. We will make the dataset publicly available in the near future.

A. Image Collection and Cleaning

We crawled the images from Google Images using relevant search keywords. For each brand, we create a pool of keywords such as "Denim Versace Hoodie", "Playboy Tee" and "Floral Adidas Tank" to query the website. For 15 brands, around 150K images were downloaded. Irrelevant images are then removed by human screening. We also remove single-channel images and low-resolution images which are less than pixels. Next, to remove redundant and highly similar images, we use the FC7 response from VGGNet [3]. For each brand, pairwise image similarities are computed using FC7 features.
For those images with similarity scores greater than a threshold (thres = 0.97), we keep one copy and remove the rest. At the end of this step, the dataset contains 9,498 clean and relevant images.

Fig. 2. Proposed brand-aware fashion framework.

B. Annotation

The images in the dataset carry two types of annotation, namely clothing item category and brand logo information. We use the same fine-grained clothing categories as the DeepFashion dataset [19]. Each image is first labeled with its clothing category together with the bounding box coordinates. Next, all the logos in the dataset are annotated with the brand information and corresponding bounding boxes. Interactive annotation tools have been developed to support fast annotation.

IV. PROPOSED BRAND-AWARE FASHION SEARCH (BAFS) FRAMEWORK

The proposed fashion search framework consists of three main modules, which are described in the following sections.

A. Clothing and Brand Logo Detector

The proposed framework simultaneously detects clothing items and brand logos in images. We develop a joint clothing and logo detector using Faster-RCNN [14]. The detector contains two sub-networks, namely the Region Proposal Network (RPN) and Fast-RCNN, which share common convolutional layers for efficiency. In order to generate the region proposals for clothing and logos, the RPN slides predefined anchor boxes of different scales and aspect ratios over the last convolutional layer. In this paper, in order to target small brand logos, we sample the anchors with 4 scales as opposed to the 3 scales used in [14]. Our experiments show that using finer scales of region proposals improves logo detection performance by 8% mAP. The RPN is trained to generate proposals using the multi-task loss function in (1).

$$ L(p_i, b_i) = L_{cls}(p_i, p_i^*) + \lambda \, p_i^* \, L_{reg}(b_i, b_i^*) \qquad (1) $$

The first term in (1) is the soft-max loss, where $p_i$ and $p_i^*$ are the predicted probability of anchor $i$ being an object and the ground-truth label respectively. The value of $p_i^* \in \{0, 1\}$ is based on the Intersection-over-Union (IoU) of the anchor with the ground-truth box: $p_i^* = 0$ for IoU < 0.3 and $p_i^* = 1$ for IoU > 0.7. The second term is activated only for true anchors ($p_i^* = 1$) and represents the regression loss for bounding box prediction, where $b_i$ and $b_i^*$ are the predicted and ground-truth box coordinates. Next, the proposals obtained for clothing and logos are used to pool the features from the last convolutional layer, which are then used to classify the proposals into clothing and brand logos. We use the 4-step alternating method [14] to train the detector network. Fig. 3 shows examples of joint detection of clothing and logos from the brand fashion dataset.

B. PMAC Feature Extraction

Several previous works have exploited FC layer features [2], [16], [17], [20]-[23] and pooled features from convolutional layers [24], [25] for instance image retrieval. However, these features do not encode the hierarchy of information required for instance search. In view of this, we propose a new Principal Component Maximum Activation of Convolutions (PMAC) feature encoding which leverages both low-level and high-level abstractions to extract rich features for retrieval.

In order to extract the PMAC features from images, we first operate on the activations of $n$ convolutional layers $L = \{l_i\}_{i=1}^{n}$ of the network. For a particular layer $l$, we pool features from its 3D activation maps of $W^{(l)} \times H^{(l)} \times K^{(l)}$ dimensions, where $K^{(l)}$ is the number of filters in layer $l$ and $W^{(l)} \times H^{(l)}$ is the spatial extent of each feature map $R_i^{(l)}$, $i = 1, 2, \ldots, K^{(l)}$. For each feature map $R_i^{(l)}$, we spatially pool the maximum activation of the map to construct a feature $f^{(l)} = \{f_1^{(l)}, f_2^{(l)}, \ldots, f_i^{(l)}, \ldots, f_K^{(l)}\}$ for layer $l$.

$$ f_i^{(l)} = \max R_i^{(l)}, \quad i = 1, \ldots, K^{(l)} \qquad (2) $$

Next, we concatenate the features from multiple layers to generate a single descriptor $F = [f^{(l)}]$, $l \in \{1, \ldots, n\}$. The concatenation strategy is a simple yet effective way to represent an image. Next, in order to extract discriminative information, the features $F$ are projected to a new subspace using PCA and whitening. Mathematically, we construct a $k$-dimensional PMAC feature vector $X = \{x_1, x_2, \ldots, x_i, \ldots, x_k\}$. Each component $x_i$ of the vector is computed using (3).

$$ x_i = \frac{1}{\sqrt{\lambda_i}} U^T F_i \qquad (3) $$

where $U$ represents the transformation matrix formed by the $k$ principal eigenvectors and $\lambda_i$ represents the corresponding eigenvalues.
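The encoding in (2) and (3) can be illustrated with a minimal NumPy sketch. It assumes each layer's activations are already available as a (K, H, W) array. The function names, the toy layer shapes, the random stand-in activations, and the mean-centering step before PCA are illustrative assumptions, not details given in the paper.

```python
import numpy as np

def max_activation(feature_maps):
    """Spatially max-pool a (K, H, W) activation tensor into a K-dim vector, as in (2)."""
    return feature_maps.reshape(feature_maps.shape[0], -1).max(axis=1)

def concat_layers(layer_maps):
    """Concatenate the per-layer max activations into one descriptor F."""
    return np.concatenate([max_activation(m) for m in layer_maps])

def fit_pca_whitening(F_train, k):
    """Learn the mean, the k principal eigenvectors U and their eigenvalues."""
    mean = F_train.mean(axis=0)          # centering is an assumption (standard PCA)
    cov = np.cov(F_train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # keep the k largest
    return mean, eigvecs[:, order], eigvals[order]

def pmac(F, mean, U, eigvals, eps=1e-12):
    """Project F onto U and whiten each component by 1/sqrt(lambda_i), as in (3)."""
    return (U.T @ (F - mean)) / np.sqrt(eigvals + eps)

# Toy demo: random arrays stand in for CNN activations (hypothetical shapes).
rng = np.random.default_rng(0)
layer_shapes = [(8, 5, 5), (8, 5, 5), (4, 5, 5)]

def random_image_activations():
    return [rng.standard_normal(shape) for shape in layer_shapes]

F_train = np.stack([concat_layers(random_image_activations()) for _ in range(200)])
mean, U, eigvals = fit_pca_whitening(F_train, k=5)

F_query = concat_layers(random_image_activations())
x_query = pmac(F_query, mean, U, eigvals)
print(x_query.shape)  # (5,)
```

In practice the PCA parameters would be learned once on descriptors from a training set of clothing images and then reused for every query and database image.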
We will demonstrate the effectiveness of the proposed PMAC in clothing retrieval in Section V-B. The similarity between the query image and a database image is computed using the cosine similarity (4) of their PMAC representations $q$ and $D$ respectively.

$$ \mathrm{Sim}(q, D) = \frac{q^T D}{\lVert q \rVert_2 \, \lVert D \rVert_2} \qquad (4) $$

C. Brand-Aware Re-ranking

Although the PMAC image encoding captures both the low-level details and the high-level abstraction of the images, it does not prioritize the brand information. This is because brand logos generally cover a small area of the image, so visual information from such small objects is not captured well in the descriptor. The proposed framework thus uses a brand-aware re-ranking engine to rank the initial retrieval shortlist. The engine is a two-stage re-ranking system which uses the information from brand and clothing detection. First, we retain the images which belong to the same brand as the brand detected in the query image. Second, similar filtering is performed based on the detected clothing category. We show that the re-ranking engine significantly improves the retrieval results.

Fig. 3. Sample images from the branded fashion dataset with clothing and logo detection.

V. EXPERIMENTS AND RESULTS

A. Experimental Setting

The clothing retrieval experiments are conducted on the Brand Fashion dataset, which contains 9,498 clothing images. In the experiments, we use 50 query images. For each query, the ground truths are the images of the same clothing item in the database. For each query, Average Precision (AP) is calculated using the Precision-Recall curve. The APs of all the query images are averaged to obtain the mean Average Precision (mAP), which is used as the performance metric. To demonstrate the effectiveness of the proposed method, we use ZF-Net as the backbone CNN because it is a lightweight model with fewer parameters than other architectures such as AlexNet, VGGNet and ResNet.
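The evaluation metric described above can be sketched in a few lines of plain Python. This is a minimal sketch that assumes the common non-interpolated form of AP (the mean of the precision values at each rank where a relevant image appears); the paper does not state which variant of the Precision-Recall computation it uses, and the function names and toy image ids are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: average of precision@rank over the ranks of relevant hits."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)

def mean_average_precision(results):
    """mAP over (ranked_ids, relevant_ids) pairs, one pair per query."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in results]
    return sum(aps) / len(aps)

# Toy example with two queries.
queries = [
    (["a", "x", "b", "y"], {"a", "b"}),  # AP = (1/1 + 2/3) / 2
    (["x", "c"], {"c"}),                 # AP = 1/2
]
mean_ap = mean_average_precision(queries)
print(round(mean_ap, 4))  # 0.6667
```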
We employ transfer learning by fine-tuning a network pre-trained on ImageNet. The algorithms are implemented using the Caffe [26] framework.

B. Results and Discussion

1) Joint Logo and Clothing Detector: The joint logo and clothing detector achieves 96% and 98% detection accuracy for logo and clothing respectively in the query images. Examples of joint detection of logo and clothing are shown in Fig. 3. From the figure, we can observe that the detector can simultaneously detect various logos and clothing items in fashion images under different scales, deformations and background clutter. In the first image in Fig. 3, the proposed detector detects the clothing item (Hoodie) and the brand (Adidas) with high confidence, as shown in the blue and red boxes respectively. Similarly, in the other images, clothing items and logos are accurately detected despite large variations in scale, orientation and location. These detected clothing boxes and logo information are incorporated into the subsequent search during feature extraction and clothing re-ranking.

2) Retrieval with PMAC Encoding and Brand-Aware Re-ranking: This section first presents an analysis of the choice of convolutional layers used for PMAC encoding. Next, the performance of the proposed BAFS method is discussed. Table I shows the retrieval performance using different convolutional layers and their combinations. The early layers achieve poor performance because early features are too generic and only represent low-level visual information such as edges, colors and blobs [27]. We observe that features from the third layer to the last layer provide relatively similar performance (about 30%). Although all of these layers provide similar performance, they encode different levels of information. Therefore, in order to exploit the rich hierarchy of feature information, we explore various ways to combine layers for feature representation. The combined use of the last three layers of the network shows the best performance (about 35%). Hence, we choose these layers for feature embedding as described in (2) and (3) in our experiments.

TABLE I
RETRIEVAL PERFORMANCE (mAP) VS. CONVOLUTIONAL LAYERS USED FOR FEATURE ENCODING

Single Layer | mAP | Multi-Layers | mAP
conv1 | 8.9 | conv3 + conv |
conv | | conv3 + conv |
conv | | conv4 + conv |
conv | | conv3 + conv4 + conv |
conv | | |

The retrieval performance of the proposed method is shown in Table II, which compares the features used and the different re-ranking steps. The initial retrieval using the directly concatenated feature vector achieves an mAP of 34.8%. Retrieval using the proposed PMAC feature outperforms this with an mAP of 45.9% (a gain of about 11 percentage points). Although the directly concatenated features try to capture multi-layer feature information, they contain significant redundant information. In contrast, the proposed PMAC extracts discriminative features from various layers which capture low-to-high level visual information. It also uses PCA to further project the high-dimensional feature into a meaningful subspace. The initial performance of the proposed PMAC is further improved by brand-aware re-ranking to 51.9% and 53.6% using clothing and logo re-ranking respectively. The experimental results clearly show the advantage of PMAC feature encoding and brand-aware re-ranking.
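The two-stage filtering of Section IV-C, applied to a shortlist ranked by the cosine similarity in (4), can be sketched as follows. This is a minimal illustration under stated assumptions: the database is a list of dictionaries whose keys (`id`, `feature`, `brand`, `category`) are hypothetical, the brand and category labels are assumed to come from the joint detector, feature vectors are assumed to be non-zero, and the toy 2-D features stand in for PMAC descriptors.

```python
import math

def cosine_similarity(q, d):
    """Cosine similarity of two equal-length, non-zero vectors, as in (4)."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    return dot / (norm_q * norm_d)

def initial_shortlist(query_vec, database, top_n):
    """Rank database items by similarity to the query and keep the top_n."""
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query_vec, item["feature"]),
        reverse=True,
    )
    return ranked[:top_n]

def brand_aware_rerank(shortlist, query_brand, query_category):
    """Stage 1 keeps items of the query's brand; stage 2 keeps its clothing category.
    The similarity order of the shortlist is preserved within each stage."""
    same_brand = [item for item in shortlist if item["brand"] == query_brand]
    return [item for item in same_brand if item["category"] == query_category]

# Toy database (hypothetical items and features).
database = [
    {"id": "a", "feature": [1.0, 0.1], "brand": "Adidas", "category": "Hoodie"},
    {"id": "b", "feature": [1.0, 0.0], "brand": "Puma", "category": "Hoodie"},
    {"id": "c", "feature": [0.9, 0.3], "brand": "Adidas", "category": "Tee"},
    {"id": "d", "feature": [0.5, 0.5], "brand": "Adidas", "category": "Hoodie"},
    {"id": "e", "feature": [0.0, 1.0], "brand": "Nike", "category": "Hoodie"},
]

shortlist = initial_shortlist([1.0, 0.0], database, top_n=4)
reranked = brand_aware_rerank(shortlist, "Adidas", "Hoodie")
print([item["id"] for item in shortlist])  # ['b', 'a', 'c', 'd']
print([item["id"] for item in reranked])   # ['a', 'd']
```

In the toy run, the visually closest item "b" is dropped because it is a different brand, which mirrors the Adidas versus Puma hoodie example from the introduction.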
TABLE II
RETRIEVAL PERFORMANCES OF THE PROPOSED BAFS FRAMEWORK

Feature Used | Initial Retrieval | Clothing Re-ranking | Logo Re-ranking
Direct Concatenated | 34.8 | |
PMAC | 45.9 | 51.9 | 53.6

3) Comparison with Other State-of-the-art Methods: We compare our proposed method with other state-of-the-art works [16], [17], [20] which use various fully connected layer features from CNN networks after clothing detection. Table III compares the performance of the proposed BAFS method with the other methods. From the table, it is observed that the baseline methods achieve the lowest performance, as they directly use features from pre-trained networks which are not well adapted to the clothing domain. The R-MAC [24] method uses features pooled from the last convolutional layer and achieves an mAP of 30.03%. The other methods [17] and [20] use features extracted from a network fine-tuned on the domain dataset and achieve 33.17% and 37.11% respectively. All of these methods rely on features extracted from a single layer of the network, which often does not capture the full range of information that is crucial for clothing instance search. Moreover, they do not effectively incorporate brand information during the search. In comparison, the proposed BAFS method uses a rich hierarchy of CNN features and incorporates brand information, and hence it achieves an mAP of 53.6%, which clearly outperforms the other state-of-the-art methods. A qualitative comparison of the retrieval performance of the proposed method and the state-of-the-art methods is shown in Fig. 4, which demonstrates the effectiveness of the proposed BAFS framework.

TABLE III
COMPARISON OF THE PROPOSED METHOD WITH OTHER METHODS

Method | mAP
Baseline Methods |
1. VGG16 - FC |
Alexnet - FC |
R-MAC [24] | 30.03
Rapid-Clothing [17] | 33.17
Visual-Search@Pinterest [20] | 37.11
Proposed method | 53.6

Fig. 4. Qualitative comparison of retrieval results of various methods. Retrieved relevant and irrelevant images are shown with green and red borders respectively.

VI. CONCLUSION

This paper proposes a new brand-aware fashion search framework. We introduce a new brand fashion dataset which consists of 10K images of branded fashion clothing. Joint detection of clothing items and logos is employed to extract the clothing and brand information. A new feature encoding method, PMAC, is proposed which captures a hierarchy of low-level to high-level information in the feature descriptor. A brand-aware re-ranking engine is also proposed to improve the visual clothing search. The experimental results clearly show the effectiveness of the proposed method.

VII. ACKNOWLEDGMENT

This research was carried out at the Rapid-Rich Object Search (ROSE) Lab at the Nanyang Technological University, Singapore. The ROSE Lab is supported by the Infocomm Media Development Authority, Singapore. We gratefully acknowledge the support of NVIDIA AI Technology Center for their donation of a K40m GPU used for our research at the ROSE Lab.

REFERENCES

[1] D. Shankar, S. Narumanchi, H. Ananya, P. Kompalli, and K. Chaudhury, "Deep learning based large scale visual recommendation and search for e-commerce," arXiv preprint.
[2] F. Yang, A. Kale, Y. Bubnov, L. Stein, Q. Wang, H. Kiapour, and R. Piramuthu, "Visual search at eBay," arXiv preprint.
[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint.
[4] S. Bell and K. Bala, "Learning visual similarity for product design with convolutional neural networks," ACM Transactions on Graphics (TOG), vol. 34, no. 4, p. 98.
[5] X. Wang, Z. Sun, W. Zhang, Y. Zhou, and Y.-G. Jiang, "Matching user photos to online products with robust deep features," in Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 2016.
[6] S. Liu, Z. Song, G. Liu, C. Xu, H. Lu, and S. Yan, "Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
[7] W. Di, C. Wah, A. Bhardwaj, R. Piramuthu, and N. Sundaresan, "Style finder: Fine-grained clothing style detection and retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013.
[8] M. Hadi Kiapour, X. Han, S. Lazebnik, A. C. Berg, and T. L. Berg, "Where to buy it: Matching street clothing photos in online shops," in Proceedings of the IEEE International Conference on Computer Vision, 2015.
[9] Z.-Q. Cheng, X. Wu, Y. Liu, and X.-S. Hua, "Video ecommerce++: Towards large scale online video advertising," IEEE Transactions on Multimedia.
[10] M. Mizuochi, A. Kanezaki, and T. Harada, "Clothing retrieval based on local similarity with multiple images," in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014.
[11] J. Fu, J. Wang, Z. Li, M. Xu, and H. Lu, "Efficient clothing retrieval with semantic-preserving visual phrases," in Asian Conference on Computer Vision. Springer, 2012.
[12] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS), 2012.
[14] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems (NIPS).
[15] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[16] J. Huang, W. Xia, and S. Yan, "Deep search with attribute-aware deep network," in Proceedings of the 22nd ACM International Conference on Multimedia, ser. MM '14. New York, NY, USA: ACM, 2014.
[17] K. Lin, H.-F. Yang, K.-H. Liu, J.-H. Hsiao, and C.-S. Chen, "Rapid clothing retrieval via deep learning of binary codes and hierarchical search," in Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ACM, 2015.
[18] J.-C. Chen and C.-F. Liu, "Visual-based deep learning for clothing from large database," in Proceedings of the ASE BigData & SocialInformatics. ACM, 2015, p. 42.
[19] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang, "DeepFashion: Powering robust clothes recognition and retrieval with rich annotations," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[20] Y. Jing, D. Liu, D. Kislyuk, A. Zhai, J. Xu, J. Donahue, and S. Tavel, "Visual search at Pinterest," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.
[21] A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, "Neural codes for image retrieval," in European Conference on Computer Vision (ECCV). Springer, 2014.
[22] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[23] Y. Kalantidis, C. Mellina, and S. Osindero, "Cross-dimensional weighting for aggregated deep convolutional features," in European Conference on Computer Vision. Springer, 2016.
[24] G. Tolias, R. Sicre, and H. Jégou, "Particular object retrieval with integral max-pooling of CNN activations," arXiv preprint.
[25] A. S. Razavian, J. Sullivan, A. Maki, and S. Carlsson, "A baseline for visual instance retrieval with deep convolutional networks," CoRR.
[26] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint.
[27] M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks." Cham: Springer International Publishing, 2014.


Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

An Exploration of Computer Vision Techniques for Bird Species Classification

An Exploration of Computer Vision Techniques for Bird Species Classification An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

Compact Deep Invariant Descriptors for Video Retrieval

Compact Deep Invariant Descriptors for Video Retrieval Compact Deep Invariant Descriptors for Video Retrieval Yihang Lou 1,2, Yan Bai 1,2, Jie Lin 4, Shiqi Wang 3,5, Jie Chen 2,5, Vijay Chandrasekhar 3,5, Lingyu Duan 2,5, Tiejun Huang 2,5, Alex Chichung Kot

More information

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of

More information

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing 3 Object Detection BVM 2018 Tutorial: Advanced Deep Learning Methods Paul F. Jaeger, of Medical Image Computing What is object detection? classification segmentation obj. detection (1 label per pixel)

More information

Instance Retrieval at Fine-grained Level Using Multi-Attribute Recognition

Instance Retrieval at Fine-grained Level Using Multi-Attribute Recognition Instance Retrieval at Fine-grained Level Using Multi-Attribute Recognition Roshanak Zakizadeh, Yu Qian, Michele Sasdelli and Eduard Vazquez Cortexica Vision Systems Limited, London, UK Australian Institute

More information

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left

More information

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept.

More information

Project 3 Q&A. Jonathan Krause

Project 3 Q&A. Jonathan Krause Project 3 Q&A Jonathan Krause 1 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations 2 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations

More information

Visual features detection based on deep neural network in autonomous driving tasks

Visual features detection based on deep neural network in autonomous driving tasks 430 Fomin I., Gromoshinskii D., Stepanov D. Visual features detection based on deep neural network in autonomous driving tasks Ivan Fomin, Dmitrii Gromoshinskii, Dmitry Stepanov Computer vision lab Russian

More information

arxiv: v1 [cs.cv] 16 Nov 2015

arxiv: v1 [cs.cv] 16 Nov 2015 Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial

More information

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

More information

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Charles R. Qi Hao Su Matthias Nießner Angela Dai Mengyuan Yan Leonidas J. Guibas Stanford University 1. Details

More information

arxiv: v1 [cs.cv] 6 Jul 2016

arxiv: v1 [cs.cv] 6 Jul 2016 arxiv:607.079v [cs.cv] 6 Jul 206 Deep CORAL: Correlation Alignment for Deep Domain Adaptation Baochen Sun and Kate Saenko University of Massachusetts Lowell, Boston University Abstract. Deep neural networks

More information

Deep Neural Networks:

Deep Neural Networks: Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,

More information

Robust Face Recognition Based on Convolutional Neural Network

Robust Face Recognition Based on Convolutional Neural Network 2017 2nd International Conference on Manufacturing Science and Information Engineering (ICMSIE 2017) ISBN: 978-1-60595-516-2 Robust Face Recognition Based on Convolutional Neural Network Ying Xu, Hui Ma,

More information

Multi-Glance Attention Models For Image Classification

Multi-Glance Attention Models For Image Classification Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We

More information

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ Stop Line Detection and Distance Measurement for Road Intersection based on Deep Learning Neural Network Guan-Ting Lin 1, Patrisia Sherryl Santoso *1, Che-Tsung Lin *ǂ, Chia-Chi Tsai and Jiun-In Guo National

More information

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection ILSVRC 2016 Object Detection from Video Byungjae Lee¹, Songguo Jin¹, Enkhbayar Erdenee¹, Mi Young Nam², Young Gui Jung², Phill Kyu

More information

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018 Object Detection TA : Young-geun Kim Biostatistics Lab., Seoul National University March-June, 2018 Seoul National University Deep Learning March-June, 2018 1 / 57 Index 1 Introduction 2 R-CNN 3 YOLO 4

More information

Classifying a specific image region using convolutional nets with an ROI mask as input

Classifying a specific image region using convolutional nets with an ROI mask as input Classifying a specific image region using convolutional nets with an ROI mask as input 1 Sagi Eppel Abstract Convolutional neural nets (CNN) are the leading computer vision method for classifying images.

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016 ECCV 2016 Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016 Fundamental Question What is a good vector representation of an object? Something that can be easily predicted from 2D

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors [Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors Junhyug Noh Soochan Lee Beomsu Kim Gunhee Kim Department of Computer Science and Engineering

More information

Real-time object detection towards high power efficiency

Real-time object detection towards high power efficiency Real-time object detection towards high power efficiency Jincheng Yu, Kaiyuan Guo, Yiming Hu, Xuefei Ning, Jiantao Qiu, Huizi Mao, Song Yao, Tianqi Tang, Boxun Li, Yu Wang, and Huazhong Yang Tsinghua University,

More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

Object Detection on Self-Driving Cars in China. Lingyun Li

Object Detection on Self-Driving Cars in China. Lingyun Li Object Detection on Self-Driving Cars in China Lingyun Li Introduction Motivation: Perception is the key of self-driving cars Data set: 10000 images with annotation 2000 images without annotation (not

More information

Deep Learning for Object detection & localization

Deep Learning for Object detection & localization Deep Learning for Object detection & localization RCNN, Fast RCNN, Faster RCNN, YOLO, GAP, CAM, MSROI Aaditya Prakash Sep 25, 2018 Image classification Image classification Whole of image is classified

More information

Lecture 7: Semantic Segmentation

Lecture 7: Semantic Segmentation Semantic Segmentation CSED703R: Deep Learning for Visual Recognition (207F) Segmenting images based on its semantic notion Lecture 7: Semantic Segmentation Bohyung Han Computer Vision Lab. bhhanpostech.ac.kr

More information

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL Hantao Yao 1,2, Shiliang Zhang 3, Dongming Zhang 1, Yongdong Zhang 1,2, Jintao Li 1, Yu Wang 4, Qi Tian 5 1 Key Lab of Intelligent Information Processing

More information

SELF SUPERVISED DEEP REPRESENTATION LEARNING FOR FINE-GRAINED BODY PART RECOGNITION

SELF SUPERVISED DEEP REPRESENTATION LEARNING FOR FINE-GRAINED BODY PART RECOGNITION SELF SUPERVISED DEEP REPRESENTATION LEARNING FOR FINE-GRAINED BODY PART RECOGNITION Pengyue Zhang Fusheng Wang Yefeng Zheng Medical Imaging Technologies, Siemens Medical Solutions USA Inc., Princeton,

More information

arxiv: v1 [cs.cv] 20 Dec 2016

arxiv: v1 [cs.cv] 20 Dec 2016 End-to-End Pedestrian Collision Warning System based on a Convolutional Neural Network with Semantic Segmentation arxiv:1612.06558v1 [cs.cv] 20 Dec 2016 Heechul Jung heechul@dgist.ac.kr Min-Kook Choi mkchoi@dgist.ac.kr

More information

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN CS6501: Deep Learning for Visual Recognition Object Detection I: RCNN, Fast-RCNN, Faster-RCNN Today s Class Object Detection The RCNN Object Detector (2014) The Fast RCNN Object Detector (2015) The Faster

More information

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 31 Mar 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.

More information

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling [DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

Pedestrian Detection based on Deep Fusion Network using Feature Correlation Pedestrian Detection based on Deep Fusion Network using Feature Correlation Yongwoo Lee, Toan Duc Bui and Jitae Shin School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South

More information

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18, REAL-TIME OBJECT DETECTION WITH CONVOLUTION NEURAL NETWORK USING KERAS Asmita Goswami [1], Lokesh Soni [2 ] Department of Information Technology [1] Jaipur Engineering College and Research Center Jaipur[2]

More information

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection Zeming Li, 1 Yilun Chen, 2 Gang Yu, 2 Yangdong

More information

A Deep Learning Framework for Authorship Classification of Paintings

A Deep Learning Framework for Authorship Classification of Paintings A Deep Learning Framework for Authorship Classification of Paintings Kai-Lung Hua ( 花凱龍 ) Dept. of Computer Science and Information Engineering National Taiwan University of Science and Technology Taipei,

More information

Mimicking Very Efficient Network for Object Detection

Mimicking Very Efficient Network for Object Detection Mimicking Very Efficient Network for Object Detection Quanquan Li 1, Shengying Jin 2, Junjie Yan 1 1 SenseTime 2 Beihang University liquanquan@sensetime.com, jsychffy@gmail.com, yanjunjie@outlook.com Abstract

More information

Supplementary material for Analyzing Filters Toward Efficient ConvNet

Supplementary material for Analyzing Filters Toward Efficient ConvNet Supplementary material for Analyzing Filters Toward Efficient Net Takumi Kobayashi National Institute of Advanced Industrial Science and Technology, Japan takumi.kobayashi@aist.go.jp A. Orthonormal Steerable

More information

Layerwise Interweaving Convolutional LSTM

Layerwise Interweaving Convolutional LSTM Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States

More information

PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL

PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL Yingxin Lou 1, Guangtao Fu 2, Zhuqing Jiang 1, Aidong Men 1, and Yun Zhou 2 1 Beijing University of Posts and Telecommunications, Beijing,

More information

Unified, real-time object detection

Unified, real-time object detection Unified, real-time object detection Final Project Report, Group 02, 8 Nov 2016 Akshat Agarwal (13068), Siddharth Tanwar (13699) CS698N: Recent Advances in Computer Vision, Jul Nov 2016 Instructor: Gaurav

More information

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang SSD: Single Shot MultiBox Detector Author: Wei Liu et al. Presenter: Siyu Jiang Outline 1. Motivations 2. Contributions 3. Methodology 4. Experiments 5. Conclusions 6. Extensions Motivation Motivation

More information

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Object Detection CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Problem Description Arguably the most important part of perception Long term goals for object recognition: Generalization

More information

Color Naming for Multi-Color Fashion Items

Color Naming for Multi-Color Fashion Items Color Naming for Multi-Color Fashion Items Vacit Oguz Yazici 1,2, Joost van de Weijer 1, and Arnau Ramisa 2 1 Computer Vision Center, Universitat Autonoma de Barcelona, Building O Campus UAB, 08193 Bellaterra,

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Presented by Pandian Raju and Jialin Wu Last class SGD for Document

More information

arxiv: v1 [cs.cv] 2 Sep 2018

arxiv: v1 [cs.cv] 2 Sep 2018 Natural Language Person Search Using Deep Reinforcement Learning Ankit Shah Language Technologies Institute Carnegie Mellon University aps1@andrew.cmu.edu Tyler Vuong Electrical and Computer Engineering

More information

arxiv: v2 [cs.cv] 30 Jul 2016

arxiv: v2 [cs.cv] 30 Jul 2016 arxiv:1512.04065v2 [cs.cv] 30 Jul 2016 Cross-dimensional Weighting for Aggregated Deep Convolutional Features Yannis Kalantidis, Clayton Mellina and Simon Osindero Computer Vision and Machine Learning

More information

arxiv: v1 [cs.cv] 23 Jan 2019

arxiv: v1 [cs.cv] 23 Jan 2019 DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images Yuying Ge 1, Ruimao Zhang 1, Lingyun Wu 2, Xiaogang Wang 1, Xiaoou Tang 1, and

More information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN

More information

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang Image Formation and Processing (IFP) Group, University of Illinois at Urbana-Champaign

More information

Modern Convolutional Object Detectors

Modern Convolutional Object Detectors Modern Convolutional Object Detectors Faster R-CNN, R-FCN, SSD 29 September 2017 Presented by: Kevin Liang Papers Presented Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

More information

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Yuanjun Xiong 1 Kai Zhu 1 Dahua Lin 1 Xiaoou Tang 1,2 1 Department of Information Engineering, The Chinese University

More information

arxiv: v1 [cs.cv] 26 Jun 2017

arxiv: v1 [cs.cv] 26 Jun 2017 Detecting Small Signs from Large Images arxiv:1706.08574v1 [cs.cv] 26 Jun 2017 Zibo Meng, Xiaochuan Fan, Xin Chen, Min Chen and Yan Tong Computer Science and Engineering University of South Carolina, Columbia,

More information

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS Zhao Chen Machine Learning Intern, NVIDIA ABOUT ME 5th year PhD student in physics @ Stanford by day, deep learning computer vision scientist

More information

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization Supplementary Material: Unconstrained Salient Object via Proposal Subset Optimization 1. Proof of the Submodularity According to Eqns. 10-12 in our paper, the objective function of the proposed optimization

More information

FCHD: A fast and accurate head detector

FCHD: A fast and accurate head detector JOURNAL OF L A TEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 FCHD: A fast and accurate head detector Aditya Vora, Johnson Controls Inc. arxiv:1809.08766v2 [cs.cv] 26 Sep 2018 Abstract In this paper, we

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li

More information

The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification

The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification Lingqiao Liu 1, Chunhua Shen 1,2, Anton van den Hengel 1,2 The University of Adelaide, Australia 1

More information

Class 5: Attributes and Semantic Features

Class 5: Attributes and Semantic Features Class 5: Attributes and Semantic Features Rogerio Feris, Feb 21, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Project

More information

Traffic Multiple Target Detection on YOLOv2

Traffic Multiple Target Detection on YOLOv2 Traffic Multiple Target Detection on YOLOv2 Junhong Li, Huibin Ge, Ziyang Zhang, Weiqin Wang, Yi Yang Taiyuan University of Technology, Shanxi, 030600, China wangweiqin1609@link.tyut.edu.cn Abstract Background

More information

Finding Tiny Faces Supplementary Materials

Finding Tiny Faces Supplementary Materials Finding Tiny Faces Supplementary Materials Peiyun Hu, Deva Ramanan Robotics Institute Carnegie Mellon University {peiyunh,deva}@cs.cmu.edu 1. Error analysis Quantitative analysis We plot the distribution

More information

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018 Mask R-CNN Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018 1 Common computer vision tasks Image Classification: one label is generated for

More information

Object Detection for Crime Scene Evidence Analysis using Deep Learning

Object Detection for Crime Scene Evidence Analysis using Deep Learning Object Detection for Crime Scene Evidence Analysis using Deep Learning Surajit Saikia 1,2, E. Fidalgo 1,2, Enrique Alegre 1,2 and 2,3 Laura Fernández-Robles 1 Department of Electrical, Systems and Automation,

More information

Applying Visual User Interest Profiles for Recommendation & Personalisation

Applying Visual User Interest Profiles for Recommendation & Personalisation Applying Visual User Interest Profiles for Recommendation & Personalisation Jiang Zhou, Rami Albatal, and Cathal Gurrin Insight Centre for Data Analytics, Dublin City University jiang.zhou@dcu.ie https://www.insight-centre.org

More information

CSE255 Assignment 1 Improved image-based recommendations for what not to wear dataset

CSE255 Assignment 1 Improved image-based recommendations for what not to wear dataset CSE255 Assignment 1 Improved image-based recommendations for what not to wear dataset Prabhav Agrawal and Soham Shah 23 February 2015 1 Introduction We are interested in modeling the human perception of

More information

Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks

Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks Nikiforos Pittaras 1, Foteini Markatopoulou 1,2, Vasileios Mezaris 1, and Ioannis Patras 2 1 Information Technologies

More information