Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials

Similar documents
Real-time Object Detection CS 229 Course Project

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Object Detection Based on Deep Learning

CS 231A Computer Vision (Fall 2011) Problem Set 4

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Recognize Complex Events from Static Images by Fusing Deep Channels

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

Semantic segmentation for room categorization

Multi-Glance Attention Models For Image Classification

arxiv: v1 [cs.cv] 20 Dec 2017

Faceted Navigation for Browsing Large Video Collection

TA Section: Problem Set 4

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Cultural Event Recognition by Subregion Classification with Convolutional Neural Network

Beyond Bags of Features

Beyond Bags of features Spatial information & Shape models

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

CLASSIFICATION Experiments

Feature-Fused SSD: Fast Detection for Small Objects

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks

Constrained Local Enhancement of Semantic Features by Content-Based Sparsity

Robust Scene Classification with Cross-level LLC Coding on CNN Features

Local Features and Bag of Words Models

Deep Neural Networks:

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

CS 231A Computer Vision (Fall 2012) Problem Set 4

Improved Spatial Pyramid Matching for Image Classification

arxiv: v1 [cs.cv] 24 Apr 2015

Supervised learning. y = f(x) function

Dynamic Scene Classification using Spatial and Temporal Cues

Part-based and local feature models for generic object recognition

IDENTIFYING PHOTOREALISTIC COMPUTER GRAPHICS USING CONVOLUTIONAL NEURAL NETWORKS

ROBUST SCENE CLASSIFICATION BY GIST WITH ANGULAR RADIAL PARTITIONING. Wei Liu, Serkan Kiranyaz and Moncef Gabbouj

Part based models for recognition. Kristen Grauman

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization

Multiple VLAD encoding of CNNs for image classification

Lecture 7: Semantic Segmentation

Visual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space.

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

Large-Scale Scene Classification Using Gist Feature

Towards Large-Scale Semantic Representations for Actionable Exploitation. Prof. Trevor Darrell UC Berkeley

A Novel Representation and Pipeline for Object Detection

SEMANTIC SEGMENTATION AS IMAGE REPRESENTATION FOR SCENE RECOGNITION. Ahmed Bassiouny, Motaz El-Saban. Microsoft Advanced Technology Labs, Cairo, Egypt

Bag of Words Models. CS4670 / 5670: Computer Vision Noah Snavely. Bag-of-words models 11/26/2013

Depth Estimation from a Single Image Using a Deep Neural Network Milestone Report

Human Action Recognition Using CNN and BoW Methods Stanford University CS229 Machine Learning Spring 2016

Learning a Representative and Discriminative Part Model with Deep Convolutional Features for Scene Recognition

String distance for automatic image classification

Evaluation of GIST descriptors for web scale image search

Channel Locality Block: A Variant of Squeeze-and-Excitation

Large-scale gesture recognition based on Multimodal data with C3D and TSN

Visual Scene Interpretation as a Dialogue between Vision and Language

Computer Vision Lecture 16

Xiaowei Hu* Lei Zhu* Chi-Wing Fu Jing Qin Pheng-Ann Heng

Tunnel Effect in CNNs: Image Reconstruction From Max-Switch Locations

Geometric VLAD for Large Scale Image Search. Zixuan Wang 1, Wei Di 2, Anurag Bhardwaj 2, Vignesh Jagadesh 2, Robinson Piramuthu 2

LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS

Object Detection for Crime Scene Evidence Analysis using Deep Learning

Additive Manufacturing Defect Detection using Neural Networks. James Ferguson May 16, 2016

Lecture 5: Object Detection

Visual Object Recognition

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

2017 2nd International Conference on Software, Multimedia and Communication Engineering (SMCE 2017) ISBN:

Mutual Information Based Codebooks Construction for Natural Scene Categorization

Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform Supplementary Material

In Defense of Fully Connected Layers in Visual Representation Transfer

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Classification of objects from Video Data (Group 30)

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

Deep Spatial Pyramid Ensemble for Cultural Event Recognition

Part Localization by Exploiting Deep Convolutional Networks

SKETCH-BASED AERIAL IMAGE RETRIEVAL

Spatial Hierarchy of Textons Distributions for Scene Classification

Image Retrieval Based on its Contents Using Features Extraction

CNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION. Jawadul H. Bappy and Amit K. Roy-Chowdhury

Chinese text-line detection from web videos with fully convolutional networks

CODEBOOK ENHANCEMENT OF VLAD REPRESENTATION FOR VISUAL RECOGNITION

Supplementary Material for Ensemble Diffusion for Retrieval

CS6670: Computer Vision

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

InterActive: Inter-Layer Activeness Propagation

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

Content-Based Image Recovery

Supplementary material for Analyzing Filters Toward Efficient ConvNet

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Scene Recognition using Bag-of-Words

EasyChair Preprint. Real-Time Action Recognition based on Enhanced Motion Vector Temporal Segment Network

GIST. GPU Implementation. Prakhar Jain ( ) Ejaz Ahmed ( ) 3 rd May, 2009

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

Semantic Segmentation

Image Captioning with Object Detection and Localization

Beyond bags of Features

arxiv: v1 [cs.lg] 20 Dec 2013

ImageCLEF 2011

Transcription:

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Yuanjun Xiong 1 Kai Zhu 1 Dahua Lin 1 Xiaoou Tang 1,2 1 Department of Information Engineering, The Chinese University of Hong Kong 2 Shenzhen key lab of Comp. Vis. & Pat. Rec., Shenzhen Institutes of Advanced Technology, CAS, China xy012@ie.cuhk.edu.hk zk013@ie.cuhk.edu.hk dhlin@ie.cuhk.edu.hk xtang@ie.cuhk.edu.hk 1. Model Specification Our model is implemented on Caffe [1]. It is a common programming framework for deep learning. The detailed specification of the model can be downloaded from the project website http://personal.ie.cuhk.edu. hk/ xy012/event_recog. It can be read by Caffe and edited with any text editor. 2. WIDER Dataset The dataset can be downloaded from the project website http://personal.ie.cuhk.edu.hk/ xy012/event_recog/wider. 3. Parameters of Compared Methods We compared our method with three different methods. The detailed settings of their parameters are described below. Gist We use the implementation described in [4]. Images are resized to 128 128. The orientation scales are (8,8,8,8). Spatial Pyramid Matching (SPM) The implementation of spatial pyramid matching algorithm is based on [2]. We use pyramids of three levels. The low-level visual features are characterized by SIFT descriptors. These features are encoded with a codebook of 1500 visual terms. regions of size 3 3. The activation features are accumulated to the corresponding regions according to the positions of the bounding boxs. Finally, we get a representation of 4096 9 = 36864 dimensions. 4. Dataset Examples Figure 1, 2, 3, 4, 5, and 6 present sample images of the WIDER dataset, five for each class. Note we resized all images to squares for consistent layout. References [1] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arxiv preprint arxiv:1408.5093, 2014. 1 [2] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 2169 2178. IEEE, 2006. 1 [3] L.-J. Li, H. Su, L. Fei-Fei, and E. P. Xing. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In Advances in neural information processing systems, pages 1378 1386, 2010. 1 [4] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International journal of computer vision, 42(3):145 175, 2001. 1 RCNNBank The RCNNBank is implemented based on ObjectBank [3], except that the detector responses are replaced by the activation features obtained using CNN. For each image, we first obtain 500 bounding boxes with highest proposal scores and apply the CNN to derive action features. This results in 500 activation feature vectors of 4096 dimensions. Then we splice the image into evenly spaced 978-1-4673-6964-0/15/$31.00 2015 IEEE 1

Figure 1: Sample images in the dataset.

Figure 2: Sample images in the dataset.

Figure 3: Sample images in the dataset.

Figure 4: Sample images in the dataset.

Figure 5: Sample images in the dataset.

Figure 6: Sample images in the dataset.