CAP 6412 Advanced Computer Vision
http://www.cs.ucf.edu/~bgong/cap6412.html
Boqing Gong
April 5th, 2016

Today
- Administrivia
- LSTM
- Attributes in computer vision, by Abdullah and Samer

Project II posted, due Tuesday 04/26, 11:59pm
http://www.cs.ucf.edu/~bgong/cap6412/proj2.pdf
Today: last day to acquire permission for taking option 2

Next week
- Tuesday (04/12): Javier Lores
- Thursday (04/14): Fareeha Irfan

Today
- Administrivia
- LSTM
- Attributes in computer vision, by Abdullah and Samer

A Plain RNN
- Three time steps and beyond
- Expressive in modeling sequences
- Training by backpropagation is unstable: vanishing & exploding gradients
- Troublesome in learning long-term dependencies
- Training by other methods? Alternatives exist, but they are hard to use
Image credits: Richard Socher

LSTM (Long Short-Term Memory)
- RNN: overwrites the hidden states → multiplicative gradients
- LSTM: adds to the cell states → additive gradients
Image credits: http://colah.github.io/posts/2015-08-understanding-lstms/

LSTM step by step
Memory cell & gates. The gates are squashed by the logistic (sigmoid) function:
σ(x) = 1 / (1 + exp(−x))
[Figure: plot of the logistic function, saturating at 0 and 1]
Image credits: http://colah.github.io/posts/2015-08-understanding-lstms/

LSTM step by step
Additive update to the cell states, where f_t is the forget gate and i_t is the input gate.
Image credits: http://colah.github.io/posts/2015-08-understanding-lstms/
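The update this slide describes is the standard LSTM cell-state equation from the cited post (a reconstruction, since the slide's equation image is not in the transcription):

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t

The forget gate f_t scales down the old cell state, and the input gate i_t scales the new candidate C̃_t; because the two terms are added, gradients flow through the cell additively.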

LSTM step by step
Forget gate: forgets/remembers some information from time step (t−1). Controlled jointly by the current input and the previous hidden state. Sometimes it is also controlled by the previous cell state C_{t−1} (a "peephole" connection).
Image credits: http://colah.github.io/posts/2015-08-understanding-lstms/
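In the standard formulation from the cited post (the slide's equation is not in the transcription), the forget gate is

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

where [h_{t−1}, x_t] is the concatenation of the previous hidden state and the current input.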

LSTM step by step
Input gate & candidate cell states: together they determine the new information to be stored in the cell.
Image credits: http://colah.github.io/posts/2015-08-understanding-lstms/
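The corresponding standard equations (again reconstructed from the cited post) are

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)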

LSTM step by step
Output gate & hidden states: the hidden state is computed from the cell state. Note that the hidden state (and the input) live outside the LSTM unit itself; the unit stores only the memory cell.
Image credits: http://colah.github.io/posts/2015-08-understanding-lstms/
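The standard output-gate equations (reconstructed from the cited post) are

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)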

LSTM step by step
The output depends on the hidden state: y_t = σ(W_yh h_t + b_y)
Image credits: http://colah.github.io/posts/2015-08-understanding-lstms/

LSTM in a nutshell
An LSTM unit contains:
- Forget gate
- Input gate
- Output gate
- Memory cell
- Additive operations → additive gradients
It does not contain:
- Input x
- Hidden states
- Output y
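To make the walkthrough concrete, here is a minimal NumPy sketch of a single LSTM step under the standard formulation above; the weight and gate names are illustrative, not from the slides.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    # One LSTM step: W[k] maps the concatenated [h_prev, x_t] to the
    # pre-activation of gate k; b[k] is the corresponding bias.
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde       # additive cell update
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(C_t)                 # new hidden state
    return h_t, C_t

# Toy usage: hidden size 4, input size 3, a sequence of 5 random inputs.
H, D = 4, 3
rng = np.random.default_rng(0)
W = {k: 0.1 * rng.normal(size=(H, H + D)) for k in "fiCo"}
b = {k: np.zeros(H) for k in "fiCo"}
h, C = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, C = lstm_step(x, h, C, W, b)

The additive cell update line is exactly the equation from the slides; everything around it is bookkeeping for the gates.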

Today
- Administrivia
- LSTM
- Attributes in computer vision, by Abdullah and Samer

Attribute Learning
By Abdullah Jamal

Outline
- What is attribute learning?
- A Unified Multiplicative Framework for Attribute Learning, Kongming Liang, Hong Chang, Shiguang Shan, Xilin Chen, ICCV 2015
  - Motivation of the research
  - Main contribution
  - Approach outline
  - Details of the proposed approach
  - Experiments
  - Conclusion
- Future directions

Attribute?
An inherent characteristic of an object, e.g., color, shape, pattern, texture.

What are visual attributes? Attributes are properties observable in images that have human-designated names, such as "orange", "striped", or "furry".

Attributes-based Recognition
Examples: dog (furry, white), chimpanzee (black, big), tiger (striped, yellow, black, white, big).
Attributes provide a mode of communication between humans and machines!

Datasets
Animals with Attributes (AwA):
- 85 numeric attribute values for each of the 50 animal classes
- 30,475 images; the minimum and maximum number of images per category are 92 and 1,168 respectively
aPascal/aYahoo:
- 64 binary attributes annotated for each object sample of the aPascal train and test sets, and the aYahoo test set
- 20 categories for aPascal (12,695 images) and 12 classes for the aYahoo set (2,644 images)
The CUB-200-2011 Birds (CUB):
- 200 categories of bird species with 11,788 images
- 312 binary attributes per image

SUN Attribute dataset:
- 102 scene attributes defined for each of the 14,340 scene images
- 717 scene categories
Clothing Attributes dataset:
- 26 ground-truth clothing attributes with 1,856 clothing images
ImageNet Attributes (INA):
- 9,600 images from 384 categories
- each image annotated with 25 attributes

Attributes in Videos
Attributes in video can be used for:
- Human action recognition
- Social activities of a group of people (e.g., a YouTube video of a wedding reception)
- Surveillance

Datasets
Attributes on the UIUC dataset:
- 22 action attributes manually defined for each of the 14 human action classes, such as walk, hand-clap, jump-forward, and jumping-jack
- 532 videos
- example attributes: standing with arm motion, torso translation with arm motion, leg fold and unfold motion
Attributes on the Mixed Action dataset:
- 34 action attributes manually defined for each of the 21 human action classes
- 2,910 videos from the mixed UIUC Action, Weizmann (10 classes, 100 videos), and KTH (6 classes, 2,300 videos) datasets

Attributes on the Olympic Sports dataset:
- 39 action attributes manually defined for each of the 16 human action classes (high-jump, long-jump, triple-jump, pole-vault, basketball lay-up, bowling, tennis-serve, platform diving, discus throw, hammer throw, javelin throw, shot put, springboard diving, snatch (weightlifting), clean-and-jerk (weightlifting), and gymnastic vault)
- 781 videos

A Unified Multiplicative Framework for Attribute Learning, ICCV 2015

Motivation
Traditionally, computer vision has focused on object recognition, classification, segmentation, retrieval, and so on. Recent research shows that visual attributes can benefit these traditional learning problems (image search, object recognition, etc.). But attribute learning is still challenging because:
- attributes are not always predictable directly from input images;
- the visual variation of attributes is sometimes large across categories.

Limitations of previous methods
- Correlations between attributes are ignored. Attributes, as properties of objects, are naturally correlated with each other, so it is more appropriate to learn all the attributes jointly, e.g., by sharing attribute-specific parameters or common semantic representations.
- Some attributes are hard or even impossible to predict from visual appearance alone. For example, one cannot infer a color-relevant attribute from a grayscale input image, or predict whether an animal is fast or slow from a still image.
- Negative attribute correlation between object and scene. In weakly supervised attribute learning, the input image contains both object and scene, and the scene sometimes has attributes that are negatively related to the object attributes. For example, a traditional attribute classifier may predict a polar bear swimming in the ocean to have the "blue" attribute.
- The visual appearance of a given attribute varies across categories.

Main Contribution
A unified multiplicative framework for attribute learning that tackles all of the limitations discussed above.

Approach Outline
The image and category vectors in the unified common space interact multiplicatively to predict the attributes.

Details of the Proposed Approach
Given N labeled training images:
- x_i ∈ R^D denotes the D-dimensional image feature vector;
- a_i ∈ {0, 1}^T indicates the absence or presence of each of the T binary attributes;
- y_i ∈ {0, 1}^C is the label vector, where C is the number of classes.
In matrix form, the training images are X = [x_1, x_2, ..., x_N], and similarly for the attribute matrix A ∈ {0, 1}^{T×N} and the class label matrix Y.

Multiplicative Attribute Learning
Transform the training images and labels into a shared feature space: images X and labels Y are parameterized by W and U, so that Wx_i and Uy_i represent the feature representation of image x_i and its class information, respectively. In the multi-task learning framework, the t-th task (t = 1, ..., T) is the binary classifier for learning the t-th attribute.

The discriminative function for the t-th attribute of an object in image x_i is
f_t(x_i, y_i) = v_t^T (W x_i ⊙ U y_i),
where v_t denotes the parameters of the t-th classifier in the latent space and ⊙ is the element-wise (Hadamard) product.

Wx_i learns a better visual representation of image x_i to facilitate attribute classification. The component Uy_i acts as a gate on the attribute classifier v_t, transferring knowledge from the category information. During training, all the parameters are learned so as to automatically decide how to leverage image, attribute, and category information.

All the attributes are learned jointly with logistic regression. The loss function is the negative log-likelihood:
L = − Σ_i Σ_t [ a_ti log g(f_t(x_i, y_i)) + (1 − a_ti) log(1 − g(f_t(x_i, y_i))) ],
where the parameters W, U, and V are shared across all images and tasks, a_ti represents the absence or presence of the t-th attribute in image i, and g(x) = 1/(1 + exp(−x)) is the sigmoid function.

The objective function is this loss plus regularization terms on the parameters W, U, and V.

Category-Specific Attribute Classifier
Since y_i is a binary category indicator, the discriminative function can be expressed as
f_t(x_i, y_i) = v_t^T (W x_i ⊙ Σ_j y_ji U_j),
where U_j is the j-th column of U and y_ji is the binary category label indicating whether image x_i belongs to category j.

A multi-class softmax classifier is trained by minimizing the cross-entropy loss. At test time, the category of an image is estimated as the class with the highest predicted probability.

With the estimated category information, they also predict the attributes of x by marginalizing over the category label:
p(a_t = 1 | x) = Σ_j p(y = j | x) · g(v_t^T (W x ⊙ U e_j)),
where e_j denotes a vector with a single nonzero coordinate of value 1 in the j-th position.

Instance-specific attribute classifier
The multiclass classification model and the attribute classifiers are trained jointly. After joint training, we obtain an instance-specific attribute classifier for each x_i.

This classifier is a linear combination of all the category-specific attribute classifiers. For zero-shot learning, the instance-specific attribute classifier for an image from an unseen category can therefore be estimated from the category-specific attribute classifiers of all the seen categories.

Optimization
Traditional multiplicative models are optimized with alternating optimization: the main problem is converted into sub-problems, and each sub-problem optimizes one parameter while the others are held fixed. This process is alternated until it converges to a local minimum. They likewise use alternating optimization to minimize their objective function.

The parameters W and V are initialized using an SVD decomposition of the logistic regression classifier parameters. The derivatives of the objective function with respect to the parameters involve ⊙, the Hadamard product. To estimate the optimal value of one matrix while the other two are fixed, they use the L-BFGS algorithm.
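As a rough illustration of this alternating scheme, here is a Python sketch assuming the negative log-likelihood form given earlier. The shapes, the regularizer weight, and the use of scipy's numeric gradients are all illustrative assumptions; the paper derives analytic gradients and initializes from an SVD of logistic-regression parameters, which is skipped here.

import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(W, U, V, X, Y, A, lam=1e-3):
    # Negative log-likelihood of the multiplicative model plus L2 terms.
    # X: D x N images, Y: C x N one-hot labels, A: T x N binary attributes.
    G = (W @ X) * (U @ Y)          # K x N gated latent features
    P = sigmoid(V @ G)             # T x N attribute probabilities
    eps = 1e-9
    loss = -np.sum(A * np.log(P + eps) + (1 - A) * np.log(1 - P + eps))
    reg = lam * (np.sum(W**2) + np.sum(U**2) + np.sum(V**2))
    return loss + reg

def optimize_block(var0, wrap):
    # Run L-BFGS on one flattened parameter block, others held fixed.
    res = minimize(wrap, var0.ravel(), method="L-BFGS-B",
                   options={"maxiter": 50})
    return res.x.reshape(var0.shape)

# Toy data: D=10 features, C=3 classes, T=5 attributes, K=3 latent dims.
rng = np.random.default_rng(0)
D, C, T, K, N = 10, 3, 5, 3, 40
X = rng.normal(size=(D, N))
Y = np.eye(C)[rng.integers(C, size=N)].T
A = rng.integers(2, size=(T, N)).astype(float)
W = 0.1 * rng.normal(size=(K, D))
U = 0.1 * rng.normal(size=(K, C))
V = 0.1 * rng.normal(size=(T, K))

for _ in range(5):  # alternate over the three parameter blocks
    W = optimize_block(W, lambda w: nll(w.reshape(K, D), U, V, X, Y, A))
    U = optimize_block(U, lambda u: nll(W, u.reshape(K, C), V, X, Y, A))
    V = optimize_block(V, lambda v: nll(W, U, v.reshape(T, K), X, Y, A))
print("final loss:", nll(W, U, V, X, Y, A))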

Enhancing Category Information
Attributes are usually hard to define and costly to acquire. To counter the problem of small-scale attribute datasets, they boost attribute learning by enhancing the category information. Suppose there are two types of training data, X and X_a: the former has both attribute and category labels, while the latter has only category labels. The objective function can then be extended to cover both types of training data.

Experiments
Datasets:
- Animals with Attributes (AwA)
- aPascal/aYahoo
- CUB (Caltech-UCSD Birds)
- ImageNet Attributes

For category-level attribute definitions, they use Animals with Attributes and CUB; for instance-level attribute definitions, aPascal/aYahoo and ImageNet Attributes are used. For attribute prediction, each dataset is randomly split into training, validation, and testing sets. The dimension of the latent space is set to the minimum of the number of categories and the number of attributes.

They use 4096-D DeCAF features extracted from a CNN. The metrics are mean area under the curve and mean classification accuracy. For zero-shot learning, they use the specified seen and unseen classes of AwA; for the CUB dataset, they split into 150 seen classes and 50 unseen classes. Performance is measured by normalized multiclass accuracy.

Category-level Attribute Prediction

Instance-level Attribute Prediction
Enhancing Instance-level Attribute Prediction

Category-Sensitive Attribute Prediction

Zero-Shot Learning
Recognizing images from unseen classes based on transferred attribute concepts is referred to as zero-shot learning. Assume K seen classes {y_1, y_2, ..., y_K} and L unseen classes {z_1, z_2, ..., z_L}. Attribute classifiers are learned on the K seen classes. During testing, the unseen category of an image x is determined from the posterior probability p(z_l | x).

The class prior p(z_l) is assumed identical for all unseen classes, and the attribute priors are estimated from the seen classes. The attribute-predictive probability of their method is the marginalized prediction p(a_t = 1 | x) given above.
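The transcription drops the equations here. In the standard direct attribute prediction (DAP) setup that this follows (Lampert et al.), the posterior over an unseen class factorizes as

p(z_l | x) ∝ p(z_l) · ∏_{t=1}^{T} p(a_t^{z_l} | x) / p(a_t^{z_l})

with p(a_t | x) supplied by the marginalized attribute classifier defined earlier; this is a reconstruction under that assumption, not necessarily the slide's exact formula.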

Conclusion
- The model explicitly captures the relationships among image, attribute, and category in a multiplicative way in the latent feature space.
- It achieves better performance on four datasets.
- It reduces the effort of instance-level attribute annotation.
- It improves the accuracy of zero-shot learning.

Future Work
- Scene recognition
- Image retrieval
- Object classification
- Precise image descriptions for human interpretation