Internet of things that video

Size: px
Start display at page:

Download "Internet of things that video"

Transcription

1 Video recognition from a sentence Cees Snoek Intelligent Sensory Information Systems Lab University of Amsterdam The Netherlands Internet of things that video 45 billion cameras by 2022 [LDV Capital] 2 1

2 Technology: self-driving cars 3 Forensics: analyzing terrorist behavior 4 2

3 Well-being: elderly monitoring CareMedia project, CMU 2002 Social: media monitoring 6 3

4 Goal Understand what is happening where and when Kissing Shaking hands Hu et al. CVPR 2016 Inspiration: object location in images Find object location in image based on language Later extended to object segment Hu et al. ECCV 2016 window upper right bottom left window ground truth prediction 4

5 Hu et al. ECCV 2016 Object segmentation from a sentence Image embedding spatial feature map through CNN Sentence embedding final hidden state in LSTM Fully convolutional classification match input sentence to every location on the spatial grid and up-sample Tracking Zhenyang Li Ran Tao Efstratios Gavves Cees Snoek Arnold Smeulders Tracking by Natural Language Specification. In CVPR

6 A long standing computer vision challenge Object Tracking? Step 1: Set the box SINT 6

7 Step 2: Start tracking SINT Key contribution Specify the target by language instead of box Track the little green person with the pointy ears and the beige robe 7

8 Challenges How to obtain a tight box around an object from text? Text ambiguity vs object variance vs object invariance? What happens if the description is no longer valid? Model I: Lingual specification only LSTM encodes the text query h ' h " h h $ # LSTM LSTM LSTM LSTM % " % # % & % $ ( = Little green person robe 8

9 Model I: Lingual specification only LSTM encodes the text query Dynamically generate filters from LSTM output h ' h " h h $ # LSTM LSTM LSTM LSTM Dynamic Filter generation ( ) % " % # % & % $ * = Little green person robe Model I: Lingual specification only LSTM encodes the text query Dynamically generate filters from LSTM output Convolve with input frame ( ) * ) conv5 h ' h " h h $ # LSTM LSTM LSTM LSTM Dynamic Filter generation * ) % " % # % & % $ + = Little green person robe 9

10 Model I: Lingual specification only Tracking by repeated detection Little green person robe! "! #! $! % & = 0,, + Model II: Lingual first, then visual Use Model I for initialization, then track Little green person robe ' ( ' ) ' * ' + Dynamic filters! = 0,, & Tao et al. CVPR

11 Model III: Lingual & visual Adapts the lingual specification over time Little green person robe ' ( ' ) ' * ' + Dynamic filters Attention module! = 0,, & Datasets, pre-training and evaluation Extended existing object tracking datasets with language description about the target box in the first frame Lingual OTB99, all videos from OTB51 and 48 from OTB100 Lingual ImageNet, 4 videos for each of the 25 object categories. 11

12 Datasets, pre-training and evaluation Extended existing object tracking datasets with language description about the target box in the first frame Lingual OTB99, all videos from OTB51 and 48 from OTB100 Lingual ImageNet, 4 videos for each of the 25 object categories. VGG pre-trained on ImageNet, fine-tuned on lingual sets Lingual network pre-trained on ReferIt Tracker evaluation: AUC (area under curve) score ReferIt dataset Kazemsadeh et al., EMNLP 2014 Natural language expressions for segmented regions. 20,000 images 130,525 expressions 96,654 segmented image regions. 10K images for training and validation and 10K for testing. 12

13 Target identification by language Hu et al. CVPR16 Rohrbach et al. ECCV16 Hu et al. ECCV16 Ours percentage 9.6 of 26.7 test samples with IoU19.3 x 23.3 P@ P@ IoU Target identification by language Hu et al. CVPR16 Hu et al. ECCV16 Ours P@ P@ P@ P@ P@ IoU

14 Target identification by language Hu et al. CVPR16 Hu et al. ECCV16 Ours IoU Cross-modal dynamic filter generation makes sense Model results Sorted by first frame accuracy Model 2 and 3 profit from good initialization Hard to recover from poor initialization, then model 1 best choice Lingual only Lingual, then visual Lingual & visual 14

15 White car on the left Lingual only Lingual, then visual Lingual & visual Girl in yellow shirt and purple pants Lingual only Lingual, then visual Lingual & visual 15

16 The black and white dog Lingual only Lingual, then visual Lingual & visual Track by language and provided box Female skater in red Ground truth Box specification Language and box specification 16

17 Track by language and provided box People on the right next to a big tree Ground truth Box specification Language and box specification New applications Tracking objects in multiple videos simultaneously No first-frame requirement, live monitoring across streams Man with blue pants 17

18 Conclusion New type of human-machine interaction for video recognition. More robust tracking. Representation enables novel application scenarios. Vacancies in my lab: 1 assistant professor & 4 PhD students Thank you 18

Fully-Convolutional Siamese Networks for Object Tracking

Fully-Convolutional Siamese Networks for Object Tracking Fully-Convolutional Siamese Networks for Object Tracking Luca Bertinetto*, Jack Valmadre*, João Henriques, Andrea Vedaldi and Philip Torr www.robots.ox.ac.uk/~luca luca.bertinetto@eng.ox.ac.uk Tracking

More information

Fully-Convolutional Siamese Networks for Object Tracking

Fully-Convolutional Siamese Networks for Object Tracking Fully-Convolutional Siamese Networks for Object Tracking Luca Bertinetto*, Jack Valmadre*, João Henriques, Andrea Vedaldi and Philip Torr www.robots.ox.ac.uk/~luca luca.bertinetto@eng.ox.ac.uk Tracking

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS Zhao Chen Machine Learning Intern, NVIDIA ABOUT ME 5th year PhD student in physics @ Stanford by day, deep learning computer vision scientist

More information

16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text

16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text 16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning Spring 2018 Lecture 14. Image to Text Input Output Classification tasks 4/1/18 CMU 16-785: Integrated Intelligence in Robotics

More information

Apparel Classifier and Recommender using Deep Learning

Apparel Classifier and Recommender using Deep Learning Apparel Classifier and Recommender using Deep Learning Live Demo at: http://saurabhg.me/projects/tag-that-apparel Saurabh Gupta sag043@ucsd.edu Siddhartha Agarwal siagarwa@ucsd.edu Apoorve Dave a1dave@ucsd.edu

More information

Lecture 7: Semantic Segmentation

Lecture 7: Semantic Segmentation Semantic Segmentation CSED703R: Deep Learning for Visual Recognition (207F) Segmenting images based on its semantic notion Lecture 7: Semantic Segmentation Bohyung Han Computer Vision Lab. bhhanpostech.ac.kr

More information

ABC-CNN: Attention Based CNN for Visual Question Answering

ABC-CNN: Attention Based CNN for Visual Question Answering ABC-CNN: Attention Based CNN for Visual Question Answering CIS 601 PRESENTED BY: MAYUR RUMALWALA GUIDED BY: DR. SUNNIE CHUNG AGENDA Ø Introduction Ø Understanding CNN Ø Framework of ABC-CNN Ø Datasets

More information

Computer Vision: Making machines see

Computer Vision: Making machines see Computer Vision: Making machines see Roberto Cipolla Department of Engineering http://www.eng.cam.ac.uk/~cipolla/people.html http://www.toshiba.eu/eu/cambridge-research- Laboratory/ Vision: what is where

More information

Object Detection on Self-Driving Cars in China. Lingyun Li

Object Detection on Self-Driving Cars in China. Lingyun Li Object Detection on Self-Driving Cars in China Lingyun Li Introduction Motivation: Perception is the key of self-driving cars Data set: 10000 images with annotation 2000 images without annotation (not

More information

Object Detection Based on Deep Learning

Object Detection Based on Deep Learning Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf

More information

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing 3 Object Detection BVM 2018 Tutorial: Advanced Deep Learning Methods Paul F. Jaeger, of Medical Image Computing What is object detection? classification segmentation obj. detection (1 label per pixel)

More information

Semantic image search using queries

Semantic image search using queries Semantic image search using queries Shabaz Basheer Patel, Anand Sampat Department of Electrical Engineering Stanford University CA 94305 shabaz@stanford.edu,asampat@stanford.edu Abstract Previous work,

More information

Exploiting noisy web data for largescale visual recognition

Exploiting noisy web data for largescale visual recognition Exploiting noisy web data for largescale visual recognition Lamberto Ballan University of Padova, Italy CVPRW WebVision - Jul 26, 2017 Datasets drive computer vision progress ImageNet Slide credit: O.

More information

Yiqi Yan. May 10, 2017

Yiqi Yan. May 10, 2017 Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field

More information

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left

More information

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Deep Character-Level Click-Through Rate Prediction for Sponsored Search Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as

More information

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Object Detection CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Problem Description Arguably the most important part of perception Long term goals for object recognition: Generalization

More information

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Presented by Pandian Raju and Jialin Wu Last class SGD for Document

More information

Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking

Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking Martin Danelljan, Andreas Robinson, Fahad Shahbaz Khan, Michael Felsberg 2 Discriminative Correlation Filters (DCF)

More information

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep learning for object detection. Slides from Svetlana Lazebnik and many others Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep

More information

Learning Semantic Video Captioning using Data Generated with Grand Theft Auto

Learning Semantic Video Captioning using Data Generated with Grand Theft Auto A dark car is turning left on an exit Learning Semantic Video Captioning using Data Generated with Grand Theft Auto Alex Polis Polichroniadis Data Scientist, MSc Kolia Sadeghi Applied Mathematician, PhD

More information

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading

More information

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 Etienne Gadeski, Hervé Le Borgne, and Adrian Popescu CEA, LIST, Laboratory of Vision and Content Engineering, France

More information

Photo OCR ( )

Photo OCR ( ) Photo OCR (2017-2018) Xiang Bai Huazhong University of Science and Technology Outline VALSE2018, DaLian Xiang Bai 2 Deep Direct Regression for Multi-Oriented Scene Text Detection [He et al., ICCV, 2017.]

More information

Research at Google: With a Case of YouTube-8M. Joonseok Lee, Google Research

Research at Google: With a Case of YouTube-8M. Joonseok Lee, Google Research Research at Google: With a Case of YouTube-8M Joonseok Lee, Google Research Google Research We tackle the most challenging problems in Computer Science and related fields. Being bold and taking risks allows

More information

OBJECT DETECTION HYUNG IL KOO

OBJECT DETECTION HYUNG IL KOO OBJECT DETECTION HYUNG IL KOO INTRODUCTION Computer Vision Tasks Classification + Localization Classification: C-classes Input: image Output: class label Evaluation metric: accuracy Localization Input:

More information

Scene Text Recognition for Augmented Reality. Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science

Scene Text Recognition for Augmented Reality. Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science Scene Text Recognition for Augmented Reality Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science Outline Research area and motivation Finding text in natural scenes Prior art Improving

More information

Deep Incremental Scene Understanding. Federico Tombari & Christian Rupprecht Technical University of Munich, Germany

Deep Incremental Scene Understanding. Federico Tombari & Christian Rupprecht Technical University of Munich, Germany Deep Incremental Scene Understanding Federico Tombari & Christian Rupprecht Technical University of Munich, Germany C. Couprie et al. "Toward Real-time Indoor Semantic Segmentation Using Depth Information"

More information

Person Re-identification for Improved Multi-person Multi-camera Tracking by Continuous Entity Association

Person Re-identification for Improved Multi-person Multi-camera Tracking by Continuous Entity Association Person Re-identification for Improved Multi-person Multi-camera Tracking by Continuous Entity Association Neeti Narayan, Nishant Sankaran, Devansh Arpit, Karthik Dantu, Srirangaraj Setlur, Venu Govindaraju

More information

LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES

LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin Ellis - MIT, Daniel Ritchie - Brown University, Armando Solar-Lezama - MIT, Joshua b. Tenenbaum - MIT Presented by : Maliha Arif Advanced

More information

Using Machine Learning for Classification of Cancer Cells

Using Machine Learning for Classification of Cancer Cells Using Machine Learning for Classification of Cancer Cells Camille Biscarrat University of California, Berkeley I Introduction Cell screening is a commonly used technique in the development of new drugs.

More information

Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification

Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification Xiaodong Yang, Pavlo Molchanov, Jan Kautz INTELLIGENT VIDEO ANALYTICS Surveillance event detection Human-computer interaction

More information

R-FCN: Object Detection with Really - Friggin Convolutional Networks

R-FCN: Object Detection with Really - Friggin Convolutional Networks R-FCN: Object Detection with Really - Friggin Convolutional Networks Jifeng Dai Microsoft Research Li Yi Tsinghua Univ. Kaiming He FAIR Jian Sun Microsoft Research NIPS, 2016 Or Region-based Fully Convolutional

More information

(Deep) Learning for Robot Perception and Navigation. Wolfram Burgard

(Deep) Learning for Robot Perception and Navigation. Wolfram Burgard (Deep) Learning for Robot Perception and Navigation Wolfram Burgard Deep Learning for Robot Perception (and Navigation) Lifeng Bo, Claas Bollen, Thomas Brox, Andreas Eitel, Dieter Fox, Gabriel L. Oliveira,

More information

Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation

Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation Al-Hussein A. El-Shafie Faculty of Engineering Cairo University Giza, Egypt elshafie_a@yahoo.com Mohamed Zaki Faculty

More information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN

More information

Lecture 5: Object Detection

Lecture 5: Object Detection Object Detection CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 5: Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.kr 2 Traditional Object Detection Algorithms Region-based

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Fast trajectory matching using small binary images

Fast trajectory matching using small binary images Title Fast trajectory matching using small binary images Author(s) Zhuo, W; Schnieders, D; Wong, KKY Citation The 3rd International Conference on Multimedia Technology (ICMT 2013), Guangzhou, China, 29

More information

SurfNet: Generating 3D shape surfaces using deep residual networks-supplementary Material

SurfNet: Generating 3D shape surfaces using deep residual networks-supplementary Material SurfNet: Generating 3D shape surfaces using deep residual networks-supplementary Material Ayan Sinha MIT Asim Unmesh IIT Kanpur Qixing Huang UT Austin Karthik Ramani Purdue sinhayan@mit.edu a.unmesh@gmail.com

More information

END-TO-END CHINESE TEXT RECOGNITION

END-TO-END CHINESE TEXT RECOGNITION END-TO-END CHINESE TEXT RECOGNITION Jie Hu 1, Tszhang Guo 1, Ji Cao 2, Changshui Zhang 1 1 Department of Automation, Tsinghua University 2 Beijing SinoVoice Technology November 15, 2017 Presentation at

More information

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta Outline > Overview of Semantic Segmentation > Encoder-Decoder Networks > Results What is Semantic Segmentation? Input: RGB Image Output:

More information

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D.

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. Rhodes 5/10/17 What is Machine Learning? Machine learning

More information

Visual features detection based on deep neural network in autonomous driving tasks

Visual features detection based on deep neural network in autonomous driving tasks 430 Fomin I., Gromoshinskii D., Stepanov D. Visual features detection based on deep neural network in autonomous driving tasks Ivan Fomin, Dmitrii Gromoshinskii, Dmitry Stepanov Computer vision lab Russian

More information

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields Authors: Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh Presented by: Suraj Kesavan, Priscilla Jennifer ECS 289G: Visual Recognition

More information

A Statistical Approach to Culture Colors Distribution in Video Sensors Angela D Angelo, Jean-Luc Dugelay

A Statistical Approach to Culture Colors Distribution in Video Sensors Angela D Angelo, Jean-Luc Dugelay A Statistical Approach to Culture Colors Distribution in Video Sensors Angela D Angelo, Jean-Luc Dugelay VPQM 2010, Scottsdale, Arizona, U.S.A, January 13-15 Outline Introduction Proposed approach Colors

More information

CS231N Section. Video Understanding 6/1/2018

CS231N Section. Video Understanding 6/1/2018 CS231N Section Video Understanding 6/1/2018 Outline Background / Motivation / History Video Datasets Models Pre-deep learning CNN + RNN 3D convolution Two-stream What we ve seen in class so far... Image

More information

Object detection with CNNs

Object detection with CNNs Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

arxiv:submit/ [cs.cv] 13 Jan 2018

arxiv:submit/ [cs.cv] 13 Jan 2018 Benchmark Visual Question Answer Models by using Focus Map Wenda Qiu Yueyang Xianzang Zhekai Zhang Shanghai Jiaotong University arxiv:submit/2130661 [cs.cv] 13 Jan 2018 Abstract Inferring and Executing

More information

Bilinear Models for Fine-Grained Visual Recognition

Bilinear Models for Fine-Grained Visual Recognition Bilinear Models for Fine-Grained Visual Recognition Subhransu Maji College of Information and Computer Sciences University of Massachusetts, Amherst Fine-grained visual recognition Example: distinguish

More information

Large-scale Video Classification with Convolutional Neural Networks

Large-scale Video Classification with Convolutional Neural Networks Large-scale Video Classification with Convolutional Neural Networks Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei Note: Slide content mostly from : Bay Area

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Kyunghee Kim Stanford University 353 Serra Mall Stanford, CA 94305 kyunghee.kim@stanford.edu Abstract We use a

More information

Two-Stream Convolutional Networks for Action Recognition in Videos

Two-Stream Convolutional Networks for Action Recognition in Videos Two-Stream Convolutional Networks for Action Recognition in Videos Karen Simonyan Andrew Zisserman Cemil Zalluhoğlu Introduction Aim Extend deep Convolution Networks to action recognition in video. Motivation

More information

Foundations for Summarizing and Learning Latent Structure in Video

Foundations for Summarizing and Learning Latent Structure in Video Foundations for Summarizing and Learning Latent Structure in Video Presenter: Kevin Pitstick, MTS Engineer PI: Ed Morris, MTS Senior Engineer Copyright 2017 Carnegie Mellon University. All Rights Reserved.

More information

Apparel Classification using CNNs

Apparel Classification using CNNs Apparel Classification using CNNs Rohit Patki ICME Stanford University rpatki@stanford.edu Suhas Suresha ICME Stanford University suhas17@stanford.edu Abstract Apparel classification from images finds

More information

Modeling and Propagating CNNs in a Tree Structure for Visual Tracking

Modeling and Propagating CNNs in a Tree Structure for Visual Tracking The Visual Object Tracking Challenge Workshop 2016 Modeling and Propagating CNNs in a Tree Structure for Visual Tracking Hyeonseob Nam* Mooyeol Baek* Bohyung Han Dept. of Computer Science and Engineering

More information

April 4-7, 2016 Silicon Valley

April 4-7, 2016 Silicon Valley April 4-7, 2016 Silicon Valley Neural Attention for Object Tracking Brian Cheung bcheung@berkeley.edu Redwood Center for Theoretical Neuroscience, UC Berkeley Visual Computing Research, NVIDIA Source:

More information

Joint Object Detection and Viewpoint Estimation using CNN features

Joint Object Detection and Viewpoint Estimation using CNN features Joint Object Detection and Viewpoint Estimation using CNN features Carlos Guindel, David Martín and José M. Armingol cguindel@ing.uc3m.es Intelligent Systems Laboratory Universidad Carlos III de Madrid

More information

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization Krishna Kumar Singh and Yong Jae Lee University of California, Davis ---- Paper Presentation Yixian

More information

Human Pose Estimation with Deep Learning. Wei Yang

Human Pose Estimation with Deep Learning. Wei Yang Human Pose Estimation with Deep Learning Wei Yang Applications Understand Activities Family Robots American Heist (2014) - The Bank Robbery Scene 2 What do we need to know to recognize a crime scene? 3

More information

Class 5: Attributes and Semantic Features

Class 5: Attributes and Semantic Features Class 5: Attributes and Semantic Features Rogerio Feris, Feb 21, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Project

More information

Deep Learning for Computer Vision with MATLAB By Jon Cherrie

Deep Learning for Computer Vision with MATLAB By Jon Cherrie Deep Learning for Computer Vision with MATLAB By Jon Cherrie 2015 The MathWorks, Inc. 1 Deep learning is getting a lot of attention "Dahl and his colleagues won $22,000 with a deeplearning system. 'We

More information

Learning-based Localization

Learning-based Localization Learning-based Localization Eric Brachmann ECCV 2018 Tutorial on Visual Localization - Feature-based vs. Learned Approaches Torsten Sattler, Eric Brachmann Roadmap Machine Learning Basics [10min] Convolutional

More information

Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool

Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool Predicting ground-level scene Layout from Aerial imagery Muhammad Hasan Maqbool Objective Given the overhead image predict its ground level semantic segmentation Predicted ground level labeling Overhead/Aerial

More information

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

More information

Multi-View 3D Object Detection Network for Autonomous Driving

Multi-View 3D Object Detection Network for Autonomous Driving Multi-View 3D Object Detection Network for Autonomous Driving Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, Tian Xia CVPR 2017 (Spotlight) Presented By: Jason Ku Overview Motivation Dataset Network Architecture

More information

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 18: Deep learning and Vision: Convolutional neural networks Teacher: Gianni A. Di Caro DEEP, SHALLOW, CONNECTED, SPARSE? Fully connected multi-layer feed-forward perceptrons: More powerful

More information

Martian lava field, NASA, Wikipedia

Martian lava field, NASA, Wikipedia Martian lava field, NASA, Wikipedia Old Man of the Mountain, Franconia, New Hampshire Pareidolia http://smrt.ccel.ca/203/2/6/pareidolia/ Reddit for more : ) https://www.reddit.com/r/pareidolia/top/ Pareidolia

More information

Towards Large-Scale Semantic Representations for Actionable Exploitation. Prof. Trevor Darrell UC Berkeley

Towards Large-Scale Semantic Representations for Actionable Exploitation. Prof. Trevor Darrell UC Berkeley Towards Large-Scale Semantic Representations for Actionable Exploitation Prof. Trevor Darrell UC Berkeley traditional surveillance sensor emerging crowd sensor Desired capabilities: spatio-temporal reconstruction

More information

CS 231A Computer Vision (Winter 2014) Problem Set 3

CS 231A Computer Vision (Winter 2014) Problem Set 3 CS 231A Computer Vision (Winter 2014) Problem Set 3 Due: Feb. 18 th, 2015 (11:59pm) 1 Single Object Recognition Via SIFT (45 points) In his 2004 SIFT paper, David Lowe demonstrates impressive object recognition

More information

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China

More information

A Deep Learning Approach to Vehicle Speed Estimation

A Deep Learning Approach to Vehicle Speed Estimation A Deep Learning Approach to Vehicle Speed Estimation Benjamin Penchas bpenchas@stanford.edu Tobin Bell tbell@stanford.edu Marco Monteiro marcorm@stanford.edu ABSTRACT Given car dashboard video footage,

More information

Web-Scale Image Search and Their Applications

Web-Scale Image Search and Their Applications Web-Scale Image Search and Their Applications Sung-Eui Yoon KAIST http://sglab.kaist.ac.kr Project Guidelines: Project Topics Any topics related to the course theme are okay You can find topics by browsing

More information

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016 ECCV 2016 Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016 Fundamental Question What is a good vector representation of an object? Something that can be easily predicted from 2D

More information

Person Retrieval in Surveillance Video using Height, Color and Gender

Person Retrieval in Surveillance Video using Height, Color and Gender Person Retrieval in Surveillance Video using Height, Color and Gender Hiren Galiyawala 1, Kenil Shah 2, Vandit Gajjar 1, Mehul S. Raval 1 1 School of Engineering and Applied Science, Ahmedabad University,

More information

Segmentation as Selective Search for Object Recognition in ILSVRC2011

Segmentation as Selective Search for Object Recognition in ILSVRC2011 Segmentation as Selective Search for Object Recognition in ILSVRC2011 Koen van de Sande Jasper Uijlings Arnold Smeulders Theo Gevers Nicu Sebe Cees Snoek University of Amsterdam, University of Trento ILSVRC2011

More information

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon Deep Learning For Video Classification Presented by Natalie Carlebach & Gil Sharon Overview Of Presentation Motivation Challenges of video classification Common datasets 4 different methods presented in

More information

CAP 6412 Advanced Computer Vision

CAP 6412 Advanced Computer Vision CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha

More information

Adaptive Gesture Recognition System Integrating Multiple Inputs

Adaptive Gesture Recognition System Integrating Multiple Inputs Adaptive Gesture Recognition System Integrating Multiple Inputs Master Thesis - Colloquium Tobias Staron University of Hamburg Faculty of Mathematics, Informatics and Natural Sciences Technical Aspects

More information

Final Report: Smart Trash Net: Waste Localization and Classification

Final Report: Smart Trash Net: Waste Localization and Classification Final Report: Smart Trash Net: Waste Localization and Classification Oluwasanya Awe oawe@stanford.edu Robel Mengistu robel@stanford.edu December 15, 2017 Vikram Sreedhar vsreed@stanford.edu Abstract Given

More information

Rich feature hierarchies for accurate object detection and semant

Rich feature hierarchies for accurate object detection and semant Rich feature hierarchies for accurate object detection and semantic segmentation Speaker: Yucong Shen 4/5/2018 Develop of Object Detection 1 DPM (Deformable parts models) 2 R-CNN 3 Fast R-CNN 4 Faster

More information

Image Classification pipeline. Lecture 2-1

Image Classification pipeline. Lecture 2-1 Lecture 2: Image Classification pipeline Lecture 2-1 Administrative: Piazza For questions about midterm, poster session, projects, etc, use Piazza! SCPD students: Use your @stanford.edu address to register

More information

SocialML: machine learning for social media video creators

SocialML: machine learning for social media video creators SocialML: machine learning for social media video creators Tomasz Trzcinski a,b, Adam Bielski b, Pawel Cyrta b and Matthew Zak b a Warsaw University of Technology b Tooploox firstname.lastname@tooploox.com

More information

Binary Convolutional Neural Network on RRAM

Binary Convolutional Neural Network on RRAM Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua

More information

CS 1674: Intro to Computer Vision. Attributes. Prof. Adriana Kovashka University of Pittsburgh November 2, 2016

CS 1674: Intro to Computer Vision. Attributes. Prof. Adriana Kovashka University of Pittsburgh November 2, 2016 CS 1674: Intro to Computer Vision Attributes Prof. Adriana Kovashka University of Pittsburgh November 2, 2016 Plan for today What are attributes and why are they useful? (paper 1) Attributes for zero-shot

More information

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University LSTM and its variants for visual recognition Xiaodan Liang xdliang328@gmail.com Sun Yat-sen University Outline Context Modelling with CNN LSTM and its Variants LSTM Architecture Variants Application in

More information

Detection III: Analyzing and Debugging Detection Methods

Detection III: Analyzing and Debugging Detection Methods CS 1699: Intro to Computer Vision Detection III: Analyzing and Debugging Detection Methods Prof. Adriana Kovashka University of Pittsburgh November 17, 2015 Today Review: Deformable part models How can

More information

Cascade Region Regression for Robust Object Detection

Cascade Region Regression for Robust Object Detection Large Scale Visual Recognition Challenge 2015 (ILSVRC2015) Cascade Region Regression for Robust Object Detection Jiankang Deng, Shaoli Huang, Jing Yang, Hui Shuai, Zhengbo Yu, Zongguang Lu, Qiang Ma, Yali

More information

CS839: Probabilistic Graphical Models. Lecture 22: The Attention Mechanism. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 22: The Attention Mechanism. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 22: The Attention Mechanism Theo Rekatsinas 1 Why Attention? Consider machine translation: We need to pay attention to the word we are currently translating.

More information

Segmentation. Bottom up Segmentation Semantic Segmentation

Segmentation. Bottom up Segmentation Semantic Segmentation Segmentation Bottom up Segmentation Semantic Segmentation Semantic Labeling of Street Scenes Ground Truth Labels 11 classes, almost all occur simultaneously, large changes in viewpoint, scale sky, road,

More information

RoI Pooling Based Fast Multi-Domain Convolutional Neural Networks for Visual Tracking

RoI Pooling Based Fast Multi-Domain Convolutional Neural Networks for Visual Tracking Advances in Intelligent Systems Research, volume 133 2nd International Conference on Artificial Intelligence and Industrial Engineering (AIIE2016) RoI Pooling Based Fast Multi-Domain Convolutional Neural

More information

YOLO 9000 TAEWAN KIM

YOLO 9000 TAEWAN KIM YOLO 9000 TAEWAN KIM DNN-based Object Detection R-CNN MultiBox SPP-Net DeepID-Net NoC Fast R-CNN DeepBox MR-CNN 2013.11 Faster R-CNN YOLO AttentionNet DenseBox SSD Inside-OutsideNet(ION) G-CNN NIPS 15

More information

CS 223B Computer Vision Problem Set 3

CS 223B Computer Vision Problem Set 3 CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.

More information