Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool

Size: px
Start display at page:

Download "Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool"

Transcription

1 Predicting ground-level scene Layout from Aerial imagery Muhammad Hasan Maqbool

2 Objective Given the overhead image predict its ground level semantic segmentation Predicted ground level labeling Overhead/Aerial Image Ground Level Image

3 Overall Architecture (Training) VGG Labeling Transform Ground Level Labeling (Predicted) Overhead/Aerial Image Cross Entropy Loss SegNet Ground Level Image Ground Level Labeling

4 Ground Truth of Ground-level Segmentation 1. Generate the segmentation of ground level image using (pre-trained model) 1. SegNet This gives Ground Truth for training 2. To avoid manual pixel annotation of their new dataset (overhead imagery) Ground Level Image

5 Algorithm Overview Aerial Image to Ground Level (Predicted) Segmentation 512 Conv 1 x Conv 1 x 1 4 Conv 1 x 1 17 x 17 = 289 Matrix Multiplication 1 x 1 Conv 128 Conv 1 x 1 64 Conv 1 x 1 1 Conv 1 x x x 40

6 Algorithm Overview Predicted and Ground Truth Layouts Distance Minimization Minimize the pixel distance between the ground layout and the aerial-to-ground layout prediction Ground Level Layout (ground truth) Predicted Ground Level Layout Cross Entropy Loss

7 Network A Network A Hypercolumn 17x17x960 17x17x512 17x17x512 17x17x4 Three conv layers employ 1x1 convolution Reduce channel dimension from 960 to 4

8 Network S Encode the global information of the aerial image Network S 17x17x512 17x17x128 17x17x64 17x17x1 Three conv layers employ 1x1 convolution Reduce channel dimension from 512 to 1

9 Determine the Size of the Transformation Matrix Aerial image layout l a is multiplied by transformation matrix M to get the predicted ground-level layout l g M 17x17x1 * Vec = l a Reshape 8x40x1 l g Based on the input size and the desired output size, we can determine the size of M -----> In this case, we have M = 320x289

10 Construct Input to Network F How to set the coordinates for each pixel will be explained later S network (i, j, y, x) 17x17x1 Reshape 1x1x289 Width (W) = 289 Height: H =320 1x1x4 Concatenate coordinates Height: H =320 Width (W) = 289

11 Network F Transformation matrix

12 How to determine the coordinates? (i, j, y, x)

13 How to determine the coordinates? Recall the following procedure is used to get the predicted ground-level layout Reshape Transformation matrix

14 How to determine the coordinates? (i, j) = (0, 0) i j l a i: row index j: column index Vec 17x17 l a (i, j) = (0, 16) (i, j) = (1, 0) (i, j) = (1, 16) 289-D vector (i, j) = (16, 16)

15 How to determine the coordinates? 320-D vector Reshape y x 8x40 l g y: row index x: column index

16 How to determine the coordinates? 320-D vector (y, x) = (0, 0) (y, x) = (0, 39) (y, x) = (1, 0) x 8x40 (y, x) = (1, 39) Reshape y l g y: row index x: column index (y, x) = (7, 39)

17 How to determine the coordinates? 320x289 0th row 39th row * (i, j) = (0, 0) (i, j) = (0, 16) (i, j) = (1, 0) (i, j) = (1, 16) (y, x) = (0, 0) 320-D (rotate to a row vector due to space limit) (y, x) = (0, 39) (y, x) = (1, 0) (y, x) = (1, 39) (I, j, y, x) = (0, 0, 0, 0) (I, j, y, x) = (0, 1, 0, 0) (I, j, y, x) = (0, 16, 0, 0) (I, j, y, x) = (1, 0, 0, 0) (y, x) = (7, 39) (I, j, y, x) = (16, 16, 0, 0) 289-D (i, j) = (16, 16) One row determines a single pixel value at location (y, x)

18 How to determine the coordinates? 320x289 0th row 39th row * (i, j) = (0, 0) (i, j) = (0, 16) (i, j) = (1, 0) (i, j) = (1, 16) (y, x) = (0, 0) (y, x) = (0, 39) (y, x) = (1, 0) (y, x) = (1, 39) (I, j, y, x) = (0, 0, 0, 39) (I, j, y, x) = (0, 1, 0, 39) (I, j, y, x) = (0, 16, 0, 39) (I, j, y, x) = (1, 0, 0, 39) 320-D (y, x) = (7, 39) (I, j, y, x) = (16, 16, 0, 39) 289-D (i, j) = (16, 16) One row determines a single pixel value at location (y, x)

19 Interpret the transformation matrix 320x289 0th row 39th row * 289-D (i, j) = (0, 0) (i, j) = (0, 16) (i, j) = (1, 0) (i, j) = (1, 16) (i, j) = (16, 16) 320-D One row determines a single pixel value at location (y, x) This is Weight the 17x17 pixels in I a Each row of the transformation matrix e.g. Determine one pixel in I g

20 Transformation Matrix Visualization The transformation matrix encodes the relationship between the aerial-level pixels and the ground-level pixels x

21 Dataset

22 Cross-View Dataset CVUSA contains approximately 1.5 million geo-tagged pairs of ground and aerial images from across the United States. Google Street View panoramas of CVUSA as our ground level images Corresponding aerial images at zoom level 19 35,532 image pairs for training and 8,884 image pairs for testing.

23 Cross-View Dataset: An Example In the aerial images, north is the up direction. In the ground images, north is the central column. Aerial Ground

24 Cross-View Dataset

25 Cross-View Dataset

26 Evaluation and Applications

27 Weakly Supervised Learning

28 Weakly Supervised Learning for Aerial Image Classification and Segmentation

29 As Pre-trained Model

30 Network Pre-training for Aerial Image Segmentation

31 Network Pre-training for Aerial Image Segmentation Per-class Precision on the ISPRS Segmentation Task

32 Network Pre-training for Aerial Image Segmentation

33 Orientation Estimation

34 Orientation Estimation Given a street view image I g, estimate its orientation 1. Get the corresponding aerial image (using GPS tag), I a 2. Infer the semantic labeling of the query image, I g L g 3. Predict the ground image labels from the aerial image using the proposed network, I a -> Lg 4. Compute the cross entropy between Lg and Lg in a sliding window fashion across all possible orientations 5. The smallest entropy is the optimal estimated orientation

35 Orientation Estimation Their CVUSA dataset contains aligned image pairs Therefore, the learned network predicts the ground (panorama image) layout in a fixed orientation served as reference

36 Orientation Estimation Query street view image Get aerial image Get segmentation (e.g. SegNet) Get predicted groundlevel layout using the proposed network W H 1 column = 360 degree / W Predicted ground-level layout of the panorama image

37 Orientation Estimation

38 Geocalibration

39 Fine-Grained Geocalibration

40 Fine-Grained Geocalibration

41 Ground Level Image Synthesization

42 Ground-level Image Synthesization

43 Ground-level Image Synthesization

44 Conclusion A novel strategy to relate aerial-level imagery and the ground-level imagery A novel strategy to exploit the automatically labeled ground images as a form of weak supervision for aerial imagery understanding Show the potential of this method in geocalibration and ground appearance synthetization

45 Questions

arxiv: v1 [cs.cv] 8 Dec 2016

arxiv: v1 [cs.cv] 8 Dec 2016 Predicting Ground-Level Scene Layout from Aerial Imagery Menghua Zhai Zachary Bessinger Scott Workman Nathan Jacobs Computer Science, University of Kentucky {ted, zach, scott, jacobs}@cs.uky.edu arxiv:1612.02709v1

More information

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta Outline > Overview of Semantic Segmentation > Encoder-Decoder Networks > Results What is Semantic Segmentation? Input: RGB Image Output:

More information

Predicting Ground-Level Scene Layout from Aerial Imagery

Predicting Ground-Level Scene Layout from Aerial Imagery Predicting Ground-Level Scene Layout from Aerial Imagery Menghua Zhai ted@cs.uky.edu Zachary Bessinger Scott Workman zach@cs.uky.edu scott@cs.uky.edu Computer Science, University of Kentucky Nathan Jacobs

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

Places Challenge 2017

Places Challenge 2017 Places Challenge 2017 Scene Parsing Task CASIA_IVA_JD Jun Fu, Jing Liu, Longteng Guo, Haijie Tian, Fei Liu, Hanqing Lu Yong Li, Yongjun Bao, Weipeng Yan National Laboratory of Pattern Recognition, Institute

More information

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

More information

Unsupervised Learning of Spatiotemporally Coherent Metrics

Unsupervised Learning of Spatiotemporally Coherent Metrics Unsupervised Learning of Spatiotemporally Coherent Metrics Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun arxiv 2015. Presented by Jackie Chu Contributions Insight between slow feature

More information

Lecture 7: Semantic Segmentation

Lecture 7: Semantic Segmentation Semantic Segmentation CSED703R: Deep Learning for Visual Recognition (207F) Segmenting images based on its semantic notion Lecture 7: Semantic Segmentation Bohyung Han Computer Vision Lab. bhhanpostech.ac.kr

More information

08 An Introduction to Dense Continuous Robotic Mapping

08 An Introduction to Dense Continuous Robotic Mapping NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy

More information

Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains

Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains Jiahao Pang 1 Wenxiu Sun 1 Chengxi Yang 1 Jimmy Ren 1 Ruichao Xiao 1 Jin Zeng 1 Liang Lin 1,2 1 SenseTime Research

More information

TorontoCity: Seeing the World with a Million Eyes

TorontoCity: Seeing the World with a Million Eyes TorontoCity: Seeing the World with a Million Eyes Authors Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun * Project Completed

More information

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization Krishna Kumar Singh and Yong Jae Lee University of California, Davis ---- Paper Presentation Yixian

More information

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009 Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context

More information

Multi-View 3D Object Detection Network for Autonomous Driving

Multi-View 3D Object Detection Network for Autonomous Driving Multi-View 3D Object Detection Network for Autonomous Driving Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, Tian Xia CVPR 2017 (Spotlight) Presented By: Jason Ku Overview Motivation Dataset Network Architecture

More information

PARTIAL STYLE TRANSFER USING WEAKLY SUPERVISED SEMANTIC SEGMENTATION. Shin Matsuo Wataru Shimoda Keiji Yanai

PARTIAL STYLE TRANSFER USING WEAKLY SUPERVISED SEMANTIC SEGMENTATION. Shin Matsuo Wataru Shimoda Keiji Yanai PARTIAL STYLE TRANSFER USING WEAKLY SUPERVISED SEMANTIC SEGMENTATION Shin Matsuo Wataru Shimoda Keiji Yanai Department of Informatics, The University of Electro-Communications, Tokyo 1-5-1 Chofugaoka,

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

Image georeferencing is the process of developing a model to transform from pixel coordinates into GIS coordinates such as meters on the ground.

Image georeferencing is the process of developing a model to transform from pixel coordinates into GIS coordinates such as meters on the ground. Image georeferencing is the process of developing a model to transform from pixel coordinates into GIS coordinates such as meters on the ground. Image rectification is the process of using your georeferencing

More information

CIS 580, Machine Perception, Spring 2015 Homework 1 Due: :59AM

CIS 580, Machine Perception, Spring 2015 Homework 1 Due: :59AM CIS 580, Machine Perception, Spring 2015 Homework 1 Due: 2015.02.09. 11:59AM Instructions. Submit your answers in PDF form to Canvas. This is an individual assignment. 1 Camera Model, Focal Length and

More information

Figure 1: Workflow of object-based classification

Figure 1: Workflow of object-based classification Technical Specifications Object Analyst Object Analyst is an add-on package for Geomatica that provides tools for segmentation, classification, and feature extraction. Object Analyst includes an all-in-one

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation BY; ROSS GIRSHICK, JEFF DONAHUE, TREVOR DARRELL AND JITENDRA MALIK PRESENTER; MUHAMMAD OSAMA Object detection vs. classification

More information

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D.

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. Rhodes 5/10/17 What is Machine Learning? Machine learning

More information

Deconvolution Networks

Deconvolution Networks Deconvolution Networks Johan Brynolfsson Mathematical Statistics Centre for Mathematical Sciences Lund University December 6th 2016 1 / 27 Deconvolution Neural Networks 2 / 27 Image Deconvolution True

More information

Collaborative Mapping with Streetlevel Images in the Wild. Yubin Kuang Co-founder and Computer Vision Lead

Collaborative Mapping with Streetlevel Images in the Wild. Yubin Kuang Co-founder and Computer Vision Lead Collaborative Mapping with Streetlevel Images in the Wild Yubin Kuang Co-founder and Computer Vision Lead Mapillary Mapillary is a street-level imagery platform, powered by collaboration and computer vision.

More information

Rich feature hierarchies for accurate object detection and semant

Rich feature hierarchies for accurate object detection and semant Rich feature hierarchies for accurate object detection and semantic segmentation Speaker: Yucong Shen 4/5/2018 Develop of Object Detection 1 DPM (Deformable parts models) 2 R-CNN 3 Fast R-CNN 4 Faster

More information

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm Instructions This is an individual assignment. Individual means each student must hand in their

More information

Class 5: Attributes and Semantic Features

Class 5: Attributes and Semantic Features Class 5: Attributes and Semantic Features Rogerio Feris, Feb 21, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Project

More information

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left

More information

CSE 559A: Computer Vision

CSE 559A: Computer Vision CSE 559A: Computer Vision Fall 2018: T-R: 11:30-1pm @ Lopata 101 Instructor: Ayan Chakrabarti (ayan@wustl.edu). Course Staff: Zhihao Xia, Charlie Wu, Han Liu http://www.cse.wustl.edu/~ayan/courses/cse559a/

More information

Human Pose Estimation with Deep Learning. Wei Yang

Human Pose Estimation with Deep Learning. Wei Yang Human Pose Estimation with Deep Learning Wei Yang Applications Understand Activities Family Robots American Heist (2014) - The Bank Robbery Scene 2 What do we need to know to recognize a crime scene? 3

More information

Deep Learning for Computer Vision with MATLAB By Jon Cherrie

Deep Learning for Computer Vision with MATLAB By Jon Cherrie Deep Learning for Computer Vision with MATLAB By Jon Cherrie 2015 The MathWorks, Inc. 1 Deep learning is getting a lot of attention "Dahl and his colleagues won $22,000 with a deeplearning system. 'We

More information

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU, Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image

More information

End-to-End Localization and Ranking for Relative Attributes

End-to-End Localization and Ranking for Relative Attributes End-to-End Localization and Ranking for Relative Attributes Krishna Kumar Singh and Yong Jae Lee Presented by Minhao Cheng [Farhadi et al. 2009, Kumar et al. 2009, Lampert et al. 2009, [Slide: Xiao and

More information

Latent Variable Models for Structured Prediction and Content-Based Retrieval

Latent Variable Models for Structured Prediction and Content-Based Retrieval Latent Variable Models for Structured Prediction and Content-Based Retrieval Ariadna Quattoni Universitat Politècnica de Catalunya Joint work with Borja Balle, Xavier Carreras, Adrià Recasens, Antonio

More information

Towards Weakly- and Semi- Supervised Object Localization and Semantic Segmentation

Towards Weakly- and Semi- Supervised Object Localization and Semantic Segmentation Towards Weakly- and Semi- Supervised Object Localization and Semantic Segmentation Lecturer: Yunchao Wei Image Formation and Processing (IFP) Group University of Illinois at Urbanahttps://weiyc.githu Champaign

More information

Visual Computing TUM

Visual Computing TUM Visual Computing Group @ TUM Visual Computing Group @ TUM BundleFusion Real-time 3D Reconstruction Scalable scene representation Global alignment and re-localization TOG 17 [Dai et al.]: BundleFusion Real-time

More information

Learning to Segment Object Candidates

Learning to Segment Object Candidates Learning to Segment Object Candidates Pedro Pinheiro, Ronan Collobert and Piotr Dollar Presented by - Sivaraman, Kalpathy Sitaraman, M.S. in Computer Science, University of Virginia Facebook Artificial

More information

RSRN: Rich Side-output Residual Network for Medial Axis Detection

RSRN: Rich Side-output Residual Network for Medial Axis Detection RSRN: Rich Side-output Residual Network for Medial Axis Detection Chang Liu, Wei Ke, Jianbin Jiao, and Qixiang Ye University of Chinese Academy of Sciences, Beijing, China {liuchang615, kewei11}@mails.ucas.ac.cn,

More information

Recurrent Transformer Networks for Semantic Correspondence

Recurrent Transformer Networks for Semantic Correspondence Neural Information Processing Systems (NeurIPS) 2018 Recurrent Transformer Networks for Semantic Correspondence Seungryong Kim 1, Stepthen Lin 2, Sangryul Jeon 1, Dongbo Min 3, Kwanghoon Sohn 1 Dec. 05,

More information

Table of contents. DMXzone Google Maps Manual DMXzone.com

Table of contents. DMXzone Google Maps Manual DMXzone.com Table of contents Table of contents... 1 About DMXzone Google Maps... 2 Features in Detail... 3 The Basics: Insterting DMXzone Google Maps on a Page... 16 Advanced: Creating Dynamic DMXzone Google Maps...

More information

Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning

Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning Allan Zelener Dissertation Proposal December 12 th 2016 Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning Overview 1. Introduction to 3D Object Identification

More information

PIXELS TO VOXELS: MODELING VISUAL REPRESENTATION IN THE HUMAN BRAIN

PIXELS TO VOXELS: MODELING VISUAL REPRESENTATION IN THE HUMAN BRAIN PIXELS TO VOXELS: MODELING VISUAL REPRESENTATION IN THE HUMAN BRAIN By Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant University of California Berkeley Presented by Tim Patzelt AGENDA

More information

Dynamic Routing Between Capsules. Yiting Ethan Li, Haakon Hukkelaas, and Kaushik Ram Ramasamy

Dynamic Routing Between Capsules. Yiting Ethan Li, Haakon Hukkelaas, and Kaushik Ram Ramasamy Dynamic Routing Between Capsules Yiting Ethan Li, Haakon Hukkelaas, and Kaushik Ram Ramasamy Problems & Results Object classification in images without losing information about important parts of the picture.

More information

Scene Text Recognition for Augmented Reality. Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science

Scene Text Recognition for Augmented Reality. Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science Scene Text Recognition for Augmented Reality Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science Outline Research area and motivation Finding text in natural scenes Prior art Improving

More information

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell Constrained Convolutional Neural Networks for Weakly Supervised Segmentation Deepak Pathak, Philipp Krähenbühl and Trevor Darrell 1 Multi-class Image Segmentation Assign a class label to each pixel in

More information

Joseph Camilo 1, Rui Wang 1, Leslie M. Collins 1, Senior Member, IEEE, Kyle Bradbury 2, Member, IEEE, and Jordan M. Malof 1,Member, IEEE

Joseph Camilo 1, Rui Wang 1, Leslie M. Collins 1, Senior Member, IEEE, Kyle Bradbury 2, Member, IEEE, and Jordan M. Malof 1,Member, IEEE Application of a semantic segmentation convolutional neural network for accurate automatic detection and mapping of solar photovoltaic arrays in aerial imagery Joseph Camilo 1, Rui Wang 1, Leslie M. Collins

More information

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Efficient Segmentation-Aided Text Detection For Intelligent Robots Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California Outline Problem Definition and Motivation Related

More information

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang SSD: Single Shot MultiBox Detector Author: Wei Liu et al. Presenter: Siyu Jiang Outline 1. Motivations 2. Contributions 3. Methodology 4. Experiments 5. Conclusions 6. Extensions Motivation Motivation

More information

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Kyunghee Kim Stanford University 353 Serra Mall Stanford, CA 94305 kyunghee.kim@stanford.edu Abstract We use a

More information

Andrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore),

Andrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore), WordFences: Text Localization and Recognition ICIP 2017 Andrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore),

More information

Joint Object Detection and Viewpoint Estimation using CNN features

Joint Object Detection and Viewpoint Estimation using CNN features Joint Object Detection and Viewpoint Estimation using CNN features Carlos Guindel, David Martín and José M. Armingol cguindel@ing.uc3m.es Intelligent Systems Laboratory Universidad Carlos III de Madrid

More information

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Instance-aware Semantic Segmentation via Multi-task Network Cascades Instance-aware Semantic Segmentation via Multi-task Network Cascades Jifeng Dai, Kaiming He, Jian Sun Microsoft research 2016 Yotam Gil Amit Nativ Agenda Introduction Highlights Implementation Further

More information

Multi-modal, multi-temporal data analysis for urban remote sensing

Multi-modal, multi-temporal data analysis for urban remote sensing Multi-modal, multi-temporal data analysis for urban remote sensing Xiaoxiang Zhu www.sipeo.bgu.tum.de 10.10.2017 Uncontrolled Growth in Mumbai Slums Leads to Massive Fires and Floods So2Sat: Big Data for

More information

arxiv: v1 [cs.cv] 20 Dec 2016

arxiv: v1 [cs.cv] 20 Dec 2016 End-to-End Pedestrian Collision Warning System based on a Convolutional Neural Network with Semantic Segmentation arxiv:1612.06558v1 [cs.cv] 20 Dec 2016 Heechul Jung heechul@dgist.ac.kr Min-Kook Choi mkchoi@dgist.ac.kr

More information

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Introduction In this supplementary material, Section 2 details the 3D annotation for CAD models and real

More information

Exploiting noisy web data for largescale visual recognition

Exploiting noisy web data for largescale visual recognition Exploiting noisy web data for largescale visual recognition Lamberto Ballan University of Padova, Italy CVPRW WebVision - Jul 26, 2017 Datasets drive computer vision progress ImageNet Slide credit: O.

More information

A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION

A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION Wenkai Zhang a, b, Hai Huang c, *, Matthias Schmitz c, Xian Sun a, Hongqi Wang a, Helmut Mayer c a Key Laboratory

More information

Region-based Segmentation and Object Detection

Region-based Segmentation and Object Detection Region-based Segmentation and Object Detection Stephen Gould Tianshi Gao Daphne Koller Presented at NIPS 2009 Discussion and Slides by Eric Wang April 23, 2010 Outline Introduction Model Overview Model

More information

assessment of its interpretable features

assessment of its interpretable features Black-box model explained through an assessment of its interpretable features Francesco Ventura, Tania Cerquitelli, Francesco Giacalone Introduction In the last few years algorithms have been widely exploited

More information

Supplementary Materials for Learning to Parse Wireframes in Images of Man-Made Environments

Supplementary Materials for Learning to Parse Wireframes in Images of Man-Made Environments Supplementary Materials for Learning to Parse Wireframes in Images of Man-Made Environments Kun Huang, Yifan Wang, Zihan Zhou 2, Tianjiao Ding, Shenghua Gao, and Yi Ma 3 ShanghaiTech University {huangkun,

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Using Machine Learning for Classification of Cancer Cells

Using Machine Learning for Classification of Cancer Cells Using Machine Learning for Classification of Cancer Cells Camille Biscarrat University of California, Berkeley I Introduction Cell screening is a commonly used technique in the development of new drugs.

More information

Aerial photography: Principles. Visual interpretation of aerial imagery

Aerial photography: Principles. Visual interpretation of aerial imagery Aerial photography: Principles Visual interpretation of aerial imagery Overview Introduction Benefits of aerial imagery Image interpretation Elements Tasks Strategies Keys Accuracy assessment Benefits

More information

Internet of things that video

Internet of things that video Video recognition from a sentence Cees Snoek Intelligent Sensory Information Systems Lab University of Amsterdam The Netherlands Internet of things that video 45 billion cameras by 2022 [LDV Capital] 2

More information

PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding

PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding Yinda Zhang Shuran Song Ping Tan Jianxiong Xiao Princeton University Simon Fraser University Alicia Clark PanoContext October

More information

A Fast Linear Registration Framework for Multi-Camera GIS Coordination

A Fast Linear Registration Framework for Multi-Camera GIS Coordination A Fast Linear Registration Framework for Multi-Camera GIS Coordination Karthik Sankaranarayanan James W. Davis Dept. of Computer Science and Engineering Ohio State University Columbus, OH 4320 USA {sankaran,jwdavis}@cse.ohio-state.edu

More information

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University LSTM and its variants for visual recognition Xiaodan Liang xdliang328@gmail.com Sun Yat-sen University Outline Context Modelling with CNN LSTM and its Variants LSTM Architecture Variants Application in

More information

Extracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang

Extracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang Extracting Layers and Recognizing Features for Automatic Map Understanding Yao-Yi Chiang 0 Outline Introduction/ Problem Motivation Map Processing Overview Map Decomposition Feature Recognition Discussion

More information

Edge and corner detection

Edge and corner detection Edge and corner detection Prof. Stricker Doz. G. Bleser Computer Vision: Object and People Tracking Goals Where is the information in an image? How is an object characterized? How can I find measurements

More information

Learning and Recognizing Visual Object Categories Without First Detecting Features

Learning and Recognizing Visual Object Categories Without First Detecting Features Learning and Recognizing Visual Object Categories Without First Detecting Features Daniel Huttenlocher 2007 Joint work with D. Crandall and P. Felzenszwalb Object Category Recognition Generic classes rather

More information

arxiv: v1 [cs.cv] 9 Aug 2017

arxiv: v1 [cs.cv] 9 Aug 2017 A Unified Model for Near and Remote Sensing Scott Workman 1 Menghua Zhai 1 David J. Crandall 2 Nathan Jacobs 1 scott@cs.uky.edu ted@cs.uky.edu djcran@indiana.edu jacobs@cs.uky.edu 1 University of Kentucky

More information

Structured Prediction using Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer

More information

Large-scale Video Classification with Convolutional Neural Networks

Large-scale Video Classification with Convolutional Neural Networks Large-scale Video Classification with Convolutional Neural Networks Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei Note: Slide content mostly from : Bay Area

More information

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon Deep Learning For Video Classification Presented by Natalie Carlebach & Gil Sharon Overview Of Presentation Motivation Challenges of video classification Common datasets 4 different methods presented in

More information

SUMMARY. they need large amounts of computational resources to train

SUMMARY. they need large amounts of computational resources to train Learning to Label Seismic Structures with Deconvolution Networks and Weak Labels Yazeed Alaudah, Shan Gao, and Ghassan AlRegib {alaudah,gaoshan427,alregib}@gatech.edu Center for Energy and Geo Processing

More information

Part Localization by Exploiting Deep Convolutional Networks

Part Localization by Exploiting Deep Convolutional Networks Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.

More information

In Live Computer Vision

In Live Computer Vision EVA 2 : Exploiting Temporal Redundancy In Live Computer Vision Mark Buckler, Philip Bedoukian, Suren Jayasuriya, Adrian Sampson International Symposium on Computer Architecture (ISCA) Tuesday June 5, 2018

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

Articulated Pose Estimation with Flexible Mixtures-of-Parts

Articulated Pose Estimation with Flexible Mixtures-of-Parts Articulated Pose Estimation with Flexible Mixtures-of-Parts PRESENTATION: JESSE DAVIS CS 3710 VISUAL RECOGNITION Outline Modeling Special Cases Inferences Learning Experiments Problem and Relevance Problem:

More information

THE goal of action detection is to detect every occurrence

THE goal of action detection is to detect every occurrence JOURNAL OF L A T E X CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos Rui Hou, Student Member, IEEE, Chen Chen, Member,

More information

Fuzzy Set Theory in Computer Vision: Example 3

Fuzzy Set Theory in Computer Vision: Example 3 Fuzzy Set Theory in Computer Vision: Example 3 Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Purpose of these slides are to make you aware of a few of the different CNN architectures

More information

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi Mask R-CNN By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi Types of Computer Vision Tasks http://cs231n.stanford.edu/ Semantic vs Instance Segmentation Image

More information

Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet.

Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet. Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet 7D Labs VINNOVA https://7dlabs.com Photo-realistic image synthesis

More information

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss AAAI 2018, New Orleans, USA Simon Meister, Junhwa Hur, and Stefan Roth Department of Computer Science, TU Darmstadt 2 Deep

More information

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

COMP 551 Applied Machine Learning Lecture 16: Deep Learning COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all

More information

Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza

Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza Bridging the Gap Between Local and Global Approaches for 3D Object Recognition Isma Hadji G. N. DeSouza Outline Introduction Motivation Proposed Methods: 1. LEFT keypoint Detector 2. LGS Feature Descriptor

More information

Computer Vision: Making machines see

Computer Vision: Making machines see Computer Vision: Making machines see Roberto Cipolla Department of Engineering http://www.eng.cam.ac.uk/~cipolla/people.html http://www.toshiba.eu/eu/cambridge-research- Laboratory/ Vision: what is where

More information

Technical Publications

Technical Publications GE Medical Systems Technical Publications Direction 2188003-100 Revision 0 Tissue Volume Analysis DICOM for DICOM V3.0 Copyright 1997 By General Electric Co. Do not duplicate REVISION HISTORY REV DATE

More information

FINE-GRAINED image classification aims to recognize. Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization

FINE-GRAINED image classification aims to recognize. Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization 1 Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization Xiangteng He, Yuxin Peng and Junjie Zhao arxiv:1710.01168v1 [cs.cv] 30 Sep 2017 Abstract Fine-grained image classification

More information

Supplementary: Cross-modal Deep Variational Hand Pose Estimation

Supplementary: Cross-modal Deep Variational Hand Pose Estimation Supplementary: Cross-modal Deep Variational Hand Pose Estimation Adrian Spurr, Jie Song, Seonwook Park, Otmar Hilliges ETH Zurich {spurra,jsong,spark,otmarh}@inf.ethz.ch Encoder/Decoder Linear(512) Table

More information

Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets)

Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets) Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets) Sayan D. Pathak, Ph.D., Principal ML Scientist, Microsoft Roland Fernandez, Senior Researcher, Microsoft Module Outline

More information

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Chi Li, M. Zeeshan Zia 2, Quoc-Huy Tran 2, Xiang Yu 2, Gregory D. Hager, and Manmohan Chandraker 2 Johns

More information

Reset Cursor Tool Clicking on the Reset Cursor tool will clear all map and tool selections and allow tooltips to be displayed.

Reset Cursor Tool Clicking on the Reset Cursor tool will clear all map and tool selections and allow tooltips to be displayed. SMS Featured Icons: Mapping Toolbar This document includes a brief description of some of the most commonly used tools in the SMS Desktop Software map window toolbar as well as shows you the toolbar shortcuts

More information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN

More information

Creating Affordable and Reliable Autonomous Vehicle Systems

Creating Affordable and Reliable Autonomous Vehicle Systems Creating Affordable and Reliable Autonomous Vehicle Systems Shaoshan Liu shaoshan.liu@perceptin.io Autonomous Driving Localization Most crucial task of autonomous driving Solutions: GNSS but withvariations,

More information

CS 534: Computer Vision Texture

CS 534: Computer Vision Texture CS 534: Computer Vision Texture Ahmed Elgammal Dept of Computer Science CS 534 Texture - 1 Outlines Finding templates by convolution What is Texture Co-occurrence matrices for texture Spatial Filtering

More information

Object Geolocation from Crowdsourced Street Level Imagery

Object Geolocation from Crowdsourced Street Level Imagery Object Geolocation from Crowdsourced Street Level Imagery Vladimir A. Krylov and Rozenn Dahyot ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland {vladimir.krylov,rozenn.dahyot}@tcd.ie

More information

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

More information

MOTION ESTIMATION USING CONVOLUTIONAL NEURAL NETWORKS. Mustafa Ozan Tezcan

MOTION ESTIMATION USING CONVOLUTIONAL NEURAL NETWORKS. Mustafa Ozan Tezcan MOTION ESTIMATION USING CONVOLUTIONAL NEURAL NETWORKS Mustafa Ozan Tezcan Boston University Department of Electrical and Computer Engineering 8 Saint Mary s Street Boston, MA 2215 www.bu.edu/ece Dec. 19,

More information

End-to-End Learning of Polygons for Remote Sensing Image Classification

End-to-End Learning of Polygons for Remote Sensing Image Classification End-to-End Learning of Polygons for Remote Sensing Image Classification Nicolas Girard, Yuliya Tarabalka To cite this version: Nicolas Girard, Yuliya Tarabalka. End-to-End Learning of Polygons for Remote

More information

EE-559 Deep learning Networks for semantic segmentation

EE-559 Deep learning Networks for semantic segmentation EE-559 Deep learning 7.4. Networks for semantic segmentation François Fleuret https://fleuret.org/ee559/ Mon Feb 8 3:35:5 UTC 209 ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE The historical approach to image

More information