TorontoCity: Seeing the World with a Million Eyes

Authors: Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun*
*Project completed by summer 2016

Why Toronto? The best place to live in the world*: Toronto ranks 4th. The places you are working at: Boston 36, Pittsburgh 39, San Francisco 49, Los Angeles 51. *According to the 2015 Global Liveability Ranking

A dataset covering a region of over 700 km²!

From all the views!

Dataset Data Sources: Ground-Level Panorama, Drone, Aerial, LIDAR, Stereo, Airborne LIDAR

Why do we need this? Mapping for autonomous driving; smart city; benchmarking for large-scale machine learning / deep learning, 3D vision, remote sensing, and robotics. Image sources: Here 360, Toronto SmartCity Summit

Annotations: Manual annotation? Impossible! Suppose each 500x500 image costs $1 to annotate with pixel-wise labels; we would need to pay $1,139,200 (about $1.1M) to create ground truth for the aerial images alone. I'm not as rich as Jensen. However, humans already collect rich knowledge about the world. Use maps!
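The cost estimate on this slide can be reproduced with a few lines of arithmetic. The sketch below is a reconstruction, not the authors' calculation: the ground resolution of 5 cm per pixel is an assumption chosen because it makes 712 km² of 500x500 tiles come out at roughly $1.14M, and all variable names are illustrative.

```python
# Back-of-the-envelope annotation cost for the aerial imagery.
AREA_KM2 = 712             # covered region, rounded down from 712.5 km^2
TILE_PX = 500              # tile side in pixels (from the slide)
COST_PER_TILE_USD = 1      # price per annotated tile (from the slide)
GROUND_RES_CM = 5          # ASSUMPTION: 5 cm per pixel, not stated on the slide

CM_PER_KM = 100_000
tile_side_cm = TILE_PX * GROUND_RES_CM         # one tile spans 25 m on the ground
tiles_per_km = CM_PER_KM // tile_side_cm       # tiles along one kilometre
tiles_per_km2 = tiles_per_km ** 2              # tiles in one square kilometre
total_cost = AREA_KM2 * tiles_per_km2 * COST_PER_TILE_USD

print(f"{tiles_per_km2} tiles per km^2, total ${total_cost:,}")
```

Under a coarser assumed resolution the tile count, and therefore the cost, drops quadratically, so the exact dollar figure depends strongly on that one parameter.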

Map as Annotations Maps: HD Map, 3D Building, Meta Data
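To illustrate what "map as annotation" can mean in practice, here is a minimal sketch that turns a building footprint polygon into a pixel-wise label mask. It assumes the polygon is already expressed in image pixel coordinates; the function names and the example footprint are hypothetical and not taken from the dataset tooling.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height):
    """Label a pixel 1 if its centre lies inside the polygon, else 0."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Hypothetical rectangular footprint, given as (x, y) vertices in pixels.
footprint = [(1, 1), (4, 1), (4, 3), (1, 3)]
mask = rasterize(footprint, width=6, height=4)
for row in mask:
    print("".join(str(v) for v in row))
```

A mask produced this way is only as good as the registration between map and image, which is why the alignment steps later in the talk matter.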

Together, the rich sources of data enable a plethora of exciting tasks!

Building Footprint Extraction

Road Curb and Centerline Extraction

Building Instance Segmentation

Zoning Prediction: Commercial, Residential, Institutional

Technical Difficulties: Mis-alignment and Data Noise. Aerial and ground images are mis-aligned when placed using raw GPS location data; road centerlines are shifted; building shapes and locations are not accurate.

Data Pre-processing and Alignment: Appearance-based Ground-aerial Alignment (figures: before alignment, after alignment)

Data Pre-processing and Alignment: Instance-wise Aerial-map Alignment (figures: before alignment, after alignment)
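The slides do not describe how the instance-wise alignment is computed. Purely as an illustration of the idea, the sketch below searches a small window of integer translations for the shift that best overlaps one map footprint with building evidence observed in the image. This brute-force search, the pixel-set representation, and every name in it are assumptions, not the authors' method.

```python
def box(x0, y0, x1, y1):
    """Pixel set of an axis-aligned box, end-exclusive (illustrative helper)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def iou(a, b):
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def shift(pixels, dx, dy):
    """Translate a pixel set by (dx, dy)."""
    return {(x + dx, y + dy) for x, y in pixels}


def best_offset(map_pixels, image_pixels, radius=3):
    """Try every integer shift within the radius and keep the best overlap."""
    candidates = [(dx, dy)
                  for dx in range(-radius, radius + 1)
                  for dy in range(-radius, radius + 1)]
    return max(candidates,
               key=lambda d: iou(shift(map_pixels, *d), image_pixels))


map_footprint = box(0, 0, 3, 3)    # footprint as drawn in the (shifted) map
image_building = box(2, 1, 5, 4)   # where the building appears in the image
offset = best_offset(map_footprint, image_building)
print("estimated per-building shift:", offset)
```

Estimating one offset per building, rather than one for the whole map, is what makes the alignment instance-wise.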

Data Pre-processing and Alignment: Robust Road Surface Generation. Input: road curb and centreline (noisy). Output: polygonized road surface.

Pilot Study with Neural Networks: Building Contour and Road Curb/Centerline Extraction (figures: GT, ResNet)

Pilot Study with Neural Networks: Semantic Segmentation
Method       Road      Building   Mean
FCN          74.94%    73.88%     74.41%
ResNet-56    82.72%    78.80%     80.76%
Metric: Intersection-over-union (IOU), higher is better
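For readers unfamiliar with the metric, the following is a minimal sketch of per-class intersection-over-union and its mean over classes. The label encoding (0 background, 1 road, 2 building) and the tiny example grids are made up for illustration and do not come from the benchmark.

```python
def class_iou(pred, truth, cls):
    """IoU for one class over two label grids of identical shape."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += (p == cls and t == cls)
            union += (p == cls or t == cls)
    return inter / union if union else float("nan")


# Hypothetical 2x4 label maps: 0 = background, 1 = road, 2 = building.
truth = [[0, 0, 1, 1],
         [0, 2, 2, 1]]
pred = [[0, 1, 1, 1],
        [0, 2, 0, 1]]

road_iou = class_iou(pred, truth, cls=1)
building_iou = class_iou(pred, truth, cls=2)
mean_iou = (road_iou + building_iou) / 2
print(f"road {road_iou:.2%}, building {building_iou:.2%}, mean {mean_iou:.2%}")
```

The Mean column in the table is consistent with this kind of averaging: for FCN, (74.94 + 73.88) / 2 = 74.41.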

Pilot Study with Neural Networks: Building Instance Segmentation (figures: Input, DWT)

Pilot Study with Neural Networks: Building Instance Segmentation
Method                     Weighted Coverage   Average Precision   Recall-50%   Precision-50%
FCN                        41.92%              11.37%              21.50%       36.00%
ResNet-56                  40.65%              12.13%              18.90%       45.36%
Deep Watershed Transform   56.22%              21.22%              67.16%       63.67%
Metric: Weighted Coverage, AP, Precision-50%, Recall-50%, higher is better
Join the other talk today to learn more about deep watershed instance segmentation: Wednesday, May 10, 4:00 PM - 4:25 PM, Room 210G
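The slide names the instance-level metrics without defining them. The sketch below assumes commonly used definitions: weighted coverage averages, over ground-truth instances, the best IoU achieved by any prediction, weighted by instance size; precision and recall at 50% count predictions and ground-truth instances that have a counterpart with IoU of at least 0.5. The exact definitions used for the table may differ, and the helper names and toy instances are hypothetical.

```python
def box(x0, y0, x1, y1):
    """Pixel set of an axis-aligned box, end-exclusive (illustrative helper)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def iou(a, b):
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def weighted_coverage(gt_instances, pred_instances):
    """Size-weighted mean of each ground-truth instance's best IoU."""
    total = sum(len(g) for g in gt_instances)
    return sum(
        len(g) / total * max((iou(g, p) for p in pred_instances), default=0.0)
        for g in gt_instances
    )


def precision_recall_at(gt_instances, pred_instances, thresh=0.5):
    """Fraction of predictions / ground truths matched at the IoU threshold."""
    hits_pred = sum(any(iou(p, g) >= thresh for g in gt_instances)
                    for p in pred_instances)
    hits_gt = sum(any(iou(g, p) >= thresh for p in pred_instances)
                  for g in gt_instances)
    precision = hits_pred / len(pred_instances) if pred_instances else 0.0
    recall = hits_gt / len(gt_instances) if gt_instances else 0.0
    return precision, recall


# Toy example: two ground-truth buildings, three predicted instances.
gt = [box(0, 0, 4, 4), box(6, 0, 8, 2)]
pred = [box(0, 0, 4, 2), box(6, 0, 8, 2), box(10, 10, 11, 11)]

wcov = weighted_coverage(gt, pred)
precision, recall = precision_recall_at(gt, pred)
print(f"weighted coverage {wcov:.2%}, "
      f"precision-50% {precision:.2%}, recall-50% {recall:.2%}")
```

Because weighted coverage favours large instances while precision and recall treat every instance equally, a method can rank differently on the two, as FCN and ResNet-56 do in the table.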

Pilot Study with Neural Networks: Ground-view Road Segmentation (figure legend: true positive in yellow, false negative in green, false positive in red)

Pilot Study with Neural Networks: Ground-view Road Segmentation
Method       Non-Road IOU   Road IOU   Mean IOU
FCN          97.3%          95.8%      96.5%
ResNet-56    97.8%          96.6%      97.2%
Metric: Intersection-over-Union, higher is better

Pilot Study with Neural Networks: Ground-view Zoning Classification
Method       From-Scratch   Pre-trained from ImageNet
AlexNet      66.48%         75.49%
GoogLeNet    75.08%         77.95%
ResNet       75.65%         79.33%
Metric: Top-1 Accuracy, higher is better

Statistics: Number of buildings: 397,846. Total area: 712.5 km². Total length of road: 8,439 km.

Statistics (figures: building height distribution, zoning type distribution)

Conclusion: We propose a large dataset with data from different views and sensors. Maps are used to create GT annotations. In the future we have many more exciting tasks to come. Check our paper for more details (https://arxiv.org/abs/1612.00423). Data available soon. Stay tuned, and you are welcome to over-fit!