GPU Accelerating Speeded-Up Robust Features
Timothy B. Terriberry, Lindley M. French, and John Helmsen

Overview of ArgonST
- Manufacturer of integrated sensor hardware and sensor analysis systems
  - RF, COMINT, ELINT, EO/IR, LIDAR, Multispectral, Hyperspectral, Acoustic
- Research Group Focus
  - Artificial Intelligence and Machine Learning
  - Automated Scene Understanding
  - Visual Navigation

Au COIN Project
- Automated Understanding via Collective Image Navigation
- Advanced, ultra-tight coupling methods for visual navigation
- Partnered with the Air Force Institute of Technology
- This research: Front-end processing

Outline
- Introduction
- Overview of SURF
- GPU Implementation Details
- Results
- Conclusion

Robust Image Features
- Summarize an image by a small number of interest points
  - Less data
  - More entropy
- Robust features
  - Relatively insensitive to view changes
    - Scale, rotation, affine, perspective, etc.
  - Match more reliably
- SURF (Bay et al. 2006)
  - Simple to compute, small features

SURF: Detection
- Use determinant of Hessian
  - Components are convolutions of the image with second-order Gaussian derivatives
- Approximate these with box filters
  - Very easy to compute (constant time)
  - Does not impair detection

SURF: Detection
- Scale invariance
  - Run detector at many scales
  - Take the 3×3×3 local maxima
  - Fit a quadratic patch to get sub-pixel resolution
- Rotation invariance
  - Compute local orientation of the image near the interest point
  - Compute descriptor relative to the local coordinate system
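The 3×3×3 local-maximum test can be sketched on the CPU as below. This is a minimal illustration, not the GPU code from the talk: the name `local_maxima_3x3x3`, the `stack[scale][y][x]` layout, and the `threshold` parameter are assumptions made for the example, and sub-pixel refinement is left out.

```python
def local_maxima_3x3x3(stack, threshold):
    """Return (scale, y, x) positions whose response exceeds `threshold`
    and is strictly greater than all 26 neighbours in scale and space."""
    maxima = []
    n_scales = len(stack)
    height = len(stack[0])
    width = len(stack[0][0])
    # Border samples lack a full 3x3x3 neighbourhood, so they are skipped.
    for s in range(1, n_scales - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                value = stack[s][y][x]
                if value <= threshold:
                    continue
                is_max = all(
                    value > stack[s + ds][y + dy][x + dx]
                    for ds in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                    if (ds, dy, dx) != (0, 0, 0)
                )
                if is_max:
                    maxima.append((s, y, x))
    return maxima


# Three 4x4 scales, flat except for one peak in the middle scale.
stack = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
stack[1][2][1] = 5.0
peaks = local_maxima_3x3x3(stack, threshold=1.0)
```

Each candidate is compared against its 8 neighbours in its own scale and the 9 samples in each adjacent scale, which is why the first and last scales can never produce a maximum.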

SURF: Description
- Split region around feature into 16 bins
- Each bin: sum 25 high-frequency Haar wavelet responses in x and y
- Also sum magnitude of responses
- (Σdx, Σdy, Σ|dx|, Σ|dy|) × 16 = 64 dimensions
- Matching performance the same as or better than SIFT (128 dimensions)
- Contrast invariance: normalize to a unit vector
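The descriptor layout on this slide can be sketched as follows, assuming the Haar responses have already been sampled on a 20×20 grid (16 bins of 5×5 = 25 samples each). The function name `surf_descriptor` and the grid inputs are hypothetical, and the Gaussian weighting used in the published SURF method is omitted for brevity.

```python
import math


def surf_descriptor(dx, dy):
    """Build a 64-D descriptor from 20x20 grids of Haar responses.

    The grid is split into 4x4 bins of 5x5 samples; each bin contributes
    (sum dx, sum dy, sum |dx|, sum |dy|).
    """
    descriptor = []
    for by in range(4):
        for bx in range(4):
            s_dx = s_dy = s_adx = s_ady = 0.0
            for y in range(by * 5, by * 5 + 5):
                for x in range(bx * 5, bx * 5 + 5):
                    s_dx += dx[y][x]
                    s_dy += dy[y][x]
                    s_adx += abs(dx[y][x])
                    s_ady += abs(dy[y][x])
            descriptor.extend([s_dx, s_dy, s_adx, s_ady])
    # Contrast invariance: scale the vector to unit length.
    norm = math.sqrt(sum(v * v for v in descriptor))
    if norm > 0.0:
        descriptor = [v / norm for v in descriptor]
    return descriptor


# Toy responses; a real pipeline would sample these from the image.
dx = [[(x - y) * 0.1 for x in range(20)] for y in range(20)]
dy = [[(x + y) * 0.05 - 0.5 for x in range(20)] for y in range(20)]
desc = surf_descriptor(dx, dy)
# Tripling the contrast should leave the normalized descriptor unchanged.
scaled = surf_descriptor([[3.0 * v for v in row] for row in dx],
                         [[3.0 * v for v in row] for row in dy])
```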

Implementation Details
- Target platform: GeForce Go 7950 GTX
  - OpenGL+Cg instead of CUDA
  - No 32-bit integer textures
  - No hardware blending of 32-bit floats
- Performance target
  - 10 fps at 1280x1024 (speed of the camera)
- Bottleneck: memory bandwidth
  - Computation is almost free
  - Texture lookups are expensive

Integral Image Computation
- The Integral Image allows constant time summation over arbitrarily large regions
- Each pixel contains the sum of all the values in the original image to the left and above it
- The sum of any rectangular region can be computed with four lookups
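A serial sketch of the integral image and the four-lookup box sum follows. The names `integral_image` and `box_sum` are illustrative, and the extra zero row and column are a convenience assumed here so the lookups never index outside the table; the slides do not say how borders were handled.

```python
def integral_image(image):
    """Integral image padded with a leading zero row and column, so that
    ii[y][x] is the sum of image[r][c] for r < y and c < x."""
    height, width = len(image), len(image[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of image[y][x] for x0 <= x < x1 and y0 <= y < y1 (four lookups)."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
ii = integral_image(image)
total = box_sum(ii, 0, 0, 4, 3)   # whole image: 78
inner = box_sum(ii, 1, 1, 3, 3)   # 6 + 7 + 10 + 11 = 34
```

The cost of `box_sum` does not depend on the size of the rectangle, which is what makes the box-filter approximation constant time per pixel.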

1-D Parallel Approach
- Sum across columns in parallel, then across rows
  - ~1000-degree parallelism (good)
  - ~2000 passes (not as bad as you'd think)
  - Ping-pong between two textures to avoid read-after-write dependencies
- Bad: Texture cache is 2-D (8×8 pixel blocks)
  - Cache is flushed between rendering passes
  - If we only use one row (column) in each rendering pass, we're wasting 7/8 of the memory bandwidth

2-D Parallel Approach (Blelloch)
- Sum within a column (row) in parallel as well
- Two-phase approach, O(log n) passes each
  - Upsweep: Collects local sums
  - Downsweep: Distributes cumulative sums
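The two phases can be sketched serially on a single row. This is a generic work-efficient exclusive scan in the style of Blelloch, written under the assumption of a power-of-two length; it is not the fragment-shader code from the talk. Each iteration of an outer `while` loop corresponds to one rendering pass, and the inner `for` loop is the work that would run in parallel.

```python
def blelloch_exclusive_scan(values):
    """Exclusive prefix sum; len(values) must be a power of two."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Upsweep: collect local sums up a binary tree (log2(n) passes).
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2
    # Downsweep: distribute cumulative sums back down (log2(n) passes).
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data


row = [3, 1, 7, 0, 4, 1, 6, 3]
exclusive = blelloch_exclusive_scan(row)          # [0, 3, 4, 11, 11, 15, 16, 22]
inclusive = [e + v for e, v in zip(exclusive, row)]
```

Adding the original values back, as in `inclusive`, gives the "sum of everything up to and including this pixel" form that an integral image needs.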

Moment Pyramid Algorithm
- Blelloch still sums across columns, then rows
  - Can we sum in both directions at once?
- To generate an integral image from a ¼ resolution integral image, need 4 pieces:
  - Sum of upper-left region for odd x, odd y
  - Sum of left row for odd x, even y
  - Sum of upper column for even x, odd y
  - Original pixel for even x, even y

Moment Pyramid Algorithm
- The four pieces:
  - Sum of upper-left region for odd x, odd y
  - Sum of left row for odd x, even y
  - Sum of upper column for even x, odd y
  - Original pixel for even x, even y
- Where do we get the row/column sums?
  - Output three values during upsweep
    - Sum of all 4 values, sum of 2 odd x, sum of 2 odd y
  - Apply Blelloch's algorithm to make row/column sums on each level cumulative

Moment Pyramid Algorithm
- Upsweeps: Collect local sums
- Downsweeps: Distribute cumulative sums
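One reading of the four-piece decomposition is sketched below for a single pyramid level. It shows only how a full-resolution integral image can be assembled from the quarter-resolution one plus cumulative row sums, cumulative column sums, and the original pixel. The helper names are hypothetical, the row and column sums are computed by brute force rather than by the upsweep and Blelloch passes the slides describe, and the coarse integral image is computed directly where a full pyramid would recurse.

```python
def direct_integral(image):
    """Inclusive integral image: out[y][x] sums image[0..y][0..x]."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            out[y][x] = row_sum + (out[y - 1][x] if y > 0 else 0)
    return out


def integral_from_quarter(image):
    """Assemble the integral image of `image` (even width and height)
    from the integral image of its 2x2-summed, quarter-resolution copy."""
    height, width = len(image), len(image[0])
    coarse = [[image[2 * j][2 * i] + image[2 * j][2 * i + 1]
               + image[2 * j + 1][2 * i] + image[2 * j + 1][2 * i + 1]
               for i in range(width // 2)] for j in range(height // 2)]
    coarse_ii = direct_integral(coarse)  # a full pyramid would recurse here

    def region(j, i):  # coarse integral image, zero outside the image
        return coarse_ii[j][i] if i >= 0 and j >= 0 else 0

    def column(x, y):  # cumulative column sum: image[0..y][x]
        return sum(image[r][x] for r in range(y + 1))

    def row(x, y):     # cumulative row sum: image[y][0..x]
        return sum(image[y][c] for c in range(x + 1))

    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if x % 2 and y % 2:      # odd x, odd y: upper-left region only
                out[y][x] = region(y // 2, x // 2)
            elif y % 2:              # even x, odd y: region + upper column
                out[y][x] = region(y // 2, x // 2 - 1) + column(x, y)
            elif x % 2:              # odd x, even y: region + left row
                out[y][x] = region(y // 2 - 1, x // 2) + row(x, y)
            else:                    # even x, even y: all four pieces
                out[y][x] = (region(y // 2 - 1, x // 2 - 1)
                             + column(x, y - 1) + row(x - 1, y) + image[y][x])
    return out


image = [[(7 * x + 3 * y * y) % 10 for x in range(6)] for y in range(4)]
assembled = integral_from_quarter(image)
```

The odd/odd pixels need nothing beyond the coarse integral image, which is why a quarter-resolution result can seed the next level.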

Moment Pyramid Algorithm
- Why is this actually faster?
  - Only read/write a full-sized image once
  - Extremely good cache use
    - More reads than texels fetched

Algorithm           Reads   Cache Efficiency   Real Adds (effective)   Writes   Speed up
1D Ping Pong        4N      12.5%              2N (2N)                 4N       1.00
2D Blelloch         10N     100.0%             4N (6N)                 6N       3.89
2D Moment Pyramid   7.67N   109.5%             3.33N (4.33N)           4.33N    4.63

Box Filters
- Gaussian derivatives for feature location
- Applied at many different resolutions and scales
  - Identify both position and size

Box Filters
- Simple implementation requires 32 lookups
  - Too many!
- Many differences separated by common offsets
  - Compute differences for all pixels in one pass
  - Reuse each result for several pixels in another pass
- Can easily reduce to 17 lookups per scale
  - 47% reduction in running time
- Do 3 scales at once: 13.33 per scale
  - 77% reduction in running time
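The idea of sharing differences at a common offset can be illustrated with a hypothetical three-lobe filter (stacked boxes weighted +1, -2, +1). The pass layout, filter shape, and function names below are assumptions chosen for the example and do not reproduce the five passes used in the talk. Pass 1 stores one horizontal difference per pixel; pass 2 then needs four lookups of those results instead of the eight distinct integral-image corners.

```python
def integral_image(image):
    """Integral image padded with a leading zero row and column."""
    height, width = len(image), len(image[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def horizontal_differences(ii, box_w):
    """Pass 1: one difference per pixel at the common offset box_w."""
    return [[row[x + box_w] - row[x] for x in range(len(row) - box_w)]
            for row in ii]


def three_lobe_response(diff, x, y, box_h):
    """Pass 2: +1, -2, +1 weighted stack of three box_h-high boxes whose
    top-left corner is (x, y), using four lookups of pass-1 results."""
    return (-diff[y][x] + 3 * diff[y + box_h][x]
            - 3 * diff[y + 2 * box_h][x] + diff[y + 3 * box_h][x])


def brute_force(image, x, y, box_w, box_h):
    """Reference: the same response summed pixel by pixel."""
    total = 0
    for lobe, weight in enumerate((1, -2, 1)):
        for r in range(y + lobe * box_h, y + (lobe + 1) * box_h):
            total += weight * sum(image[r][x:x + box_w])
    return total


image = [[(3 * x + 5 * y * y) % 11 for x in range(8)] for y in range(9)]
diff = horizontal_differences(integral_image(image), box_w=5)
fast = three_lobe_response(diff, x=1, y=0, box_h=3)
slow = brute_force(image, x=1, y=0, box_w=5, box_h=3)
```

Because every pixel's filter uses the same horizontal offset, the pass-1 texture is computed once and reused by all of them.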

Box Filter Sampling Locations
[Figure: sampling locations for Pass 1 through Pass 5]

Point Location
- Compute Hessian determinant from box filters
- Find 3×3×3 local maxima over threshold
  - Multiple passes of EarlyZ culling
  - Tried stencil buffer approach, but had driver problems on Linux
- Convert to a flat array of coordinates using the HistoPyramid algorithm (Ziegler et al. 2006)
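The compaction step can be sketched as a count pyramid followed by a top-down traversal per output element. This is a simplified CPU illustration of the HistoPyramid idea, assuming a square power-of-two mask of 0/1 flags; the function names and the child visiting order are choices made for the example.

```python
def build_histopyramid(mask):
    """Return count levels, finest first; each coarser cell sums a 2x2 block."""
    levels = [[list(row) for row in mask]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([[prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
                        + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
                        for x in range(half)] for y in range(half)])
    return levels


def locate(levels, index):
    """Return (x, y) of the index-th set cell by descending from the top.
    Requires 0 <= index < total number of set cells."""
    x = y = 0
    for level in reversed(levels[:-1]):
        x *= 2
        y *= 2
        for dy, dx in ((0, 0), (0, 1), (1, 0), (1, 1)):
            count = level[y + dy][x + dx]
            if index < count:
                x += dx
                y += dy
                break
            index -= count  # skip every point stored under this child
    return x, y


def compact(mask):
    """Flat list of (x, y) coordinates of all set cells in the mask."""
    levels = build_histopyramid(mask)
    total = levels[-1][0][0]
    # Each output element is independent, so this loop parallelizes.
    return [locate(levels, i) for i in range(total)]


mask = [[0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0]]
points = compact(mask)
```

The top of the pyramid gives the number of detected points, so the output array can be sized before any coordinates are extracted.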

Orientation Detection
- Compute HF Haar responses in a 6s radius
- Sort by angle, use sliding window to find max
  - Sorting on a GPU is about as slow as on a CPU!
- Don't sort: histogram
  - R2VB (Scheuermann & Hensley 2007)
  - RMS error 0.20 using 256 bins
  - Sum sliding window with Blelloch's algorithm
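The histogram alternative to sorting can be sketched as follows. Responses are binned by angle, and a circular sliding window is evaluated from prefix sums. The window width of π/3 is taken from the published SURF method rather than from these slides, the serial prefix sum stands in for the Blelloch scan, and the function name and signature are hypothetical.

```python
import math


def dominant_orientation(responses, bins=256, window=math.pi / 3):
    """Estimate orientation from (dx, dy) Haar responses without sorting."""
    hist_dx = [0.0] * bins
    hist_dy = [0.0] * bins
    for dx, dy in responses:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        b = min(int(angle / (2 * math.pi) * bins), bins - 1)
        hist_dx[b] += dx
        hist_dy[b] += dy
    # Prefix sums over a doubled histogram, so that every circular
    # window becomes a single subtraction.
    span = max(1, round(window / (2 * math.pi) * bins))
    prefix_dx = [0.0]
    prefix_dy = [0.0]
    for i in range(2 * bins):
        prefix_dx.append(prefix_dx[-1] + hist_dx[i % bins])
        prefix_dy.append(prefix_dy[-1] + hist_dy[i % bins])
    best_strength, best_angle = -1.0, 0.0
    for start in range(bins):
        sx = prefix_dx[start + span] - prefix_dx[start]
        sy = prefix_dy[start + span] - prefix_dy[start]
        strength = sx * sx + sy * sy
        if strength > best_strength:
            best_strength, best_angle = strength, math.atan2(sy, sx)
    return best_angle


# Ten strong responses at 45 degrees plus two weak outliers.
responses = [(1.0, 1.0)] * 10 + [(-0.1, 0.0), (0.0, -0.2)]
theta = dominant_orientation(responses)
```

Binning replaces the sort with scattered writes, at the cost of quantizing the angles to the bin width.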

Feature Descriptor
- Need oriented Haar responses
  - Can only sum over rectangular regions
- Compute axis-aligned responses
  - Rotate the resulting vector
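Rotating an axis-aligned response into the feature's local frame is a 2-D rotation by the feature orientation. The sketch below assumes `theta` is measured counter-clockwise from the image x axis; the function name is hypothetical, and how the sample positions themselves are rotated is not covered.

```python
import math


def rotate_response(dx, dy, theta):
    """Express an axis-aligned response (dx, dy) in a frame rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * dx + s * dy, -s * dx + c * dy)


# A response pointing along the feature orientation maps onto the local x axis.
along, across = rotate_response(1.0, 1.0, math.pi / 4)
```

The rotation preserves the magnitude of the response, so only the split between the two components changes.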

Framerate vs. Resolution
[Figure: frame rate versus resolution for the GeForce Go 7950 GTX and the GeForce 8800 GTX]
Does not include time to transfer image to the card, as this can be done asynchronously, and affects only latency, not throughput.

Go 7950 GTX Performance Breakdown
[Figure: execution time (in %) in each stage of the algorithm (Radial Undistortion, Integral Image, Box Filters, Point Location, Orientation Detection, Feature Extraction) for threshold levels T0=1.00, T0=0.50, and T0=0.25]

8800 GTX Performance Breakdown
[Figure: execution time (in %) in each stage of the algorithm (Radial Undistortion, Integral Image, Box Filters, Point Location, Orientation Detection, Feature Extraction) for threshold levels T0=1.00, T0=0.50, and T0=0.25]

7 Series vs. 8 Series
[Figure: speed-up of the 8800 GTX over the Go 7950 GTX in each stage of the algorithm (Radial Undistortion, Integral Image, Box Filters, Point Location, Orientation Detection, Feature Extraction) and overall, for threshold levels T0=1.00, T0=0.50, and T0=0.25]

Registration Examples
[Figure: registration examples]

Conclusion
- Lots of pieces
  - 2-D parallel prefix sums (integral image)
  - Common subexpression elimination (box filters)
  - EarlyZ culling (point location)
  - HistoPyramid (point location)
  - Scattered writes for histogram generation (orientation detection)
  - 1-D parallel prefix sums (orientation detection)

Conclusion
- Can process video in real time on a laptop
  - New cards will only be faster
  - Scales to high resolutions on a desktop while still real time
- Enables a whole host of algorithms which require robust features as input
  - Recognition, Tracking, Structure from Motion, Visual Navigation, etc.

Future Improvements
- CUDA
  - Skip the graphics pipeline
  - No render to texture API for multipass algorithms
    - Need to add an extra copy
    - Or avoid the texture cache (a large portion of local memory)
- 32-bit Integer Textures
  - Can reduce memory bandwidth by at least half
- Hardware bilinear interpolation for 32-bit floats
  - Big speed gain for Haar responses

Questions?