Photo Tourism: Exploring Photo Collections in 3D
Overview of Research at Microsoft, 2007
Jeremy Kuzub
Presented to the Systems and Computer Engineering Faculty, Carleton University, 2007
A stack of photos is not how we see reality! Imagine exploring a 3D space where each photo hangs as a window into the real 3D environment:
Challenge: Where was this photo taken? Can we algorithmically derive the 3D spatial relationships within a large group of 2D images? Can we use this to build an interactive photo browser that puts photos in spatial context?
More Challenges: Can the group of photos include:
- different cameras
- different times of day and year
- different viewing angles and focal lengths
- various text annotations (Flickr, Google Images, etc.)
Can the data set come from reality rather than from controlled tests?
The New Photo Collection Interface:
- Scene visualization: fly around popular world sites in 3D by morphing between photos.
- Object-based photo browsing: "show me more images that contain this object or part of the scene."
- Where was I? Tell me where I was when I took this picture.
- What am I looking at? Tell me about objects visible in this image by transferring annotations from similar images.
Key ingredients: natural transitions between photos, finding and using spatial relations, deriving original camera locations, and identifying objects within photos.
And if your Mom asks? Mom's query: "What do you do in that lab all day with computers?" Answer: "I am a specialist in image-based modeling (IBM) and image-based rendering (IBR)! I synthesize new views of a scene from a set of input photographs, Mom!"
The Process: a photo collection, plus image annotations ("Cool statue!", etc.), feeds image-based (IB) modeling; its output feeds image-based rendering; together they give the user a navigation system with photo-collection context and understanding.
Image-Based Modeling Process: reconstructing cameras from photo pairs.
1. Unique feature identification (SIFT algorithm): uniquely identifiable points in images 1 and 2.
2. Matching of points between images 1 and 2 (nearest-neighbor algorithm): matched point pairs.
3. Determine the best motion vectors between matched points (RANSAC algorithm): unique point motion tracks between images 1 and 2.
4. Reconstruct camera parameters and 3D locations from the tracks (Structure from Motion algorithm).
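The RANSAC step can be illustrated with a deliberately simplified sketch: instead of fitting a full fundamental matrix as real structure-from-motion pipelines do, this toy version fits a 2D translation between putative point matches and keeps whichever hypothesis the most matches agree with. All names here are illustrative, not from the paper.

```python
import random

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """matches: list of ((x1, y1), (x2, y2)) putative correspondences.
    Returns the best-supported (dx, dy) motion and its inlier matches."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(matches)   # minimal sample: 1 match
        dx, dy = x2 - x1, y2 - y1                  # candidate motion vector
        inliers = [m for m in matches
                   if abs((m[1][0] - m[0][0]) - dx) < tol
                   and abs((m[1][1] - m[0][1]) - dy) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # refit the motion on all inliers (mismatched points are discarded)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / len(best_inliers)
    dy = sum(b[1] - a[1] for a, b in best_inliers) / len(best_inliers)
    return (dx, dy), best_inliers
```

The key property carried over from the real pipeline is robustness: a handful of bad nearest-neighbor matches cannot corrupt the estimate, because the model is fit only to the consensus set.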
Example
From photo pairs to photo sets: New photos are added to the existing reconstruction one at a time, which reduces computational complexity. A new photo must share some unique points with photos already in the set. Bundle adjustment is used to make sure all cameras agree on the point locations. This becomes computationally intensive for larger photo sets, as agreement is sought across ALL cameras sharing common points (bundle adjustment can take minutes, hours, or days).
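The growth rule above, "add a photo only if it shares unique points with the existing reconstruction", can be sketched as follows. This is a minimal illustration (not the paper's code): integer track IDs stand in for matched SIFT points, and the bundle-adjustment step is only marked with a comment.

```python
def grow_reconstruction(seed_pair, candidates, min_shared=3):
    """seed_pair/candidates: dicts mapping photo name -> set of track IDs.
    Registers candidates one at a time, only when they overlap the model."""
    registered = dict(seed_pair)
    known_tracks = set().union(*registered.values())
    deferred = list(candidates.items())
    progress = True
    while progress:                        # keep sweeping until nothing new fits
        progress = False
        for name, tracks in list(deferred):
            if len(tracks & known_tracks) >= min_shared:
                registered[name] = tracks  # photo joins; bundle adjustment runs here
                known_tracks |= tracks     # its tracks may let later photos connect
                deferred.remove((name, tracks))
                progress = True
    return registered
```

Note how a newly registered photo enlarges the set of known tracks, which can in turn let previously unconnectable photos join on a later sweep.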
Pinning down all cameras: Once the cameras' locations are determined relative to one another, it is beneficial to lock the entire scene into absolute space (i.e., which way is north?). A single camera or point in the group with associated GPS coordinates can lock down all cameras, but more data leads to more accuracy. This gives better context for a photo set: matching photo sets to existing datasets, such as geo-survey data or existing 3D geometry.
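As a hedged illustration of this georegistration step: with two GPS-tagged cameras one can fix the scale, heading, and position of a 2D reconstruction via a similarity transform (a single GPS point fixes only translation, which is why more data helps). Complex numbers make the scale-plus-rotation arithmetic compact; this is a sketch, not the paper's method.

```python
def georegister(cameras, anchors):
    """cameras: name -> (x, y) in relative reconstruction space.
    anchors: two (name, (east, north)) pairs with known absolute positions.
    Returns every camera mapped into absolute coordinates."""
    (n1, g1), (n2, g2) = anchors
    p1, p2 = complex(*cameras[n1]), complex(*cameras[n2])
    q1, q2 = complex(*g1), complex(*g2)
    s = (q2 - q1) / (p2 - p1)              # combined scale + rotation factor
    t = q1 - s * p1                        # translation that pins anchor 1
    return {name: ((s * complex(x, y) + t).real,
                   (s * complex(x, y) + t).imag)
            for name, (x, y) in cameras.items()}
```

With more than two GPS fixes, a least-squares fit over all of them would replace this exact two-point solve, trading exactness at the anchors for lower overall error.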
Image-Based Rendering Process: reconstructing the scene in an intuitive way. Input: the photo set plus a database of each photo's camera location in 3D space (from image-based modeling). Output: a user interface and navigation system, a 3D rendering engine, and context and understanding!
User interface:
- Flying 3D navigation
- Top-down map
- View of the current photo
- Photos spatially related to the current photo
- Annotations for objects within photos
Navigation Window: representing a sparse 3D space. The 3D structure of the space derived from the photos is a point cloud. This sparse point cloud gives an impression of the environment, especially during transitions between camera locations; the photos fill in the details. This removes the difficulty of texture-mapping a full environment and allows more flexibility in the input image set (Flickr, Google, etc.). Example: a Space Shuttle image set shown as a point-cloud representation with camera frusta.
Alternate presentation: camera frusta, semi-transparent photos, and line segments from SfM. (Not ultimately selected for the productized version of Photosynth.)
Photo Transitions: Movement from one camera (one photo) to another follows a tweening path from Camera 1 to Camera 2:
- Linear interpolation of camera position and rotation
- Smooth motion using acceleration and deceleration
- Smooth change in camera focal length during movement
- Images distorted as planes in 3D during the transition
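The acceleration-and-deceleration tween can be sketched with a smoothstep curve applied to a linear interpolation of position, heading, and focal length. This is a simplified illustration: a real renderer would interpolate rotation with quaternions rather than a single heading angle.

```python
def smoothstep(t):
    """Ease-in/ease-out curve: zero slope at t=0 and t=1."""
    return 3 * t * t - 2 * t * t * t

def tween_camera(cam_a, cam_b, t):
    """cam: ((x, y, z) position, heading in degrees, focal length).
    t runs from 0 (camera A) to 1 (camera B)."""
    s = smoothstep(t)
    pos = tuple(a + (b - a) * s for a, b in zip(cam_a[0], cam_b[0]))
    heading = cam_a[1] + (cam_b[1] - cam_a[1]) * s
    focal = cam_a[2] + (cam_b[2] - cam_a[2]) * s
    return pos, heading, focal
```

Because smoothstep has zero slope at both endpoints, the virtual camera starts and stops gently instead of snapping, which is what makes the photo-to-photo morph feel natural.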
Sample Transition between images
Annotation Transfer: Original annotations (Flickr, Facebook, etc.) are algorithmically transferred to other photos. This leverages common identifiable points (from the SIFT algorithm) across multiple images: annotations covering a set of points in one image can be transferred to other images that contain those same points. The algorithm must determine which points are part of the annotated area; points too close to or too far from the camera (out of plane) must be eliminated. Transferring annotations to other photos uses a weighting function based on:
- a high number of relevant points in the photo
- the best viewing angle onto those points
- the annotated object occupying a high percentage of the photo
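The three weighting criteria above could be combined as in this hypothetical scoring function (the exact form and function names are assumptions for illustration, not the paper's formula):

```python
def transfer_score(n_points_visible, n_points_total, view_angle_deg, area_fraction):
    """Score a candidate photo for receiving a transferred annotation."""
    point_term = n_points_visible / n_points_total       # coverage of annotated points
    angle_term = max(0.0, 1.0 - view_angle_deg / 90.0)   # head-on views score higher
    size_term = area_fraction                            # object's share of the frame
    return point_term * angle_term * size_term

def best_target(candidates):
    """candidates: list of (photo_name, args-for-transfer_score tuple)."""
    return max(candidates, key=lambda c: transfer_score(*c[1]))[0]
```

Multiplying the terms (rather than summing them) means a photo that fails badly on any one criterion, e.g. a nearly edge-on viewing angle, is effectively disqualified even if it scores well on the others.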
Time as a Dimension: Using SIFT, common identifiable points can be found in photos with some invariance to weather, lighting, time of day, season, and year. Caveat: SIFT is only partially invariant to lighting changes. Images matched in this way allow a user to navigate through time as well as space.
Image Sorting by Similarity: Images are ordered so that adjacent images share the highest number of common identifiable points. This allows a user to navigate in a way most similar to touring (walking, etc.). Thumbnail images along the bottom of the interface resemble conventional image browsing. Example: adjacent photos sharing 4 common points.
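The ordering can be sketched as a greedy nearest-neighbor chain: starting from one photo, repeatedly append the remaining photo that shares the most matched points with the last one. A minimal sketch, assuming each photo is represented by its set of track IDs:

```python
def order_by_similarity(photos, start):
    """photos: name -> set of track IDs; start: name of the first photo.
    Returns photo names ordered so neighbours share many points."""
    remaining = set(photos) - {start}
    order = [start]
    while remaining:
        prev = photos[order[-1]]
        # pick the unplaced photo with the largest point overlap
        nxt = max(remaining, key=lambda n: len(photos[n] & prev))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

A greedy chain is cheap and good enough for a thumbnail strip; finding the globally best ordering would be a traveling-salesman-style problem.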
Navigation Tool: Zoom-Out Search Function. Find other photos that contain all the points of the current photo. Criterion for success: the bounding box of these points must appear smaller in the candidate zoomed-out photo. Run in reverse, this can also serve as a "find details" or zoom-in function. Example: an original photo with 4 points and a candidate zoomed-out photo.
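The success criterion translates directly into code: the candidate must contain every point of the current photo, and those points' bounding box must be smaller in the candidate's image plane. A sketch, assuming points are keyed by track ID with 2D image positions:

```python
def bbox_area(points):
    """Area of the axis-aligned bounding box of (x, y) points."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def is_zoom_out(current_pts, candidate_pts):
    """Both args: dicts mapping track ID -> (x, y) image position."""
    if not set(current_pts) <= set(candidate_pts):
        return False                       # candidate must see every point
    cur = bbox_area(list(current_pts.values()))
    cand = bbox_area([candidate_pts[k] for k in current_pts])
    return cand < cur                      # same object, smaller on screen
```

Swapping the inequality (`cand > cur`) gives the reverse "find details" / zoom-in test described above.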
Limitations:
- Speed of bundle adjustment: it can take hours or days to fit the SIFT-determined keypoints, and the cost grows with the number of photos in a set. Fortunately this is a one-time process and does not affect photo-browsing performance.
- Representation of 3D space: only keypoints are rendered in 3D space. It is impractical to map photos as textures onto 3D geometry, since the photo set is incomplete and camera parameters cannot be determined accurately enough.
Advantages:
- Image sets can come "from the wild": Flickr, Google, Facebook, any online photo database
- A robust and intuitive navigation system gives spatial context to large sets of photos
- Photo sets can be locked to geographic data after the fact
- Additional text annotations of objects within photos can be automatically transferred to other photos of the same object
- Photos can be added to the sets organically at a later date
- Wow factor
Pretty Amazing Demos: Google "photosynth". Photo Tourism: Exploring Photo Collections in 3D. Noah Snavely, University of Washington; Steven M. Seitz, University of Washington; Richard Szeliski, Microsoft Research.