Browsing: Query Refinement and Video Synopsis

Similar documents
Video Synopsis and Indexing

EVERYONE is familiar with the time-consuming activity

3/1/2010. Acceleration Techniques V1.2. Goals. Overview. Based on slides from Celine Loscos (v1.0)

C E N T E R A T H O U S T O N S C H O O L of H E A L T H I N F O R M A T I O N S C I E N C E S. Image Operations I

Introduction to Medical Imaging (5XSA0) Module 5

Mesh segmentation. Florent Lafarge Inria Sophia Antipolis - Mediterranee

ITERATIVE COLLISION RESOLUTION IN WIRELESS NETWORKS

Motion in 2D image sequences

COMPUTER VISION > OPTICAL FLOW UTRECHT UNIVERSITY RONALD POPPE

Regularization and Markov Random Fields (MRF) CS 664 Spring 2008

What have we leaned so far?

Marcel Worring Intelligent Sensory Information Systems

Combinatorial optimization and its applications in image Processing. Filip Malmberg

Markov Networks in Computer Vision

Detecting and Identifying Moving Objects in Real-Time

CS448f: Image Processing For Photography and Vision. Graph Cuts

Topologies. Maurizio Palesi. Maurizio Palesi 1

Markov Networks in Computer Vision. Sargur Srihari

CS 664 Segmentation. Daniel Huttenlocher

Machine Learning for Video-Based Rendering

Broad field that includes low-level operations as well as complex high-level algorithms

CHAPTER 5 MOTION DETECTION AND ANALYSIS

Wook Kim. 14 September Korea University Computer Graphics Lab.

In this lecture, we are going to talk about image segmentation, essentially defined as methods for grouping pixels together.

Optical Flow-Based Motion Estimation. Thanks to Steve Seitz, Simon Baker, Takeo Kanade, and anyone else who helped develop these slides.

Course Evaluations. h"p:// 4 Random Individuals will win an ATI Radeon tm HD2900XT

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation

Image Segmentation. Shengnan Wang

Segmentation Based Stereo. Michael Bleyer LVA Stereo Vision

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample

Idle Object Detection in Video for Banking ATM Applications

Applications. Foreground / background segmentation Finding skin-colored regions. Finding the moving objects. Intelligent scissors

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

A Tutorial Guide to Tribology Plug-in

Filtering and Enhancing Images

Feature Tracking and Optical Flow

Motion Estimation. There are three main types (or applications) of motion estimation:

Feature Tracking and Optical Flow

Algorithms for Markov Random Fields in Computer Vision

Network-on-chip (NOC) Topologies

CSE 417 Network Flows (pt 4) Min Cost Flows

Getting to Know Your Data

2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into

Segmentation and Grouping

Volumetric Scene Reconstruction from Multiple Views

Optimal Video Adaptation and Skimming Using a Utility-Based Framework

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Applications of Explicit Early-Z Culling

intro, applications MRF, labeling... how it can be computed at all? Applications in segmentation: GraphCut, GrabCut, demos

Topologies. Maurizio Palesi. Maurizio Palesi 1

Video Textures. Arno Schödl Richard Szeliski David H. Salesin Irfan Essa. presented by Marco Meyer. Video Textures

Norbert Schuff VA Medical Center and UCSF

Mixture Models and EM

Generative and discriminative classification techniques

Supervised texture detection in images

Subdivision Of Triangular Terrain Mesh Breckon, Chenney, Hobbs, Hoppe, Watts

EE795: Computer Vision and Intelligent Systems

A Robust Wipe Detection Algorithm

Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University

10708 Graphical Models: Homework 4

Semi-supervised learning and active learning

Announcements. Image Segmentation. From images to objects. Extracting objects. Status reports next Thursday ~5min presentations in class

Some material taken from: Yuri Boykov, Western Ontario

Prof. Fanny Ficuciello Robotics for Bioengineering Visual Servoing

Object Extraction Using Image Segmentation and Adaptive Constraint Propagation

LightSlice: Matrix Slice Sampling for the Many-Lights Problem

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN:

EE795: Computer Vision and Intelligent Systems

Understanding Geospatial Data Models

Motion and Optical Flow. Slides from Ce Liu, Steve Seitz, Larry Zitnick, Ali Farhadi

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

EEM 463 Introduction to Image Processing. Week 3: Intensity Transformations

70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing

Video Alignment. Literature Survey. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

Applications. Oversampled 3D scan data. ~150k triangles ~80k triangles

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy

Panoramic Video Texture

Subdivision. Outline. Key Questions. Subdivision Surfaces. Advanced Computer Graphics (Spring 2013) Video: Geri s Game (outside link)

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation

CS 4495 Computer Vision Motion and Optic Flow

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

Stereo Vision II: Dense Stereo Matching

Peripheral drift illusion

What is GOSET? GOSET stands for Genetic Optimization System Engineering Tool

Piecewise-Planar 3D Reconstruction with Edge and Corner Regularization

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

Human Body Recognition and Tracking: How the Kinect Works. Kinect RGB-D Camera. What the Kinect Does. How Kinect Works: Overview

Convexization in Markov Chain Monte Carlo

Light Field Occlusion Removal

Segmentation: Clustering, Graph Cut and EM

Region-based Segmentation

EXAM SOLUTIONS. Image Processing and Computer Vision Course 2D1421 Monday, 13 th of March 2006,

Models for grids. Computer vision: models, learning and inference. Multi label Denoising. Binary Denoising. Denoising Goal.

Algorithms for Data Science

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos

Motion Capture & Simulation

Displacement estimation

Clustering: Classic Methods and Modern Views

Level-set MCMC Curve Sampling and Geometric Conditional Simulation

Coarse-to-fine image registration

Transcription:

April 23, 2009

CuZero: Frontier of Interactive Visual Search

Goals Finding interesting content in video and images Solution Better video and image search Video synopsis to quickly scan long video

Current Situation Video Long and dull, http://152.3.114.19/view/view.shtml

Need to shorten clip without cutting too many frames Need to only cut out unimportant frames Need to handle lighting and scenery changes Need to handle never ending video

Current Situation - Video Search Query: you re yes and you re no you re up and you re down

Minimal Search Requirements Video Need to know content of video/images Need to understand dialog (video) Need to have results containing all arguments Allow user to specify they mean real animals Specify view of object/animal they are interested in (images)

Current Situation - Image Search

Algorithm Results s Outline CuZero: Frontier of Interactive Visual Search

Algorithm Results s Employs a unique query process that allows zero-latency query formation for an informed human search. Relevant visual concepts discovered from various strategies are automatically recommended in real time... Also introduces a new intuitive visualization system.

Algorithm Results s Demo GeoTag Columbia

Algorithm Results s Average Precision comparison No user provided labels and performed in 1/3 the time

Algorithm Results s Summary Combine existing conceptual resources Use concept information to assist in query formation Visualize results Plot results to allow for combining concepts Allow for advanced queries to form ( geo info, etc )

Algorithm Results s Pluses Zero latency process to aide in query formation Interactively choose best query suggestion Demonstrates interactive and dynamic weighting allows for results to be found in less time Asynchronous updates for speedy results.

Algorithm Results s Potential Minuses Works on a small domain Concept map gets cluttered quickly Doesn t address any computer vision problems Is keyword to concept mapping the right paradigm? Can the automatic analytics scale? Authors want Automated Alert

Outline CuZero: Frontier of Interactive Visual Search Approach Algorithm Display Strategies Evaluation and Results CuZero: Frontier of Interactive Visual Search

Approach Algorithm Display Strategies Evaluation and Results Novel An original approach to relevance feedback based on Graph-Cut Incorporates unlabeled data

Approach Algorithm Display Strategies Evaluation and Results Unique vs Non-Unique categories

Olivetti CuZero: Frontier of Interactive Visual Search Approach Algorithm Display Strategies Evaluation and Results

Approach Algorithm Display Strategies Evaluation and Results An example of an RF session on Corel database. First results found after submitting the top left image as a query (left) the result after 5 iterations

Approach Algorithm Display Strategies Evaluation and Results Basic Approaches Query by Example Relevance Feedback user labels subset of images as +/- based on unknown metric

Approach Algorithm Display Strategies Evaluation and Results Model image set topology ( include unlabeled ) using a graph Label images with binary class labels Partition using min-cuts which is strictly equivalent to minimizing an Energy function containing: A fidelity term ensuring the consistency of labels of partition ( provided by the user ) A regularization term ensuring that neighboring data are likely the same label

Approach Algorithm Display Strategies Evaluation and Results Assumptions Consistent user Decision boundary is likely to be in low density regions of the input space

Approach Algorithm Display Strategies Evaluation and Results Present initial display - perhaps random - which user labels Train a decision algorithm Choose new display ( techniques discussed later )

Energy Function Approach Algorithm Display Strategies Evaluation and Results Where the first term ( fidelity ) measures the error when mislabeling a training sample. Second term ( regularizer ) ensures that training samples in the neighborhood of X i are assigned the same ( or close ) label. They use a triangle kernel to measure image differences and use a Gaussian to normalize these between zero and one. Because they have this continuous distribution it can be plugged in to the Generalized Potts Model.

Approach Algorithm Display Strategies Evaluation and Results When labeled, Image to Sink or Source links are weighted as infinity

Approach Algorithm Display Strategies Evaluation and Results Display Strategies Exploitation - Select in order to refine the current estimate Choose unlabeled images on min-cut edges ( efficient for single mode searches ) Exploration - Find uncharted Territory Randomly select far from decision boundary Combination - Choose a balance Take a fraction of each

Exploitation vs Exploration Approach Algorithm Display Strategies Evaluation and Results Exploitation Exploration

Evaluation CuZero: Frontier of Interactive Visual Search Approach Algorithm Display Strategies Evaluation and Results Let K be the cardinality of the classes of interest. Let Z t be a random variable standing for the total number of relevant images until iteration t. E(Z t ) = K r=1 rp(z t = r) Also measure performance by the balanced generalization error of the classifier f t at iteration t.

Approach Algorithm Display Strategies Evaluation and Results Recall vs Iterations dependent on Neighborhood size ( topology information ) Olivetti (top) and Swedish (bottom) Far from ideal for Recall

Approach Algorithm Display Strategies Evaluation and Results Display strategies dependent on class types Olivetti (top) and Swedish (bottom)

Approach Algorithm Display Strategies Evaluation and Results Corel Largest disparity for Exploration, but combined shows steady growth

Approach Algorithm Display Strategies Evaluation and Results Olivetti (top) and Swedish (bottom) Graph-Cut error rates are consistently best

Approach Algorithm Display Strategies Evaluation and Results Corel Graph-cuts in the lead, but we stop at 30 iterations Error rate is very choppy...

Approach Algorithm Display Strategies Evaluation and Results Summary Use an image to initialize a Query Choose combination Exploit/Explore images Create Sink/Source infinity links when labeled Cut and Iterate

Approach Algorithm Display Strategies Evaluation and Results Summary Use an image to initialize a Query Choose combination Exploit/Explore images Create Sink/Source infinity links when labeled Cut and Iterate Questions/Issues Display choice is dependent on the type of data. Exploration is never the best strategy, maybe if data was noisier? 30 Iterations ( too much? too little? )

Outline CuZero: Frontier of Interactive Visual Search Region based Object Based Special Cases Cost Extensions CuZero: Frontier of Interactive Visual Search

Region based Object Based Special Cases Cost Extensions Goal: Create video synopsis of movies, shortening long movies for quick viewing ( http://www.vision.huji.ac.il/video-synopsis/ Billiards)

Region based Object Based Special Cases Cost Extensions Differences from previous work The video synopsis is itself a video, expressing the dynamics of the scene Reduce as much spatiotemporal redundancy as possible Others often fast-forward or skip frames

Recombination Region based Object Based Special Cases Cost Extensions

Example of splicing Region based Object Based Special Cases Cost Extensions

Region based Object Based Special Cases Cost Extensions Two approaches Region based Object based

Region based Object Based Special Cases Cost Extensions Requirements Synopsis is substantially shorter than the original video Maximum activity ( interest ) from original video should appear in synopsis Object dynamics should be preserved Visible seams and fragmented objects avoided

Energy Equations Region based Object Based Special Cases Cost Extensions E(M) = E a(m) + αe d (M) Activity of a pixel, χ(x, y, t) = I (x, y, t) B(x, y, t) Activity loss, E a(m) = P (x,y,t) I χ(x, y, t) P (x,y,t) S χ(x, y, M(x, y, t)) Discontinuity cost, E d (M) = P (x,y,t) S Pi S((x, y, t) + e i) I ((x, y, M(x, y, t)) + e i ) 2 So across all pixels E a(m) = P x,y (P K t=1 χ(x, y, t) P K t=1 χ(x, y, M(x, y) + t)) and E d (M) = P P P K x,y i t=1 S((x, y, t) + e i) I ((x, y, M(x, y) + t) + e i ) 2 Where e i are the six unit vectors representing the six spatiotemporal neighbors

Region based Object Based Special Cases Cost Extensions Ensure that the neighborhoods of A and B are similar when moving between Image and Background. This is ensured on the right by restricting consecutive synopsis pixels to come from consecutive input pixels. Q: How are regions selected?

CuZero: Frontier of Interactive Visual Search Region based Object Based Special Cases Cost Extensions Construct background I I temporal median light to dark in 4 min chunks ( surveillance cameras ) Background subtraction and min-cut isolated objects

Action tubes Region based Object Based Special Cases Cost Extensions

Action tubes Region based Object Based Special Cases Cost Extensions

New Energy Region based Object Based Special Cases Cost Extensions New equation accounts for stitching cost E(M) = b B E a(ˆb) + b,b B (αe t(ˆb, ˆb ) + βe c (ˆb, ˆb )) Where E a is activity cost E t is temporal consistency E c is collision cost.

Region based Object Based Special Cases Cost Extensions Activity cost: penalize for object not in synopsis giving partial credit for objects cut off for lack of time Collision Cost: Sum of multiplied activities over shared time sequence Temporal consistency cost: Interaction diminishes exponentially with time

Energy Minimization Region based Object Based Special Cases Cost Extensions The global energy function described earlier allows us to represent as a MRF which can be optimized via Belief propagation or graph cuts. They use an unspecified greedy algorithm.

Region based Object Based Special Cases Cost Extensions Stroboscopic and Panoramic - Long Tubes

Region based Object Based Special Cases Cost Extensions Stroboscopic and Panoramic - Long Tubes

Region based Object Based Special Cases Cost Extensions Stroboscopic and Panoramic - Obj Tracking Coherent background and chopped up video

Region based Object Based Special Cases Cost Extensions Endless Video Goal is in part to be fast for querying Online Create background by temporal medians Object ( tube ) detection and creation Create queue of objects Remove objects if queue is full Query stage Create time lapse background Select tubes and compute optimal temporal arrangement Stitch

Region based Object Based Special Cases Cost Extensions Removing from obj queue ( Estimating obj importance ) importance : activity value from earlier collision cost : sum of active pixels normalized and spatial distribution for obj compared for correlation age : Assume density of objects in queue should decrease exponentially N t = K 1 σ e t σ

Collision cost Region based Object Based Special Cases Cost Extensions Correlation between the two activity traces provides collision cost

Region based Object Based Special Cases Cost Extensions Synopsis generation Generating background video Consistency cost computed for each object for each possible time Energy minimization determines which tubes appear and at what times Combine tubes with background

Region based Object Based Special Cases Cost Extensions Time lapse background contradiction Goal create background of the full time of recording and background of activities Solution Create Temporal histogram of activity and one of uniform time Interpolate to create actual video histogram

Background consistency Region based Object Based Special Cases Cost Extensions Want object to background consistency so new equation introduces a difference from background component to the energy function Additionally, less than perfect segmentation so when stitching there is blending

In Application Region based Object Based Special Cases Cost Extensions All the weighted components of the energy function allows users to vary variables and role of background vs scene or type of object

Phase transition weighting Region based Object Based Special Cases Cost Extensions Background objects will appear and disappear for no reason Moving objects will disappear when stopped ( causes flickering ) ( phase transitions should be inserted into background at original time )

Region based Object Based Special Cases Cost Extensions Object extraction ( governed by min-cut ) is done in parallel and possible in hardware 3GHZ 320x240 runs at 10 fps Most expensive is collision cost, every relative shift between pairs of objects K objects over T time steps or T K 2 Solutions Coarse intervals Lower resolution Bounding boxes

Region based Object Based Special Cases Cost Extensions Actual times for cost computation 334,000 frames ( 24hr parking ) with 262 objects becomes 450 frames in 65 seconds 100,000 frames ( 30hr airport ) with 500 objects requires 80 seconds There are T K possible temporal arrangements Convergence in parking example 59s and Airport 290s In general they throw out objects of low likelihood so airport goes from 1,917 objects to 500 from above

Novel Create object tubes Region based Object Based Special Cases Cost Extensions Create Median backgrounds and subtract Find best min collision video for a given synopsis length

Novel Create object tubes Region based Object Based Special Cases Cost Extensions Create Median backgrounds and subtract Find best min collision video for a given synopsis length System changes Small motions ( leaves ) or no motion large animals ( bears) are important Have tubes occlude each other based on their spatial location in scene

Novel Create object tubes Region based Object Based Special Cases Cost Extensions Create Median backgrounds and subtract Find best min collision video for a given synopsis length System changes Small motions ( leaves ) or no motion large animals ( bears) are important Have tubes occlude each other based on their spatial location in scene User input Specify duration of the video synopsis and percentage of objects and try to minimize collisions Specify percentage of objects and penalty for collision so you optimize duration

Outline CuZero: Frontier of Interactive Visual Search CuZero: Frontier of Interactive Visual Search

Graph-Cut Query by image Arbitrary set of images CuZero Start with text and then allow ranking Those with trained concept categories All systems are trying to enable you to find content faster, but they work on different medium and sources.