ECS 189G: Intro to Computer Vision, Spring 2015
Problem Set 3

Instructor: Yong Jae Lee (yjlee@cs.ucdavis.edu)
TA: Ahsan Abdullah (aabdullah@ucdavis.edu)
TA: Vivek Dubey (vvkdubey@ucdavis.edu)
Due: Wednesday, June 3rd, 11:59 PM

Instructions

1. Answer sheets must be submitted on SmartSite. Hard copies will not be accepted.
2. Please submit your answer sheet containing the written answers in a file named: FirstName_LastName_PS3.pdf. Please put your name on the answer sheet.
3. Please submit your code and input/output images in a zip file named: FirstName_LastName_PS3.zip. Your code and input images should all be saved in the main directory. Your output images can be saved either in the main directory or in a sub-directory called output in your main directory.
4. You may collaborate with other students. However, you need to write and implement your own solutions. Please list the names of the students you discussed the assignment with.
5. For the implementation questions, make sure your code is documented, is bug-free, and works out of the box. Please be sure to submit all main and helper functions. Be sure not to include absolute paths. Points will be deducted if your code does not run out of the box.
6. If plots are required, you must include them in your answer sheet (pdf), and your code must display them when run. Points will be deducted for not following this protocol.

1 Short answer problems [15 points]

1. When performing interest point detection with the Laplacian of Gaussian, how would results differ if we were to (a) take any positions that are local maxima in scale-space, or (b) take any positions whose filter response exceeds a threshold? Specifically, what is the impact on the repeatability or distinctiveness of the resulting interest points?

2. What exactly does the value recorded in a single dimension of a SIFT keypoint descriptor signify?

3. If using SIFT with the Generalized Hough Transform to perform recognition of an object instance, what is the dimensionality of the Hough parameter space? Explain your answer.

2 Programming: Video search with bag of visual words [85 points]

For this problem, you will implement a video search method to retrieve relevant frames from a video, based on the features in a query region selected from some frame. We are providing the image data and some starter code for this assignment.

Provided data

You can access pre-computed SIFT features here:
/usr/local/189data/sift/ (accessible from CSIF machines) or
\\coe-itss-bfs.engr.ucdavis.edu\classdata\ecs189\materials\sift\ (accessible from COE machines in your T drive; e.g., Academic Surge lab machines)

The associated images are stored here:
/usr/local/189data/frames/ (accessible from CSIF machines) or
\\coe-itss-bfs.engr.ucdavis.edu\classdata\ecs189\materials\frames\ (accessible from COE machines in your T drive; e.g., Academic Surge lab machines)

Please note the data takes about 3 GB, so you should *not* try to copy it to your home directory. Just point to the data files directly in your code. If you would like to download the data (e.g., to work on your own machine), it is also available here:

SIFT features: https://ucdavis.box.com/s/4rrs7bp6cszrg6xrb6cr6v2mupnemwgu
Associated images: https://ucdavis.box.com/s/0pn2bmlfr98o8hs3yjhjdgiluz9lvald

Each .mat file in the provided SIFT data corresponds to a single image and contains the following variables, where n is the number of detected SIFT features in that image:

    descriptors   n x 128   single   // SIFT vectors as rows
    imname        1 x 57    char     // name of the image file that goes with this data
    numfeats      1 x 1     single   // number of detected features
    orients       n x 1     single   // orientations of the patches
    positions     n x 2     single   // positions of the patch centers
    scales        n x 1     single   // scales of the patches

Provided code

The following are the provided code files. You are not required to use any of these functions, but you will probably find them helpful. You can access the code here:
http://web.cs.ucdavis.edu/~yjlee/teaching/ecs189g-spring2015/psets/pset3/pset3code.zip

loaddataexample.m: run this first and make sure you understand the data format. It is a script that loops over the data files and shows how to access each descriptor. It also shows how to use some of the other functions below.
displaysiftpatches.m: given SIFT descriptor info, draws the patches on top of an image.
getpatchfromsiftparameters.m: given SIFT descriptor info, extracts the image patch itself and returns it as a single image.
selectregion.m: given an image and a list of feature positions, lets the user draw a polygon marking a region of interest, and then returns the indices within the list of positions that fell within the polygon.
dist2.m: a fast implementation of computing pairwise distances between two matrices for which each row is a data point.
kmeansml.m: a faster k-means implementation that takes the data points as columns.

What to implement and discuss in the write-up

Write one script for each of the following (along with any helper functions you find useful), and in your pdf write-up report on the results, explain them, and show images where appropriate. Your code must access the frames and the SIFT features from subfolders called frames and sift, respectively, in your main working directory.

1. Raw descriptor matching [20 pts]: Allow a user to select a region of interest (see the provided selectregion.m) in one frame, and then match descriptors in that region to descriptors in the second image based on Euclidean distance in SIFT space. Display the selected region of interest in the first image (a polygon) and the matched features in the second image. Use the two images and associated features in the provided file twoframedata.mat (in the zip file) to demonstrate. Note: no visual vocabulary should be used for this one. Name your script rawdescriptormatches.m. A starting-point sketch follows below.
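To make the expected pipeline concrete, here is a minimal sketch of the matching step. It assumes dist2 behaves as described in the provided-code list (rows are data points) and that selectregion takes an image and an n x 2 position list and returns indices. The variable names loaded from twoframedata.mat (im1, im2, descriptors1, descriptors2, positions1, positions2) are assumptions; inspect the file with whos('-file','twoframedata.mat') and rename accordingly.

    % Sketch for rawdescriptormatches.m -- variable names are assumptions.
    load('twoframedata.mat');  % assumed: im1, im2, descriptors1, descriptors2, positions1, positions2

    % Let the user outline a polygon in frame 1; keep the features inside it.
    inds = selectregion(im1, positions1);

    % Pairwise distances from each selected SIFT descriptor in frame 1 to
    % every descriptor in frame 2 (dist2 treats rows as data points; whether
    % the distances are squared or not does not change the argmin).
    D = dist2(double(descriptors1(inds, :)), double(descriptors2));

    % Nearest neighbor in frame 2 for each selected descriptor.
    [~, nn] = min(D, [], 2);

    % Plot the matched feature centers on frame 2 (positions assumed [x y]).
    figure; imshow(im2); hold on;
    plot(positions2(nn, 1), positions2(nn, 2), 'y+', 'MarkerSize', 8);
    hold off;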

2. Visualizing the vocabulary [20 pts]: Build a visual vocabulary. Display example image patches associated with two of the visual words. Choose two words that are distinct, to illustrate what the different words are capturing, and display enough patch examples that the word content is evident (e.g., 25 patches per word). See the provided helper function getpatchfromsiftparameters.m. Explain what you see. Name your script visualizevocabulary.m. Please submit your visual words in a file called kmeans.mat. This file should contain a matrix of size k x 128 called kmeans.

3. Full frame queries [20 pts]: After testing your code for bag-of-words visual search, choose 3 different frames from the entire video dataset to serve as queries. Display each query frame and its M = 5 most similar frames (in rank order) based on the normalized scalar product between their bag-of-words histograms. Explain the results. Name your script fullframequeries.m.

4. Region queries [25 pts]: Select your favorite query regions from within 4 frames (which may be different from those used above) to demonstrate the retrieved frames when only a portion of the SIFT descriptors is used to form a bag of words. Try to include example(s) where the same object is found in the most similar M frames but amidst different objects or backgrounds, and also include a failure case. Display each query region (marked in the frame as a polygon) and its M = 5 most similar frames. Explain the results, including possible reasons for the failure cases. Name your script regionqueries.m.

Tips: overview of framework requirements

The basic framework will require these components:

Compute nearest raw SIFT descriptors. Use the Euclidean distance between SIFT descriptors to determine which are nearest among two images' descriptors. That is, match features from one image to the other, without quantizing to visual words.

Form a visual vocabulary. Cluster a large, representative random sample of SIFT descriptors from some portion of the frames using k-means. Let the k centers be the visual words. The value of k is a free parameter; for this data something like k = 1500 should work, but feel free to play with this parameter. [See Matlab's kmeans function, or the provided kmeansml.m code.] Note: you may run out of memory if you use all the provided SIFT descriptors to build the vocabulary. A minimal sketch of this step follows below.
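Here is a minimal sketch of the vocabulary-building step, under these assumptions: the SIFT .mat files live in a sift/ subfolder, Matlab's kmeans (Statistics Toolbox) is used rather than the provided kmeansml.m (which expects points as columns), and the subsampling rates are placeholders to tune.

    % Sketch: build a visual vocabulary by clustering a random sample of
    % descriptors (the sampling rates below are placeholders -- tune them).
    files = dir(fullfile('sift', '*.mat'));
    sample = [];
    for i = 1:20:numel(files)                      % every 20th frame
        d = load(fullfile('sift', files(i).name));
        n = size(d.descriptors, 1);
        if n > 0
            keep = randperm(n, min(100, n));       % up to 100 descriptors per frame
            sample = [sample; double(d.descriptors(keep, :))]; %#ok<AGROW>
        end
    end

    k = 1500;
    [~, centers] = kmeans(sample, k);              % centers is k x 128, rows are words

    % Save under the variable name 'kmeans', as the assignment requires,
    % without shadowing the kmeans function in this workspace.
    S.kmeans = centers;
    save('kmeans.mat', '-struct', 'S');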

Map a raw SIFT descriptor to its visual word. The raw descriptor is assigned to the nearest visual word. [See the provided dist2.m code for fast distance computations.]

Map an image's features into its bag-of-words histogram. The histogram for image I_j is a k-dimensional vector: F(I_j) = [freq_{1,j}, freq_{2,j}, ..., freq_{k,j}], where each entry freq_{i,j} counts the number of occurrences of the i-th visual word in that image, and k is the total number of words in the vocabulary. In other words, a single image's list of n SIFT descriptors yields a k-dimensional bag-of-words histogram. [Matlab's histc is a useful function.] A minimal sketch of this step and the next appears at the end of this handout.

Compute similarity scores. Compare two bag-of-words histograms using the normalized scalar product. Sort the similarity scores between a query histogram and the histograms associated with the rest of the images in the video, and pull up the images associated with the M most similar examples. [See Matlab's sort function.]

Form a query from a region within a frame. Select a polygonal region interactively with the mouse, and compute a bag-of-words histogram from only the SIFT descriptors that fall within that region. Optionally, weight it with tf-idf. [See the provided selectregion.m code.]

3 OPTIONAL: Extra credit (up to 10 points each, max 20 points total)

Stop list and tf-idf. Implement a stop list to ignore very common words, and apply tf-idf weighting to the bags of words. Discuss, and create an experiment to illustrate, the impact on your results.

Spatial verification. Implement a spatial consistency check to post-process and re-rank the shortlist produced based on the normalized scalar product scores. Demonstrate a query example where this improves the results.

This assignment is adapted from the PS4 assignments of Kristen Grauman's CS 376: Computer Vision at UT Austin and Devi Parikh's ECE 5554: Computer Vision at Virginia Tech.
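For reference, here is a minimal sketch of the two steps flagged in the tips above: quantizing descriptors into a bag-of-words histogram, and scoring histograms with the normalized scalar product (cosine similarity). It assumes dist2 behaves as described in the provided-code list (rows are data points); the function names are our own, not part of the starter code.

    function h = bow_histogram(descriptors, words)
    % Quantize SIFT descriptors (n x 128) against a vocabulary (k x 128)
    % and count how often each visual word occurs.
    D = dist2(double(descriptors), double(words)); % n x k pairwise distances
    [~, wordids] = min(D, [], 2);                  % nearest word per descriptor
    h = histc(wordids, 1:size(words, 1));          % k-dimensional histogram
    h = h(:);
    end

    function s = bow_similarity(h1, h2)
    % Normalized scalar product (cosine similarity) between two histograms.
    s = (h1(:)' * h2(:)) / (norm(h1) * norm(h2) + eps);
    end

Given per-frame histograms stacked as the rows of a matrix H and a query histogram hq (a k x 1 column), the whole ranking step is then: scores = (H * hq) ./ (sqrt(sum(H.^2, 2)) * norm(hq) + eps); [~, order] = sort(scores, 'descend'); and the first M = 5 entries of order index the most similar frames.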