Retrieving images based on a specific place in a living room


Anouk E.M. Visser

Bachelor thesis, Credits: 18 EC
Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam, Faculty of Science
Science Park, Amsterdam

Supervisor: Dr. ir. Leo Dorst
Intelligent Systems Laboratory Amsterdam, Faculty of Science
Science Park, Amsterdam

June 28th, 2013

CONTENTS

Abstract
1. Introduction
2. Related Work
   2.1. Location Recognition
   2.2. Features
3. Dataset
   3.1. Transformations
   3.2. Informative and Confusing Elements
4. Object Recognition
   4.1. Selection of Candidates
   4.2. Voting System
5. Changes as Informative Features
   5.1. Informativeness
   5.2. Detecting Change
   5.3. Retrieval
6. Evaluation
   6.1. Dataset
   6.2. Evaluation of the Object Recognition Task
   6.3. Evaluation of the Retrieval Task
   6.4. Evaluation of the Change Detection Task
   6.5. Future Work
7. Conclusion
References

ABSTRACT

We propose an application that allows a user to use his or her mobile device to capture a specific place in the living room, after which the system will retrieve images that capture the same place. To build this system we will use a dataset of family photos that has not been annotated. Retrieving images based on a place in the living room is a difficult task, as the images from the dataset are meant to capture people instead of a place. Because people are not unique to a place, the system needs to be able to avoid confusing features (like people) when retrieving images that were taken at the query-place. In addition to that, the places may have changed over time. Our system first recognizes an object that is unique to the query-place from among the images in the dataset. The system will then identify features of the query-place that have changed over time, because we have identified changing features as being most informative about a location. For every year in which a significant change occurred, the system will provide a bag of words containing informative features that are used to retrieve images from the dataset that capture the query-location.

1. INTRODUCTION

Most families have a large collection of family photos. A large number of these images were taken in the living room, spread over a long period of time. The images are intended to depict people and events that happen in the living room, but most of the time we can recognize that the image captures a specific place in the living room as well. Nowadays, families save their photo collection on the computer; the only way the user can go through them is by date or, in some cases, by face.

We propose an application that allows the user to use his or her mobile device to capture a specific place in the living room, after which the system presents the user with images that were taken at the same place. We define a place as the 3D structure visible in the query-image, rather than the actual camera location of the query, as proposed by Knopp, Sivic & Pajdla (2010, p. 1). The dataset used for development and evaluation of the system consists of more than 1300 images taken in the same living room between 1992 and 2013. The system for retrieving images that capture the same place as the query-image (we will call this the "query-place") uses a dataset that has not been annotated. The input to the system is the query-image as captured by the user's mobile device and an image of an object that is present in the query-image. The object, which we will call the "object of interest", must have two characteristics: it must be unique to the place and it must have remained stable over time while other features or objects at the place may have changed. An example of a query-image and an object of interest can be found in figure 6.

The system uses a method of object recognition to recognize the object of interest from among the images in the dataset (we will refer to this task as the "object recognition task"), resulting in a set of images that contain the object of interest. This set is used to find other features that co-occur with the object of interest that the user has provided. However, in this stage of the process the system needs to be able to decide whether a feature that co-occurs with the feature the user provided is in fact indicative of the query-place. The resulting features form a bag of words that represents the place of the query-image. This bag of words is used to evaluate the images in the dataset, after which the system can decide whether they capture the same place as the query-image.

To realize this system, there are two research questions that need to be answered: (1) How can the system recognize the object of interest in the images from the dataset? (2) How can the system recognize the query-place in the absence of the object of interest? The first question focuses on recognizing the object provided by the user in other images. However, it is not certain that the object is present, which means that the system needs to be capable of deciding whether the object is present or not. Furthermore, we propose a solution to the question of how the system could recognize the place even if the object identified by the user is no longer present. To answer this question we investigate how the system can automatically decide whether a feature is informative or confusing, how to detect when a place has undergone significant changes and how to recognize a place even after such changes have occurred.

There are several difficulties in answering the questions formulated above.
Unlike other location recognition systems [3] [13], our system is built to work on a dataset that has not been annotated.

In city-scale location recognition systems it is relatively easy to annotate the dataset, because the geolocations of different locations in the city differ sufficiently from each other. However, when recognizing places in the living room, the geolocation will not provide sufficient information about the actual place, which is the reason why the system should be able to work on a dataset that has not been annotated. For a task like object or location recognition, one would usually analyze the annotated images (the training set) to find informative features that are indicative of a location or an object. The absence of this information complicates the recognition tasks, as there is no reference to what an actual match between places or objects looks like. However, by evaluating an image locally (i.e. without using information obtained from other images in the dataset) we are able to offer much faster performance, as we can omit the preprocessing of the dataset, making this solution more suitable for a mobile device. The fact that the images in the dataset are not intended to capture a place forms another difficulty. Most of the time we will find that the place itself is occluded by a person or people, thus making it difficult to recognize the exact place. In addition, when looking for correspondences between images to find out whether they were taken at the same place, the system must avoid any people who happen to be in the image, as they are not informative about a place. Finally, the places occurring in the dataset are captured under varying circumstances, meaning that the system should be invariant to changes such as changes in viewpoint, lighting and scale.

2. RELATED WORK

Despite the differences between the system we propose and most other location recognition systems, there are many techniques and ideas from previous work that can be applied to our system as well. Finding matches between images is most commonly done by comparing feature descriptors. Not only is there a large variety of feature detectors that have been proposed [1] [4] [8] [9] [12], many examples of how these detectors have been used in location recognition systems can be found as well [5] [6] [13]. [3] also proposes a method of avoiding confusing features.

2.1. Location Recognition. A location recognition system is a system that can identify the location in a given query-image, either by returning the exact location or other images that were taken at or around the same location. City-scale location recognition [13] investigates the task of location recognition in a large image dataset. The authors propose to use a vocabulary tree when the size of the dataset increases, as they have found that the performance of an approach that uses invariant feature matching drops when the dataset grows. In addition to introducing a vocabulary tree for the task of location recognition, they also find that some features are more informative than others. They propose a method to calculate the information gain for every feature, using the annotations from their dataset. Although we do not possess this information, we would be interested in distinguishing between informative and non-informative features. The presented method for location recognition is evaluated using 278 query images, with which they achieve a recognition rate higher than 70%.
The authors note that this rate is higher than expected, as the training data and the test data were not obtained at the same time, meaning that the locations in the images had undergone changes. In our task the timespan between the oldest image in the dataset and the query-image will be considerably larger than the timespan mentioned in this paper.

However, the recognition rate obtained by the authors suggests that minor changes at locations do not necessarily have a negative effect on the location recognition task.

Knopp, Sivic and Pajdla [3] propose a system that recognizes the place of a query-image using a bag-of-features model, to which they add a method of avoiding features that cause confusion between different places. As the dataset they are using is geotagged, they are able to query the dataset for images taken in different places. They find that features that occur in the majority of places depict confusing objects. They show that the performance of the location recognition system is improved when confusing features are removed from the total set of features. Like [13], this method for identifying confusing features also uses an annotated dataset to find out which features are informative and which are not. Because we propose a system for place recognition that does not require a ground truth to operate, we are not able to analyze the features in the same way. However, we may assume that the dataset covers several different places. By examining the complete dataset and finding features that occur in most of the images, confusing features could be identified. In addition, once the system has made an estimation of which images were taken in the same place, this set can be used for finding additional confusing and informative features.

2.2. Features. To find images taken at the same place, invariant features need to be detected to match the query-image with images from the dataset. Among the feature detectors that OpenCV [2] provides are SIFT [8], SURF [1], ORB [12], BRISK [4] and MSER [9]. The Scale Invariant Feature Transform (SIFT) is presented by Lowe in his article "Distinctive image features from scale-invariant keypoints" [8]. SIFT features can be very useful for object recognition, because they are invariant to rotation and scale and they are robust under changes in viewpoint, illumination and noise. SIFT features are often used when features have to be extracted [5] [6] [7] [11] [13]. SURF [1] is a popular alternative to SIFT and is used in modern location recognition systems like [3]. The SURF feature detector is inspired by SIFT, but is much faster while offering comparable performance. MSER [9] is a region detector that is capable of extracting so-called maximally stable extremal regions. These regions are invariant to changes in viewpoint, scale and lighting. The detection of extremal regions is achieved by thresholding the image at all gray levels, while simultaneously measuring the area of connected components at each threshold. The extremal regions are then identified by finding the regions that remained stable over all thresholds. In "A Comparison of Affine Region Detectors" [10] affine region detectors are tested for invariance. The region detectors were applied to two images, obtained by duplicating the original image and applying a transformation like an affine transformation, rotation or changes in scale and lighting. The region detectors were tested on many aspects, among which their repeatability and their performance in a matching test. MSER obtains the highest score in many of the experiments carried out by the authors. Looking at our dataset we find that affine transformations, along with changes in scale and lighting, are the most common transformations.
Based on their characteristics, SIFT, SURF and MSER all seem good feature detectors to use for solving our problem. As both ORB [12] and BRISK [4] are not invariant under common transformations like affine transformations as well as changes in lighting and rotation, they are not widely used. In figure 1 we have provided an overview of the reviewed features and their invariances.

FIGURE 1. Overview of the reviewed feature detectors (MSER, ORB, BRISK, SURF and SIFT) and their invariances (perspective, rotation, lighting and scale).

In OpenCV a clear distinction is made between feature detectors and feature descriptors. OpenCV provides an implementation of both a SURF and an MSER detector. A feature detector returns the feature defined by its position in the detected image (the coordinates) and its size (i.e. the diameter of the circle that can be drawn around the feature). A feature descriptor provides a way of comparing features; OpenCV also offers a selection of feature descriptors, among which a SURF feature descriptor based on [1]. Unfortunately, it does not provide a descriptor specifically designed for MSER. However, [11] shows that a descriptor that was not specifically designed for the detector itself can be used to compare features as well; in this case they use a SIFT descriptor on features detected by MSER. In OpenCV feature descriptors are represented as vectors to make it possible to compute the distances between descriptors efficiently. In chapter 4.1 we will motivate our choice of the feature detectors and descriptors that are used for the object recognition task.

3. DATASET

The dataset used for development and evaluation of the system consists of more than 1300 images taken in the same living room between 1992 and 2013. In this section we will give a short overview of the transformations that can be observed in the dataset, and we explain briefly what features can be found in the images in the dataset and which of them we expect to be informative or confusing.

3.1. Transformations. Looking at the dataset of images taken solely in the living room, there are a few characteristic transformations to which our system needs to be invariant. The most important and most difficult transformations we should cope with are affine transformations. These are transformations where straight lines remain straight and parallel lines remain parallel, while angles and lengths may change. In our dataset we observe many images that capture a place from a different viewpoint or perspective. In figures 2b and 2c we observe that the plinth across the wall is visible in both images but has undergone an affine transformation (the lines are still parallel and straight, but the angle has changed). In 2a the wall is not visible, but we can observe the lamp and potentially the table from a different perspective. In addition to changes in perspective we also observe a lot of changes in lighting. Scale is a common transformation as well, but it is correlated with the notion of a change of perspective, as things may appear smaller when they are further away. However, the scale of the image can determine whether the place is recognizable; sometimes we may find images that are zoomed in on a person, resulting in a blurred or barely visible background. Transformations in scale can complicate the task of recognizing the place.

FIGURE 2. Changes in viewpoint, scale and lighting at the same place.

3.2. Informative and Confusing Elements. A place recognition system for living rooms needs to be able to identify a place by the presence of one or more features. However, it will also need to ignore certain features, as they are not informative about a place and can cause confusion for the retrieval task. We define informative features as elements of an image that provide enough information (alone, or possibly in correlation with other features) about the place that is captured in the image. In contrast to informative features, confusing features are elements of an image that are not unique to a place but appear simultaneously with informative features, complicating the task of identifying informative features. Elements we find in an image can approximately be categorized into four classes: peripherals (such as thermostats, sockets, lamps, heaters, etc.), features of the building, furniture and people. Intuitively we find that peripherals and furniture are most indicative of the place we want to recognize, potentially followed by features of the building, whereas the presence of people is not informative at all. We could wonder when and where we observe these features; in figure 3 these categories are evaluated on three points: whether the features are specific to a place, their dependency on time (i.e. do they change over time?) and whether they can be found in all images. We find that what we intuitively thought of as confusing features (people) have the inverse characteristics compared to what we intuitively thought of as informative features (furniture, peripherals).

FIGURE 3. Informative and confusing features, evaluated per category (peripherals, furniture, people, building features). Which features are place specific? Which of them are time-dependent (i.e. do they change over time?)? Are they present in all images?

4. OBJECT RECOGNITION

The system is presented with a query-image depicting the query-place and is given the task to find other images from the dataset depicting the same place. To be able to do this, the system will ask the user to point out an object that is unique to the query-place and has remained stable over time while other features or objects at the query-place may have changed. We will call this object the object of interest. A first selection of images will be made by automatically collecting images from the dataset in which the object of interest is present. This set will later be used to find features, other than the object of interest, that are informative about the query-place. To create this set of images the system will attempt to recognize the object of interest in the images from the dataset. We will call the image in which the system tries to recognize the object the queried image. We propose a recognition system that assigns a recognition score to every queried image. The recognition score expresses the certainty that the object of interest is present in the queried image. We will first describe how the system creates a set of candidates by matching features between the image of the object of interest and the queried image. Then we will describe how the system assigns points to the candidates based on three detection methods.

4.1. Selection of Candidates. In order to create a set of candidates for the scoring system, the system makes use of feature detectors. Remember from the literature review that feature detectors as implemented by OpenCV provide the location of the features in the queried image accompanied by the size of each feature. With the feature's location in the queried image and its size, the system is able to extract interesting sub-images containing one specific feature. These sub-images will serve as input for the voting system. In chapter 3 we have identified the three most common transformations among the images in the dataset: changes in perspective, lighting and scale. Based on the findings in figure 1, MSER and SIFT are the best feature detectors to use for our dataset, as they are invariant to the three most common transformations. MSER and SIFT are also representative of two main classes of objects: blob-like objects and objects that are more complex (i.e. they have a lot of corners). Although SURF is not invariant to lighting, we have chosen to use SURF instead of SIFT because it offers much faster performance. For the object recognition task both detectors will be used to be able to detect different objects. We will use the SURF feature descriptor to compare features that were detected by both SURF and MSER. It is very unlikely that the object of interest itself is one MSER or SURF feature; an extreme example of this is a couch that may consist of a number of different blobs, corners and edges. Even an object like a thermostat can be identified by four or more corners and multiple blobs that, for example, make up the screen. For this reason, the object recognition task uses features detected in the image of the object of interest to match against features found in the queried image. Doing so, we allow the object recognizer to recognize the object even if part of the object is not visible (when, for example, someone is standing in front of it). While half of the object is covered, the system might still be able to detect enough unique features of the object to recognize it.
MSER detects regions at different scales, but generally will not detect as many features as SURF. In the rare case that more MSER features are detected in the image of the object of interest than SURF features, the object of interest is probably better described by MSER features than by SURF features and the system will only consider MSER features from that point onwards.
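To make this detection step concrete, the sketch below shows how it could look in Python with OpenCV. It is a minimal sketch under our own assumptions: the function names are ours, SURF lives in the non-free opencv-contrib modules, and exact constructor names differ between OpenCV versions; the thesis itself does not prescribe an implementation.

```python
import cv2

def detect_and_describe(object_img, queried_img):
    """Detect SURF and MSER keypoints and describe both sets with the SURF
    descriptor, as outlined in section 4.1 (illustrative sketch only)."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # non-free contrib module
    mser = cv2.MSER_create()

    gray_obj = cv2.cvtColor(object_img, cv2.COLOR_BGR2GRAY)
    gray_qry = cv2.cvtColor(queried_img, cv2.COLOR_BGR2GRAY)

    surf_kp_obj = list(surf.detect(gray_obj, None))
    mser_kp_obj = list(mser.detect(gray_obj))

    # If MSER fires more often than SURF on the object of interest, the object is
    # probably blob-like and only MSER features are considered from here on.
    if len(mser_kp_obj) > len(surf_kp_obj):
        kp_obj = mser_kp_obj
        kp_qry = list(mser.detect(gray_qry))
    else:
        kp_obj = surf_kp_obj + mser_kp_obj
        kp_qry = list(surf.detect(gray_qry, None)) + list(mser.detect(gray_qry))

    # Both keypoint sets are described with the SURF descriptor.
    kp_obj, des_obj = surf.compute(gray_obj, kp_obj)
    kp_qry, des_qry = surf.compute(gray_qry, kp_qry)
    return kp_obj, des_obj, kp_qry, des_qry

def extract_sub_image(img, keypoint):
    """Cut the patch around a keypoint, using its location and its size
    (the diameter of the circle that can be drawn around the feature)."""
    x, y = keypoint.pt
    r = max(int(keypoint.size / 2), 1)
    h, w = img.shape[:2]
    return img[max(int(y) - r, 0):min(int(y) + r, h),
               max(int(x) - r, 0):min(int(x) + r, w)]
```

The patches returned by a helper like extract_sub_image are the sub-images that the voting system later receives as candidates.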

The object recognition task first detects features in both the image of the object of interest and the queried image, after which the detected features are converted to descriptors, allowing the system to compare them. For every descriptor x from the image of the object of interest, the system will find the best match from among the descriptors in the queried image. The Fast Approximate Nearest Neighbor Search Library (FLANN) as provided by OpenCV is capable of finding a descriptor from the queried image that best matches x. A best match is described as follows:

\[ \arg\min_{y \in D} d(x, y), \qquad (1) \]

where D is the set of descriptors found in the queried image, x is a descriptor from the image of the object of interest and d(x, y) is the Euclidean distance between the two vectors, calculated as:

\[ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}. \qquad (2) \]

This defines the best match between two descriptors as the descriptor from the queried image that has a minimal distance to the descriptor from the image of the object of interest. Because every descriptor in the image of the object of interest is matched to a descriptor from the queried image, and it is not necessarily true that every descriptor from the image of the object of interest is present in the queried image, there are possibly some wrong matches among the set of matches. To eliminate the wrong matches, the set is reduced by allowing only matches with a distance not greater than two times the minimum distance found among all matches. This results in a set of features that are most likely good matches between the queried image and the image of the object of interest. The features in the queried image that belonged to the best matching descriptors are now used to extract sub-images from the queried image. We will refer to the resulting set of sub-images as the candidates for the voting system.
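A possible sketch of this matching and filtering step, reusing the helpers from the previous sketch (again Python/OpenCV under our own naming; the thesis does not prescribe this code):

```python
import numpy as np
import cv2

def select_candidates(des_obj, des_qry, kp_qry, queried_img):
    """Match every object-of-interest descriptor to its nearest neighbour in the
    queried image with FLANN, keep only matches within twice the minimum
    distance, and return the sub-images around the surviving keypoints."""
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5),  # KD-tree index
                                  dict(checks=50))
    matches = flann.match(np.float32(des_obj), np.float32(des_qry))
    if not matches:
        return []

    min_dist = min(m.distance for m in matches)
    good = [m for m in matches if m.distance <= 2 * min_dist]

    # Every good match points to a keypoint in the queried image; the patch
    # around that keypoint becomes a candidate for the voting system.
    # extract_sub_image is the helper defined in the previous sketch.
    return [extract_sub_image(queried_img, kp_qry[m.trainIdx]) for m in good]
```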

4.2. Voting System. The distances between the descriptors from the image of the object of interest and the best matching descriptors in the queried image do not yet contain enough information to be able to tell whether the object of interest is present in the queried image or not. When the object is present, it is very likely that the best matching descriptors point to the object of interest. However, as the system is forced to find the best match, there is no guarantee that this will be an actual match. We might find that, when comparing the best match with the object of interest for two different queried images, the distances between the best matching descriptors vary greatly. One solution to this problem might be to compute the distances between all the descriptors in the dataset to identify to what extent the distances differ. However, the dataset used in the place recognition task we describe has not been annotated. The consequence of this is that the system cannot know whether a computed distance belongs to an actual match or a false match. In addition to that, detecting features, converting them to descriptors and then matching the descriptors demands considerable computing power; performing these actions on all images in the dataset requires either a very long time or a very powerful CPU, both of which are not available when developing a system for a mobile device.

We propose a solution that assumes that multiple detection methods will be capable of recognizing the object whenever it is present in the queried image. When the object of interest is present in the queried image, we expect that the three methods will all be biased towards one particular candidate, whereas in absence of the object of interest in the queried image we might expect that the methods will all find their favorite candidate, but there is no unanimity among them. We propose three detection methods that will assign points to every candidate as selected by the method described in 4.1. The first of the three methods will assign points based on the distance between the descriptors of the candidate and the object of interest, the second method will assign points based on how many other matches were found in the surrounding area, and the third method will assign points based on the similarity of the color histograms of the candidate and the object of interest. Every method will assign points according to a positional voting system: the method ranks the candidates, after which points are assigned according to the ranking. The highest ranked candidate receives the highest number of points.

The first method that will assign points to the candidates uses the distances between the descriptors found in the previous step and will be called distances. The distances between the candidates and the image of the object of interest will be ranked in ascending order, because the smallest distances are generally representative of the best matches. A difference in the points distribution between distances and the other methods is that distances can assign extra points if the smallest distance found is smaller than 0.25 (distances between descriptors range from 0 to 1, where 0 is the minimum distance that can be found). We have set this number experimentally, as we found that it is very rare that such a small distance is found and this tends to point to a really good match.

Like distances, the second method also makes use of the features detected in the previous step. Because descriptors from the object of interest are mapped to descriptors in the queried image, we expect that when an actual match is found, matches in the queried image will be close to each other, as they should represent the different features found on the object of interest. This is illustrated in figure 4. We will call this method coordinates count. The system will consider every feature and count the number of features that are found in the surrounding area (a Manhattan distance of 10 pixels is considered the surrounding area). Since two different feature detectors are used for detecting features, at least two features have to be found in a candidate for it to be eligible to receive points from coordinates count. As opposed to the ranking in distances, the values found are ranked in descending order, after which the system assigns points to each of the candidates according to the positional voting system.

The third and last method will assign points based on the similarity between the color histograms of the candidate and the object of interest and will be called color similarity. It constructs both a BGR (Blue Green Red) histogram and an HSV (Hue Saturation Value) histogram for the image of the object of interest and for the candidate. BGR color histograms are not invariant to lighting, which is the reason that the system also constructs an HSV color histogram, which is more invariant to lighting than the BGR histogram because it separates the image intensity from the color information. The correlations of the BGR color histograms and the HSV color histograms are then computed and averaged. The correlations are ranked in descending order, after which the system assigns points to each of the candidates according to the positional voting system, like the other two methods.
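The color-similarity vote could be computed along the following lines; this is a sketch that assumes OpenCV 3+ naming (cv2.HISTCMP_CORREL) and our own choice of histogram bins:

```python
import cv2

def _hist(img, ranges):
    """8x8x8 three-channel histogram, flattened for cv2.compareHist."""
    h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8], ranges)
    return cv2.normalize(h, h).flatten()

def color_similarity(object_img, candidate_img):
    """Average of the BGR and HSV histogram correlations between the object of
    interest and a candidate patch (the third voting method, as a sketch)."""
    bgr_ranges = [0, 256, 0, 256, 0, 256]
    hsv_ranges = [0, 180, 0, 256, 0, 256]  # hue runs from 0 to 180 in OpenCV

    bgr_corr = cv2.compareHist(_hist(object_img, bgr_ranges),
                               _hist(candidate_img, bgr_ranges),
                               cv2.HISTCMP_CORREL)
    hsv_corr = cv2.compareHist(_hist(cv2.cvtColor(object_img, cv2.COLOR_BGR2HSV), hsv_ranges),
                               _hist(cv2.cvtColor(candidate_img, cv2.COLOR_BGR2HSV), hsv_ranges),
                               cv2.HISTCMP_CORREL)
    return (bgr_corr + hsv_corr) / 2.0
```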
Every candidate has now been assigned a number of points by each of the three detection methods, and a recognition score can be assigned to the queried image. The recognition score of the queried image is the number of points that was assigned to the candidate with the highest number of points.

FIGURE 4. Matching descriptors of the object of interest (thermostat, top-left) and the queried image. In (a) we observe that the best matching features all point towards the object of interest; the good matches are positioned very close together in the image. In (b) the object of interest is not visible and the good matches do not cluster together as the good matches in (a) do.

To decide whether the object of interest is present in the queried image, we will choose a recognition threshold for the total number of points. Based on the recognition threshold the system will decide whether the object of interest is present or not. The points that will be assigned by each method are 12, 10, 8, 7 and 6 points. Distances can assign an additional 10 points to candidates that have a feature descriptor distance smaller than 0.25. In addition to this, the regular points assigned by distances are weighed twice, because the distances vote is generally considered most important, as it contains information about how similar the geometry is. This means that the minimum number of points a candidate can receive is 0 and the maximum number of points is 58. Instead of a recognition threshold for the total number of points, we could also choose to put constraints on the distribution of the points. We could, for example, ask for the candidate that received points from every method. However, we find that when the system needs to be able to recognize different objects, not all objects are characterized by the same methods. It could be that for certain objects color similarity is not a good predictor of that object, and for objects that do not contain many features coordinates count is not considered a very good predictor. Therefore, we have decided to put a constraint on the total number of points to decide whether the object of interest is present in the queried image.
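As an illustration of the scoring, the sketch below combines the three votes as described above (point values 12, 10, 8, 7 and 6, a doubled distances vote and a 10-point bonus for a descriptor distance below 0.25); the data structures and names are our own assumptions:

```python
POINTS = [12, 10, 8, 7, 6]  # points for ranks 1..5, 0 for lower ranks

def positional_points(ranked):
    """Map each candidate id to the points belonging to its rank."""
    return {c: (POINTS[i] if i < len(POINTS) else 0) for i, c in enumerate(ranked)}

def recognition_score(distances, coordinate_counts, color_correlations):
    """Recognition score of a queried image: the total of the best candidate.

    Each argument maps candidate ids to the raw value of one voting method."""
    votes = {}

    # distances: smaller is better, counted twice, plus a bonus for a very
    # small descriptor distance (< 0.25 on the 0..1 scale).
    for c, pts in positional_points(sorted(distances, key=distances.get)).items():
        votes[c] = votes.get(c, 0) + 2 * pts + (10 if distances[c] < 0.25 else 0)

    # coordinates count and color similarity: larger is better.
    for method in (coordinate_counts, color_correlations):
        ranked = sorted(method, key=method.get, reverse=True)
        for c, pts in positional_points(ranked).items():
            votes[c] = votes.get(c, 0) + pts

    return max(votes.values()) if votes else 0
```

With these values the maximum score a single candidate can reach is indeed 2·12 + 10 + 12 + 12 = 58, matching the bound mentioned above.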

5. CHANGES AS INFORMATIVE FEATURES

The user of the system has captured an image with his or her mobile device and has pointed out an object of interest that is both informative about the place depicted in the query-image and has remained constant where other features may have changed. The system has attempted to recognize the object among the images from the dataset by assigning a recognition score and by retaining only images with a recognition score greater than a predefined recognition threshold. The result of this is a set of images that most likely contain the object of interest and thus the query-place. Next, the system has to find the images that also depict the query-place, but where the object of interest is absent. In order to do this, the system needs to find more features aside from the object of interest that are informative about a place. A bag of words containing these features will be constructed and used for the retrieval of images that depict the query-place.

Every image in the dataset has a timestamp, at least containing the year in which the image was taken. To identify informative features, we will allow the system to use the timestamp of the images. Using the timestamp and the recognition score as described in the previous chapter, the system should be able to detect whether a feature found at the query-place has changed over the course of time, and if so, in what year it was most likely removed or replaced by another feature. We assume that informative features change over time, whereas confusing features remain stable. We will first provide a motivation for this assumption, followed by an explanation of how the system will detect changes. Finally, we will discuss how we use several bags of words, now filled with informative features of the query-place, for different timespans to retrieve images from the dataset that depict the query-place.

5.1. Informativeness. If we consider an oversimplified example of a place and how it has changed over time, we find that there are multiple features that are indicative of the place. An example of a place can be found in figure 5. In the dataset used for evaluation, this place has been labelled as place NA. This place undergoes many changes over the course of 20 years. While the presence of either the thermostat (T2) or the couch (C3) is a very good indicator of place NA in 2013, this may not have been the case in the past. It turns out that T1 and C2 are equally good predictors of place NA between 2000 and The main reason for this is that these features have only occurred at place NA, whereas a confusing feature like a person is visible at more places. Since the system is trying to fill the bag of words with words containing features that occur simultaneously with the object of interest, we need to find a method that avoids confusing features being added to the bag of words. Features that are indicative of a place and not confusing have two main characteristics:

- they occur simultaneously with the object of interest;
- they do not occur at places other than the query-place.

To comply with the first criterion, the system can consider all features that can be found in an image with the object of interest. The second criterion can be met by analyzing all the other images in the dataset (the images that do not contain the object of interest) and identifying features that are not only present at the query-place but also in many other places. There are two main disadvantages to this method. First of all, our dataset is not annotated, which causes an uncertainty when finding features that match often in the complete dataset. It is possible that an informative feature is observed in absence of the object of interest too many times, making it look like the feature occurs at too many other places. This would mean that this feature would be considered confusing, while it does not actually occur at any place other than the query-place. Secondly, it takes a lot of computing power to analyze all the images in the dataset. Therefore, we will approach this problem in a different way. Looking at figure 3 we observe that the features we have defined as confusing are not dependent on time, whereas features that are informative are.
The biggest challenge of recognizing a place in a living room from a set of family photos is recognizing the place even after it has changed completely. It seems that the features which change over time are most indicative of the place. We will assume that features that remain stable over time at a certain place will also occur at other places, whereas the things that change at a certain place are informative about it. This means that the system need only consider images containing the object of interest, even when it needs to detect features that are confusing.

FIGURE 5. Changes at location NA (top) and OE (bottom). The arrows show the presence of two objects over time; the objects are numbered in order of appearance. T represents a thermostat and C a couch.

By detecting which features change and which remain stable over the course of time, the system is able to decide whether a feature is informative or not.

5.2. Detecting Change. To detect features that change, the system will use sub-images from the query-image rather than features that can be detected by MSER or SURF. The system will extract sub-images of pixels from the query-image. These sub-images will serve as the input to the object recognition method, as the object that should be recognized in the queried image. The queried images that will serve as input to the object recognition system are images that have been selected from among the set of images containing the object of interest. The system will select images that most likely contain the object of interest and, in addition to that, it selects exactly three images from every year in which the object of interest was present. The system now attempts to recognize the sub-image in all the images that are representative of the query-place over the timespan in which the object of interest is present. Remember from chapter 4 that this results in an object recognition score. The object recognition scores, ordered by date, form a signal, which we will call the recognition signal. By analyzing the recognition signal, the system decides whether the word has changed over time, and if so, in what year the word was most likely replaced. We will give a brief overview of what kind of recognition signals we expect for the two different types of features, then we will show how the system chooses the split-year, and finally we will describe how the system will evaluate whether a change has actually occurred.

To be able to detect the year in which a change occurred, the system needs to have a representation of a recognition signal that changes over time. When it comes to informative features we expect to see a sudden drop in object recognition scores at the time where they were replaced by other features. Thus, the pattern we expect is constantly high in the period in which the feature was present, and constantly low in the period where it was absent.

When measuring change over time, the point where the relatively high level of recognition drops is the year in which the informative feature has been replaced. We will call this the split-year. The function that best describes this course over time is:

\[ y(t, t_{\mathrm{split}}) = \begin{cases} a & \text{if } t < t_{\mathrm{split}} \\ b & \text{if } t > t_{\mathrm{split}} \end{cases} \qquad (3) \]

a piecewise function, where t_split is the split-year, t the year, and a and b are two levels of recognition (typically high or typically low recognition scores). For confusing features we do not expect to find a split-year, but we do expect to find different recognition signals. If we take for example the confusing feature people, we find that a person can be characterized by several different features. We expect to encounter faces everywhere, whereas clothes will not appear as often in the complete dataset while still being confusing. For faces we expect to find a constant, relatively high recognition signal, whereas clothes should result in a constant, relatively low recognition signal. The function that best describes the feature's presence over time is then y(t) = a, with a being a typically high or typically low recognition score, depending on what the confusing feature represents.

To detect the split-year the system will try to fit the function described in equation 3 to the recognition signal. The system will calculate the sum of squared errors between the recognition signal and y(t, t_split) for different values of t_split. The values for t_split that will be considered are the years between the first year in which the object of interest was seen and the last. To find the split-year, the system will minimize the sum of squared errors between the recognition signal and y(t, t_split):

\[ \arg\min_{t_{\mathrm{split}}} \sum_{i=1}^{n} \bigl( x_i - y(t_i, t_{\mathrm{split}}) \bigr)^2, \qquad (4) \]

where x_i is the i-th datapoint of the recognition signal, n is the number of data points of the recognition signal and t_i is the year of measurement of x_i. In addition to t_split there are other parameters in y(t, t_split) that influence the value of the sum of squared errors: a and b. Looking carefully at the problem, it turns out that for the interval [t_start, t_split] and the interval [t_split, t_end] a line of the form y = a should be fit to the recognition signal. As it turns out, the average value of the points that it has to fit is the value for a that minimizes the sum of squared errors. While fitting the recognition signal to y(t, t_split) the system keeps track of the minimum sum of squared errors, the value of t_split for which this was the case, and the maximum sum of squared errors.

When a feature remains stable over time, a piecewise function is not the best function to fit to the recognition signal of that feature. If the function we are trying to fit does not describe the course of the recognition signal very well, we do not expect to find a value for t_split that minimizes the sum of squared errors and is significantly different from the other possible values of t_split. When fitting the same function to a feature that has changed over time, we expect that the minimum sum of squared errors will be significantly different from the maximum sum of squared errors. To decide whether the recognition signal actually changed over time, we will introduce a new parameter: the LH-value. The LH-value is the ratio of the minimum sum of squared errors to the maximum sum of squared errors of a specific sub-image. Combined with the accompanying split-year, the LH-value will be used during the retrieval phase to retrieve images that were taken at the same place as the query-image.
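A small NumPy sketch of the fit of equations (3) and (4) and the resulting LH-value; the variable names and the handling of a flat signal are our own assumptions:

```python
import numpy as np

def split_year_and_lh(years, scores):
    """Fit the piecewise-constant model of equation (3) to a recognition signal
    and return the best split-year together with the LH-value (sketch).

    `years` and `scores` are parallel sequences: the year of each representative
    image and the recognition score of the sub-image in that image."""
    years = np.asarray(years, dtype=float)
    scores = np.asarray(scores, dtype=float)

    sse = {}
    for t_split in range(int(years.min()) + 1, int(years.max()) + 1):
        left = scores[years < t_split]
        right = scores[years >= t_split]
        # The mean of each segment is the level (a or b) minimizing its SSE.
        sse[t_split] = (((left - left.mean()) ** 2).sum() if left.size else 0.0) \
                     + (((right - right.mean()) ** 2).sum() if right.size else 0.0)

    if not sse:  # signal covers a single year: no split-year can be found
        return None, 1.0

    best_split = min(sse, key=sse.get)
    worst = max(sse.values())
    lh_value = min(sse.values()) / worst if worst > 0 else 1.0  # flat signal
    return best_split, lh_value
```

Sub-images whose LH-value stays below the chosen threshold are then treated as features that have genuinely changed, and their split-years feed the retrieval step described next.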

5.3. Retrieval. To create a bag of words from the sub-images of the query-image, the system selects all sub-images with an LH-value lower than a specific parameter. We found that with an LH-value of 0.65 the system performs best. All the sub-images with an LH-value lower than 0.65 will be added to the bag of words. In the previous step the system has detected the split-year of every sub-image. Because the sub-images we have now added to the bag of words have undergone a significantly large change over the course of time, we will use only these split-years to determine the split-year for the place in the query-image. The system selects the split-year that occurs most frequently among the split-years of the sub-images in the bag of words. The system will now consider all the images in the dataset that were taken in the period starting from the split-year up to and including the year in which the current query-image was taken. For every one of these images the object recognition method will assign a recognition score to every sub-image from the bag of words. These recognition scores will be averaged, resulting in an average recognition score of the complete bag of words in the queried image. Based on this average the system will decide whether the queried image depicts the same place as the query-place. We have found that images with an average recognition score greater than 37 depict the same place as the query-image.

The system has now only considered images taken between the split-year and the year of the current query-image. To determine whether the remaining images depict the same place, the system will construct another bag of words. In the split-year a significant change has occurred, meaning that the bag of words from the query-image does not represent the same place in and before the split-year very well. The system selects an image taken in the split-year from the images that contain the object of interest (if an image with the object of interest is not available from that split-year, it will consider the year before the split-year, and so on, until it has found an image) as the new query-image. Again a recognition signal for all the sub-images is created by obtaining the recognition score in all the images that contain the object of interest and are selected to represent the years in which this object was present. Next, retrieval is done with the resulting bag of words for this query-image. If the new split-year is smaller than the current year, the system will again stop and change the bag of words when appropriate. Otherwise, it will continue retrieving images until it has reached the oldest image in the dataset.
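The first retrieval pass for one bag of words could be sketched as follows; `recognition_score_for` stands in for the object recognition method of chapter 4 applied to a single sub-image, and all names are our own assumptions:

```python
def retrieve_same_place(dataset, bag_of_words, split_year, query_year,
                        recognition_score_for, threshold=37):
    """Return the images taken between the split-year and the year of the
    query-image whose average bag-of-words recognition score exceeds the
    retrieval threshold (sketch of the first retrieval pass).

    `dataset` is an iterable of (image, year) pairs; `recognition_score_for`
    computes the recognition score of one sub-image (word) in one image."""
    retrieved = []
    for image, year in dataset:
        if not (split_year <= year <= query_year):
            continue
        scores = [recognition_score_for(word, image) for word in bag_of_words]
        if scores and sum(scores) / len(scores) > threshold:
            retrieved.append((image, year))
    return retrieved
```

For the years before the split-year the same loop would be repeated with the new query-image and its bag of words.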
6. EVALUATION

In this section we will give an evaluation of the performance of the system. First we will give an overview of the dataset and the labels we have assigned to the images for evaluation purposes. Next, we will evaluate the performance of the object recognition method as proposed in chapter 4, after which we will measure the performance of the complete retrieval task. Finally, we provide suggestions to improve the system in future work.

6.1. Dataset. To evaluate the system's performance we have assigned labels to all the images in the dataset. The most important label assigned to the images is a tag that refers to a place in the living room. In addition to this, we have also provided every image with an array of objects that are present (different couches, for example, have different tags) and a boolean that describes whether the place is recognizable or not.

FIGURE 6. The query-image (a) and the object of interest (b).

We have identified thirteen places in the living room, but it turns out that three places are very popular for taking pictures: over 60% of all images were taken at only three different places. To evaluate the performance of the system, we query the system with an image where the place was labelled NA. NA can be described as the place around the couch. Approximately 22% of the images in the dataset also depict this place. The query-image which we will use to evaluate the performance of the system can be found in figure 6. Given this query-image, the goal of the system is to retrieve images from the dataset that depict the same place. We will measure the performance of the system by monitoring the precision, recall and F1-measure for different LH-values and years. The images we have labelled as not recognizable account for approximately 10% of the dataset; the evaluation of these images does not count towards the values for precision, recall and the F1-measure.

6.2. Evaluation of the Object Recognition Task. To evaluate the part of the system that recognizes objects, we will provide the system with the query-image and a separate image of the thermostat that can be seen in this image (see figure 6). We ask the system to find the object of interest (i.e. the thermostat) in the images from the dataset. Because the part of the system that recognizes objects is provided with a recognition threshold to decide whether the object is present or not, we will first measure the performance for different values of this threshold. Using the optimal recognition threshold, we continue evaluating the object recognition task for every year in which the object of interest was present at the query-place. In figure 7 the results of both evaluation tasks are shown. In 7a the course of precision, recall and the F1-measure is set out against different values for the recognition threshold. Because the system will use the images in which the object of interest is present to construct a bag of words, it is important to achieve the highest precision possible. This reduces the chance that an image that does not depict the query-place is used for retrieving images that do.

FIGURE 7. Results of the evaluation of the object recognition task. (a) shows how precision, recall and the F1-measure change when the recognition threshold changes and (b) shows how precision, recall and F1-measure change over the years using a recognition threshold of 50.

The highest precision we achieved is 0.20; this performance was achieved by using a recognition threshold of 50. If we apply this recognition threshold to the task of recognizing the thermostat among all images in the dataset, we find the results shown in figure 7b. Although the precision, recall and F1-measure sometimes turn to 0, this does not mean that the system does not return any results; it only means that the system does not return any true positives. The result for 2011 can be explained by looking at the dataset: it turns out that there were no pictures including the object of interest, making it impossible to score higher than 0. This is also the case for the years before 2006, as the thermostat was never seen in those years. We also observe a peak in one year; after inspection of the dataset it turned out that there were many images containing the thermostat in that particular year.

6.3. Evaluation of the Retrieval Task. We will now evaluate the performance of the retrieval system. To do this we will first find the recognition threshold that maximizes the F1-measure. For the evaluation of the object recognition task, we have set the recognition threshold based on the highest precision. However, for the retrieval task it turns out that the precision can reach the optimal value of 1 when only one image is retrieved that captures the same place as the query-place. Although we want to achieve the highest precision possible, we do not wish to retrieve a total of only one or two images. Therefore, we will choose the parameters of the retrieval system that maximize the F1-measure. Again we will provide the system with the query-image and the object of interest as seen in figure 6. The system retrieves images containing the object of interest with the recognition threshold that we have found by evaluating the object recognition task. The resulting set of images is used to detect changing features and their split-years. For this particular retrieval task we have used an LH-value of 0.75. Remember that the LH-value influences the number of features that are selected as being informative and thus the sizes of the bags of words that are being used for different years.

FIGURE 8. Precision, recall and F1-measure over time. The results were obtained by using an LH-value of 0.75 and a recognition threshold of 34. The images that were used to detect change were automatically retrieved by recognizing the thermostat in the images from the dataset using a recognition threshold of 50.

Using this LH-value we found that a recognition threshold of 34 resulted in the highest F1-measure. This recognition threshold is significantly lower than the recognition threshold that we have identified as optimal while evaluating the object recognition task. We find that when using a bag of words approach it is possible that not every word is present in every image of the query-place. This means that the recognition thresholds need to be lower when recognizing multiple words, compared to recognizing one word. In figure 8 the results of the retrieval task are shown. The overall precision of this retrieval task is 0.21, the recall 1 and the F1-measure approximately 0.35. It turns out that the system returns every image in the dataset, resulting in a very high recall but a very low precision. The results can be explained by looking at the output of the object recognition task. This part of the system is used to retrieve a set of images containing the object, to obtain the recognition signal used for change detection, and to retrieve images that depict the same place. We have obtained a precision of 0.20 when recognizing the thermostat in the previous evaluation task. This low precision means that a significantly large part of the images in which the system recognizes the thermostat does not contain this object or even capture the query-place. In addition to this, the split-year the system detects is 2011, a year in which the thermostat was not present in any of the images. However, the system has identified at least one image taken in 2011 as containing a thermostat (see figure 9). This image does not capture the query-location, but the system uses a selection of sub-images from this image to retrieve images that were taken at the original query-place. This explains why precision drops significantly after 2011, the split-year where an incorrect image was used as the new query-image.

FIGURE 9. The new query-image for 2011 selected by the system.

6.4. Evaluation of the Change Detection Task. To separately evaluate the performance of the part of the system that detects change and constructs different bags of words based on different query-images, we will disable the part of the system that automatically recognizes images containing the object of interest. Instead, we have manually provided the system with a set of images that contain the thermostat. We have evaluated the retrieval system with this adjustment for two different LH-values: 0.65 and 0.75. We have found that a recognition threshold of 37 for LH-value 0.65 and a recognition threshold of 38 for LH-value 0.75 maximize the F1-measure. For both LH-values the system has detected the same split-years: 2011 and 2010. Normally the system would have set a new query-image by selecting the image with the highest object recognition score, but we have disabled that part of the system. Instead, we let the system present a selection of possible new query-images, after which we select an image ourselves. Letting the user select the new query-image has two main advantages. First of all, when the system selects an image that does not capture the query-location, the user will prevent the system from making the mistake of using this image as the query-image, like it did in the previous chapter. Second, the user can select an image that is representative of the location. We found that many of the images taken in 2010 are not of the same quality as the images in other years. As it turns out, a young family member was given a new camera for his birthday, resulting in a set of unconventional images. Also, the more common family photos from that year contained more people than usual. Because of the large number of people in these images, the places in the living room were barely visible. For this reason we decided to select an image from late 2009 as the new query-image, because it was much more representative of the location in 2010. The overall F1-measure for the LH-value of 0.65 is slightly higher than the overall F1-measure for the LH-value of 0.75. The results of the retrieval task where an LH-value of 0.65 and an average recognition threshold of 37 were used are shown in figure 10. We find that the recall is considerably lower than the recall we found when automatically retrieving images


More information

Trademark Matching and Retrieval in Sport Video Databases

Trademark Matching and Retrieval in Sport Video Databases Trademark Matching and Retrieval in Sport Video Databases Andrew D. Bagdanov, Lamberto Ballan, Marco Bertini and Alberto Del Bimbo {bagdanov, ballan, bertini, delbimbo}@dsi.unifi.it 9th ACM SIGMM International

More information

Determinant of homography-matrix-based multiple-object recognition

Determinant of homography-matrix-based multiple-object recognition Determinant of homography-matrix-based multiple-object recognition 1 Nagachetan Bangalore, Madhu Kiran, Anil Suryaprakash Visio Ingenii Limited F2-F3 Maxet House Liverpool Road Luton, LU1 1RS United Kingdom

More information

Scale Invariant Feature Transform

Scale Invariant Feature Transform Scale Invariant Feature Transform Why do we care about matching features? Camera calibration Stereo Tracking/SFM Image moiaicing Object/activity Recognition Objection representation and recognition Image

More information

Local features and image matching. Prof. Xin Yang HUST

Local features and image matching. Prof. Xin Yang HUST Local features and image matching Prof. Xin Yang HUST Last time RANSAC for robust geometric transformation estimation Translation, Affine, Homography Image warping Given a 2D transformation T and a source

More information

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford Image matching Harder case by Diva Sian by Diva Sian by scgbt by swashford Even harder case Harder still? How the Afghan Girl was Identified by Her Iris Patterns Read the story NASA Mars Rover images Answer

More information

CS 4495 Computer Vision A. Bobick. CS 4495 Computer Vision. Features 2 SIFT descriptor. Aaron Bobick School of Interactive Computing

CS 4495 Computer Vision A. Bobick. CS 4495 Computer Vision. Features 2 SIFT descriptor. Aaron Bobick School of Interactive Computing CS 4495 Computer Vision Features 2 SIFT descriptor Aaron Bobick School of Interactive Computing Administrivia PS 3: Out due Oct 6 th. Features recap: Goal is to find corresponding locations in two images.

More information

Scale Invariant Feature Transform

Scale Invariant Feature Transform Why do we care about matching features? Scale Invariant Feature Transform Camera calibration Stereo Tracking/SFM Image moiaicing Object/activity Recognition Objection representation and recognition Automatic

More information

Implementing the Scale Invariant Feature Transform(SIFT) Method

Implementing the Scale Invariant Feature Transform(SIFT) Method Implementing the Scale Invariant Feature Transform(SIFT) Method YU MENG and Dr. Bernard Tiddeman(supervisor) Department of Computer Science University of St. Andrews yumeng@dcs.st-and.ac.uk Abstract The

More information

Combining Appearance and Topology for Wide

Combining Appearance and Topology for Wide Combining Appearance and Topology for Wide Baseline Matching Dennis Tell and Stefan Carlsson Presented by: Josh Wills Image Point Correspondences Critical foundation for many vision applications 3-D reconstruction,

More information

CS231A Section 6: Problem Set 3

CS231A Section 6: Problem Set 3 CS231A Section 6: Problem Set 3 Kevin Wong Review 6 -! 1 11/09/2012 Announcements PS3 Due 2:15pm Tuesday, Nov 13 Extra Office Hours: Friday 6 8pm Huang Common Area, Basement Level. Review 6 -! 2 Topics

More information

Image matching on a mobile device

Image matching on a mobile device Image matching on a mobile device Honours project Authors: Steve Nowee Nick de Wolf Eva van Weel Supervisor: Jan van Gemert Contents 1 Introduction 3 2 Theory 4 2.1 Bag of words...........................

More information

EE368 Project: Visual Code Marker Detection

EE368 Project: Visual Code Marker Detection EE368 Project: Visual Code Marker Detection Kahye Song Group Number: 42 Email: kahye@stanford.edu Abstract A visual marker detection algorithm has been implemented and tested with twelve training images.

More information

Local Image Features

Local Image Features Local Image Features Computer Vision Read Szeliski 4.1 James Hays Acknowledgment: Many slides from Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial Flashed Face Distortion 2nd Place in the 8th Annual Best

More information

Feature descriptors and matching

Feature descriptors and matching Feature descriptors and matching Detections at multiple scales Invariance of MOPS Intensity Scale Rotation Color and Lighting Out-of-plane rotation Out-of-plane rotation Better representation than color:

More information

Local Features Tutorial: Nov. 8, 04

Local Features Tutorial: Nov. 8, 04 Local Features Tutorial: Nov. 8, 04 Local Features Tutorial References: Matlab SIFT tutorial (from course webpage) Lowe, David G. Distinctive Image Features from Scale Invariant Features, International

More information

Local Image Features

Local Image Features Local Image Features Computer Vision CS 143, Brown Read Szeliski 4.1 James Hays Acknowledgment: Many slides from Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial This section: correspondence and alignment

More information

Local features: detection and description. Local invariant features

Local features: detection and description. Local invariant features Local features: detection and description Local invariant features Detection of interest points Harris corner detection Scale invariant blob detection: LoG Description of local patches SIFT : Histograms

More information

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford Image matching Harder case by Diva Sian by Diva Sian by scgbt by swashford Even harder case Harder still? How the Afghan Girl was Identified by Her Iris Patterns Read the story NASA Mars Rover images Answer

More information

Evaluation and comparison of interest points/regions

Evaluation and comparison of interest points/regions Introduction Evaluation and comparison of interest points/regions Quantitative evaluation of interest point/region detectors points / regions at the same relative location and area Repeatability rate :

More information

SCALE INVARIANT FEATURE TRANSFORM (SIFT)

SCALE INVARIANT FEATURE TRANSFORM (SIFT) 1 SCALE INVARIANT FEATURE TRANSFORM (SIFT) OUTLINE SIFT Background SIFT Extraction Application in Content Based Image Search Conclusion 2 SIFT BACKGROUND Scale-invariant feature transform SIFT: to detect

More information

Computer Vision for HCI. Topics of This Lecture

Computer Vision for HCI. Topics of This Lecture Computer Vision for HCI Interest Points Topics of This Lecture Local Invariant Features Motivation Requirements, Invariances Keypoint Localization Features from Accelerated Segment Test (FAST) Harris Shi-Tomasi

More information

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian.

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian. Announcements Project 1 Out today Help session at the end of class Image matching by Diva Sian by swashford Harder case Even harder case How the Afghan Girl was Identified by Her Iris Patterns Read the

More information

EECS150 - Digital Design Lecture 14 FIFO 2 and SIFT. Recap and Outline

EECS150 - Digital Design Lecture 14 FIFO 2 and SIFT. Recap and Outline EECS150 - Digital Design Lecture 14 FIFO 2 and SIFT Oct. 15, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof. John Wawrzynek)

More information

The SIFT (Scale Invariant Feature

The SIFT (Scale Invariant Feature The SIFT (Scale Invariant Feature Transform) Detector and Descriptor developed by David Lowe University of British Columbia Initial paper ICCV 1999 Newer journal paper IJCV 2004 Review: Matt Brown s Canonical

More information

A System of Image Matching and 3D Reconstruction

A System of Image Matching and 3D Reconstruction A System of Image Matching and 3D Reconstruction CS231A Project Report 1. Introduction Xianfeng Rui Given thousands of unordered images of photos with a variety of scenes in your gallery, you will find

More information

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking Feature descriptors Alain Pagani Prof. Didier Stricker Computer Vision: Object and People Tracking 1 Overview Previous lectures: Feature extraction Today: Gradiant/edge Points (Kanade-Tomasi + Harris)

More information

Recognition of Degraded Handwritten Characters Using Local Features. Markus Diem and Robert Sablatnig

Recognition of Degraded Handwritten Characters Using Local Features. Markus Diem and Robert Sablatnig Recognition of Degraded Handwritten Characters Using Local Features Markus Diem and Robert Sablatnig Glagotica the oldest slavonic alphabet Saint Catherine's Monastery, Mount Sinai Challenges in interpretation

More information

Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision

Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision What Happened Last Time? Human 3D perception (3D cinema) Computational stereo Intuitive explanation of what is meant by disparity Stereo matching

More information

Midterm Wed. Local features: detection and description. Today. Last time. Local features: main components. Goal: interest operator repeatability

Midterm Wed. Local features: detection and description. Today. Last time. Local features: main components. Goal: interest operator repeatability Midterm Wed. Local features: detection and description Monday March 7 Prof. UT Austin Covers material up until 3/1 Solutions to practice eam handed out today Bring a 8.5 11 sheet of notes if you want Review

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 09 130219 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Feature Descriptors Feature Matching Feature

More information

Local invariant features

Local invariant features Local invariant features Tuesday, Oct 28 Kristen Grauman UT-Austin Today Some more Pset 2 results Pset 2 returned, pick up solutions Pset 3 is posted, due 11/11 Local invariant features Detection of interest

More information

Local features: detection and description May 12 th, 2015

Local features: detection and description May 12 th, 2015 Local features: detection and description May 12 th, 2015 Yong Jae Lee UC Davis Announcements PS1 grades up on SmartSite PS1 stats: Mean: 83.26 Standard Dev: 28.51 PS2 deadline extended to Saturday, 11:59

More information

Computer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction

Computer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction Computer vision: models, learning and inference Chapter 13 Image preprocessing and feature extraction Preprocessing The goal of pre-processing is to try to reduce unwanted variation in image due to lighting,

More information

Image Features: Local Descriptors. Sanja Fidler CSC420: Intro to Image Understanding 1/ 58

Image Features: Local Descriptors. Sanja Fidler CSC420: Intro to Image Understanding 1/ 58 Image Features: Local Descriptors Sanja Fidler CSC420: Intro to Image Understanding 1/ 58 [Source: K. Grauman] Sanja Fidler CSC420: Intro to Image Understanding 2/ 58 Local Features Detection: Identify

More information

Requirements for region detection

Requirements for region detection Region detectors Requirements for region detection For region detection invariance transformations that should be considered are illumination changes, translation, rotation, scale and full affine transform

More information

Slant Correction using Histograms

Slant Correction using Histograms Slant Correction using Histograms Frank de Zeeuw Bachelor s Thesis in Artificial Intelligence Supervised by Axel Brink & Tijn van der Zant July 12, 2006 Abstract Slant is one of the characteristics that

More information

Object Recognition with Invariant Features

Object Recognition with Invariant Features Object Recognition with Invariant Features Definition: Identify objects or scenes and determine their pose and model parameters Applications Industrial automation and inspection Mobile robots, toys, user

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT SIFT: Scale Invariant Feature Transform; transform image

More information

Local Feature Detectors

Local Feature Detectors Local Feature Detectors Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Slides adapted from Cordelia Schmid and David Lowe, CVPR 2003 Tutorial, Matthew Brown,

More information

CS 231A Computer Vision (Winter 2014) Problem Set 3

CS 231A Computer Vision (Winter 2014) Problem Set 3 CS 231A Computer Vision (Winter 2014) Problem Set 3 Due: Feb. 18 th, 2015 (11:59pm) 1 Single Object Recognition Via SIFT (45 points) In his 2004 SIFT paper, David Lowe demonstrates impressive object recognition

More information

LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS

LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS 8th International DAAAM Baltic Conference "INDUSTRIAL ENGINEERING - 19-21 April 2012, Tallinn, Estonia LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS Shvarts, D. & Tamre, M. Abstract: The

More information

Learning to Recognize Faces in Realistic Conditions

Learning to Recognize Faces in Realistic Conditions 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

A Novel Algorithm for Color Image matching using Wavelet-SIFT

A Novel Algorithm for Color Image matching using Wavelet-SIFT International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 A Novel Algorithm for Color Image matching using Wavelet-SIFT Mupuri Prasanth Babu *, P. Ravi Shankar **

More information

Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit

Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit Augmented Reality VU Computer Vision 3D Registration (2) Prof. Vincent Lepetit Feature Point-Based 3D Tracking Feature Points for 3D Tracking Much less ambiguous than edges; Point-to-point reprojection

More information

A Rapid Automatic Image Registration Method Based on Improved SIFT

A Rapid Automatic Image Registration Method Based on Improved SIFT Available online at www.sciencedirect.com Procedia Environmental Sciences 11 (2011) 85 91 A Rapid Automatic Image Registration Method Based on Improved SIFT Zhu Hongbo, Xu Xuejun, Wang Jing, Chen Xuesong,

More information

Hidden Loop Recovery for Handwriting Recognition

Hidden Loop Recovery for Handwriting Recognition Hidden Loop Recovery for Handwriting Recognition David Doermann Institute of Advanced Computer Studies, University of Maryland, College Park, USA E-mail: doermann@cfar.umd.edu Nathan Intrator School of

More information

A Keypoint Descriptor Inspired by Retinal Computation

A Keypoint Descriptor Inspired by Retinal Computation A Keypoint Descriptor Inspired by Retinal Computation Bongsoo Suh, Sungjoon Choi, Han Lee Stanford University {bssuh,sungjoonchoi,hanlee}@stanford.edu Abstract. The main goal of our project is to implement

More information

Image Processing. Image Features

Image Processing. Image Features Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching

More information

Chapter 2 Basic Structure of High-Dimensional Spaces

Chapter 2 Basic Structure of High-Dimensional Spaces Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,

More information

CS664 Lecture #21: SIFT, object recognition, dynamic programming

CS664 Lecture #21: SIFT, object recognition, dynamic programming CS664 Lecture #21: SIFT, object recognition, dynamic programming Some material taken from: Sebastian Thrun, Stanford http://cs223b.stanford.edu/ Yuri Boykov, Western Ontario David Lowe, UBC http://www.cs.ubc.ca/~lowe/keypoints/

More information

CAP 5415 Computer Vision Fall 2012

CAP 5415 Computer Vision Fall 2012 CAP 5415 Computer Vision Fall 01 Dr. Mubarak Shah Univ. of Central Florida Office 47-F HEC Lecture-5 SIFT: David Lowe, UBC SIFT - Key Point Extraction Stands for scale invariant feature transform Patented

More information

Obtaining Feature Correspondences

Obtaining Feature Correspondences Obtaining Feature Correspondences Neill Campbell May 9, 2008 A state-of-the-art system for finding objects in images has recently been developed by David Lowe. The algorithm is termed the Scale-Invariant

More information

HISTOGRAMS OF ORIENTATIO N GRADIENTS

HISTOGRAMS OF ORIENTATIO N GRADIENTS HISTOGRAMS OF ORIENTATIO N GRADIENTS Histograms of Orientation Gradients Objective: object recognition Basic idea Local shape information often well described by the distribution of intensity gradients

More information

Feature-based methods for image matching

Feature-based methods for image matching Feature-based methods for image matching Bag of Visual Words approach Feature descriptors SIFT descriptor SURF descriptor Geometric consistency check Vocabulary tree Digital Image Processing: Bernd Girod,

More information

Real-time Door Detection based on AdaBoost learning algorithm

Real-time Door Detection based on AdaBoost learning algorithm Real-time Door Detection based on AdaBoost learning algorithm Jens Hensler, Michael Blaich, and Oliver Bittel University of Applied Sciences Konstanz, Germany Laboratory for Mobile Robots Brauneggerstr.

More information

A Novel Real-Time Feature Matching Scheme

A Novel Real-Time Feature Matching Scheme Sensors & Transducers, Vol. 165, Issue, February 01, pp. 17-11 Sensors & Transducers 01 by IFSA Publishing, S. L. http://www.sensorsportal.com A Novel Real-Time Feature Matching Scheme Ying Liu, * Hongbo

More information

2D Image Processing Feature Descriptors

2D Image Processing Feature Descriptors 2D Image Processing Feature Descriptors Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Overview

More information

EDGE EXTRACTION ALGORITHM BASED ON LINEAR PERCEPTION ENHANCEMENT

EDGE EXTRACTION ALGORITHM BASED ON LINEAR PERCEPTION ENHANCEMENT EDGE EXTRACTION ALGORITHM BASED ON LINEAR PERCEPTION ENHANCEMENT Fan ZHANG*, Xianfeng HUANG, Xiaoguang CHENG, Deren LI State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing,

More information

Character Recognition

Character Recognition Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches

More information

CSE 527: Introduction to Computer Vision

CSE 527: Introduction to Computer Vision CSE 527: Introduction to Computer Vision Week 5 - Class 1: Matching, Stitching, Registration September 26th, 2017 ??? Recap Today Feature Matching Image Alignment Panoramas HW2! Feature Matches Feature

More information

Robust Online Object Learning and Recognition by MSER Tracking

Robust Online Object Learning and Recognition by MSER Tracking Computer Vision Winter Workshop 28, Janez Perš (ed.) Moravske Toplice, Slovenia, February 4 6 Slovenian Pattern Recognition Society, Ljubljana, Slovenia Robust Online Object Learning and Recognition by

More information

Universiteit Leiden Computer Science

Universiteit Leiden Computer Science Universiteit Leiden Computer Science Optimizing octree updates for visibility determination on dynamic scenes Name: Hans Wortel Student-no: 0607940 Date: 28/07/2011 1st supervisor: Dr. Michael Lew 2nd

More information

Local Patch Descriptors

Local Patch Descriptors Local Patch Descriptors Slides courtesy of Steve Seitz and Larry Zitnick CSE 803 1 How do we describe an image patch? How do we describe an image patch? Patches with similar content should have similar

More information

School of Computing University of Utah

School of Computing University of Utah School of Computing University of Utah Presentation Outline 1 2 3 4 Main paper to be discussed David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV, 2004. How to find useful keypoints?

More information

SEARCH BY MOBILE IMAGE BASED ON VISUAL AND SPATIAL CONSISTENCY. Xianglong Liu, Yihua Lou, Adams Wei Yu, Bo Lang

SEARCH BY MOBILE IMAGE BASED ON VISUAL AND SPATIAL CONSISTENCY. Xianglong Liu, Yihua Lou, Adams Wei Yu, Bo Lang SEARCH BY MOBILE IMAGE BASED ON VISUAL AND SPATIAL CONSISTENCY Xianglong Liu, Yihua Lou, Adams Wei Yu, Bo Lang State Key Laboratory of Software Development Environment Beihang University, Beijing 100191,

More information

Yudistira Pictures; Universitas Brawijaya

Yudistira Pictures; Universitas Brawijaya Evaluation of Feature Detector-Descriptor for Real Object Matching under Various Conditions of Ilumination and Affine Transformation Novanto Yudistira1, Achmad Ridok2, Moch Ali Fauzi3 1) Yudistira Pictures;

More information

Using Corner Feature Correspondences to Rank Word Images by Similarity

Using Corner Feature Correspondences to Rank Word Images by Similarity Using Corner Feature Correspondences to Rank Word Images by Similarity Jamie L. Rothfeder, Shaolei Feng and Toni M. Rath Multi-Media Indexing and Retrieval Group Center for Intelligent Information Retrieval

More information

Key properties of local features

Key properties of local features Key properties of local features Locality, robust against occlusions Must be highly distinctive, a good feature should allow for correct object identification with low probability of mismatch Easy to etract

More information

Coarse-to-fine image registration

Coarse-to-fine image registration Today we will look at a few important topics in scale space in computer vision, in particular, coarseto-fine approaches, and the SIFT feature descriptor. I will present only the main ideas here to give

More information

Shape Descriptor using Polar Plot for Shape Recognition.

Shape Descriptor using Polar Plot for Shape Recognition. Shape Descriptor using Polar Plot for Shape Recognition. Brijesh Pillai ECE Graduate Student, Clemson University bpillai@clemson.edu Abstract : This paper presents my work on computing shape models that

More information

Specular 3D Object Tracking by View Generative Learning

Specular 3D Object Tracking by View Generative Learning Specular 3D Object Tracking by View Generative Learning Yukiko Shinozuka, Francois de Sorbier and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku 223-8522 Yokohama, Japan shinozuka@hvrl.ics.keio.ac.jp

More information

Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza

Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza Bridging the Gap Between Local and Global Approaches for 3D Object Recognition Isma Hadji G. N. DeSouza Outline Introduction Motivation Proposed Methods: 1. LEFT keypoint Detector 2. LGS Feature Descriptor

More information

Comparison of Local Feature Descriptors

Comparison of Local Feature Descriptors Department of EECS, University of California, Berkeley. December 13, 26 1 Local Features 2 Mikolajczyk s Dataset Caltech 11 Dataset 3 Evaluation of Feature Detectors Evaluation of Feature Deriptors 4 Applications

More information

An Introduction to Content Based Image Retrieval

An Introduction to Content Based Image Retrieval CHAPTER -1 An Introduction to Content Based Image Retrieval 1.1 Introduction With the advancement in internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and

More information

Midterm Examination CS 534: Computational Photography

Midterm Examination CS 534: Computational Photography Midterm Examination CS 534: Computational Photography November 3, 2016 NAME: Problem Score Max Score 1 6 2 8 3 9 4 12 5 4 6 13 7 7 8 6 9 9 10 6 11 14 12 6 Total 100 1 of 8 1. [6] (a) [3] What camera setting(s)

More information

Designing Applications that See Lecture 7: Object Recognition

Designing Applications that See Lecture 7: Object Recognition stanford hci group / cs377s Designing Applications that See Lecture 7: Object Recognition Dan Maynes-Aminzade 29 January 2008 Designing Applications that See http://cs377s.stanford.edu Reminders Pick up

More information

Edge and corner detection

Edge and corner detection Edge and corner detection Prof. Stricker Doz. G. Bleser Computer Vision: Object and People Tracking Goals Where is the information in an image? How is an object characterized? How can I find measurements

More information

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science. Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Object Recognition in Large Databases Some material for these slides comes from www.cs.utexas.edu/~grauman/courses/spring2011/slides/lecture18_index.pptx

More information

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao Classifying Images with Visual/Textual Cues By Steven Kappes and Yan Cao Motivation Image search Building large sets of classified images Robotics Background Object recognition is unsolved Deformable shaped

More information

Recognizing hand-drawn images using shape context

Recognizing hand-drawn images using shape context Recognizing hand-drawn images using shape context Gyozo Gidofalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract The objective

More information

Keypoint-based Recognition and Object Search

Keypoint-based Recognition and Object Search 03/08/11 Keypoint-based Recognition and Object Search Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Notices I m having trouble connecting to the web server, so can t post lecture

More information

Sea Turtle Identification by Matching Their Scale Patterns

Sea Turtle Identification by Matching Their Scale Patterns Sea Turtle Identification by Matching Their Scale Patterns Technical Report Rajmadhan Ekambaram and Rangachar Kasturi Department of Computer Science and Engineering, University of South Florida Abstract

More information

Implementation and Comparison of Feature Detection Methods in Image Mosaicing

Implementation and Comparison of Feature Detection Methods in Image Mosaicing IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p-ISSN: 2278-8735 PP 07-11 www.iosrjournals.org Implementation and Comparison of Feature Detection Methods in Image

More information