IDE-3D: Predicting Indoor Depth Utilizing Geometric and Monocular Cues


2016 International Conference on Computational Science and Computational Intelligence

Taylor Ripke, Department of Computer Science, Central Michigan University, Mount Pleasant, MI
Roger Lee, Department of Computer Science, Central Michigan University, Mount Pleasant, MI

Abstract: Depth estimation and spatial awareness given a single monocular image is a challenging task for a computer, as depth information is not retained when the 3D world is projected onto a 2D plane. We must therefore combine prior knowledge with the monocular cues present in the image, such as occlusion, texture variations, and shadows, to understand the depth of the scene. In this paper, we present IDE-3D (Indoor Depth Estimation 3D), a tool designed to generate a box model depth map of an indoor environment. The program combines a variety of information from the image, including 3D geometric shape estimation utilizing local and global scene structures, pixel analysis, and outlier removal, to produce a depth map of the image with acceptable results. We generate a box model of the room, apply our best fit algorithm to calculate the predicted depth of the room by analyzing the horizontal plane, and apply a depth-map gradient to it. The current application shows a successful implementation of our best fit algorithm in a controlled experiment, incorporating a box model and a texture gradient approach. Future work will include estimating the same depth using an object's shift relative to the focus.

Keywords: Image Segmentation, Depth Perception

I. INTRODUCTION

Throughout the years, research in computer vision has expanded into numerous subfields, including object recognition, neural networks, and depth estimation. These areas provide the foundation for many applications used today; however, there is still much work to be done. In this paper, we investigate an approach to estimating depth in a single monocular image utilizing pixel and geometric analysis.

Given several images of a scene, it is possible to measure its depth accurately: computers can mimic the triangulation and overlay performed by humans to measure the depth of an object using two cameras. However, measuring depth from only a single image is a challenging task, as no stereo cues are available. It is therefore important to pay attention to the monocular cues in the image, such as shading, perspective, size familiarity, and occlusion.

Figure 1. Computed depth map from our best fit algorithm using the box model approach.

For this project, we drew motivation from prior research in the field [5,6,7,8]; we want to build upon previous techniques and contribute another method to enhance depth prediction in monocular images. To do this, our project is divided into two segments. In the first segment, we generate a box model of a room utilizing our best fit algorithm, after removing outliers and objects from the calculations. The approach relies on finding the ground/wall boundary, as shown in Fig 1. Given a threshold, the boundaries are determined by ignoring everything that does not resemble a line. Our algorithm does not perform well where the boundary is not present, as discussed in the results. However, in cases where the algorithm can find the boundary, it separates the room into distinct regions to which it applies the second phase of the algorithm.
The current program is not designed to account for objects that occlude this boundary.

During the second phase of the algorithm, we apply a gradient technique to illustrate the depth of the room as accurately as possible. It does this by exploiting the geometry of the scene and finding similar patterns of pixels that form the boundaries of objects. As observed in Fig 1, the green-to-blue texture on the right wall shows the progression of depth in the image. The wall in the back of the image is a solid blue, showing that it has been classified as being a uniform distance from the camera. Finally, the increasing progression of yellow on the ground also shows the progression of depth.

We test our program by providing it with various indoor images that we gathered ourselves. We show that, in its current state, the program effectively produces box models of the rooms and applies a gradient to show the progression of depth in the scene. Rather than applying a gradient to the image as a whole, we segmented it using the box model approach and applied the gradients to the surfaces produced by the algorithm.

In the next section, we discuss previous work done in the field and the impact it has had on current research. A variety of techniques will be presented that illustrate the numerous approaches that can be used to estimate depth in images. A current trend in research suggests a strong need to exploit higher-level information in the scene, rather than focusing on local cues alone. The ability to understand what a group of pixels represents is more valuable than evaluating each pixel individually. However, it is important to consider all monocular cues present, as they provide crucial information about the image.

II. BACKGROUND AND RELATED WORK

As stated previously, estimating depth in a single image is a challenging task, as depth information is lost when the image is created. Regardless, humans have the capability to perceive depth in a monocular image using the information they have gained throughout their lifetime. Therefore, it is not unreasonable to think that one day computers may do the same.

Perhaps one of the biggest problems in computer vision is effectively and efficiently recognizing and classifying objects. A computer may make the mistake of thinking that a person up close is taller than a skyscraper far away if the picture is taken at the right angle. However, it is possible to produce an accurate depth map utilizing the other monocular cues present in the image. Previous techniques will be presented that show the numerous ways depth can be perceived in a single image.

Most approaches taken recently have focused on depth at the local scale [1,2,3,4]. While their results have been successful, it is important to use high-level information from the global structure of the scene. Individual pixels and local information alone are not enough to determine the context of the image. For example, if a computer were shown an array of blue pixels, it would have a hard time identifying whether it is viewing the sky, the ocean, a river, or perhaps a blueberry. If, instead, we can recognize that the image was taken outside, we can infer that the blue is the sky when it is in the upper half of the image.

A. Zhuo et al.

Most approaches taken in the past have focused on local cues rather than exploiting the global structure of the image. Zhuo et al. developed a hierarchical representation of the scene, which combines local depth with mid-level and global scene structures.
They formulated single-image depth estimation as a graphical model that encodes the interactions across the different layers of their hierarchy. By doing so, they were able to produce detailed depth estimates while drawing on higher-level information from the scene [5]. After conducting their experiments, they found that the mid-level structures contributed the most to the final accuracy of their model. In the future they plan to use semantic labels as part of the depth estimation calculations [5]. That should significantly increase the accuracy of the results: utilizing high-level information about the scene, they can classify different objects and recognize, for instance, that the sky is far away.

B. Liu et al.

Utilizing a different approach, Liu et al. used a pool of images for which the depth is known to help them calculate the depth in an unknown image. They treated the task as a discrete-continuous optimization problem, where the discrete variables represent the relationships between neighboring superpixels and the continuous variables encode the depth of the superpixels. By performing inference in a graphical model using particle belief propagation, they found a solution to the discrete-continuous optimization problem. The images with known depth are used to compute the unary potentials in the graphical model [6]. Similar to Zhuo et al., they plan to incorporate semantic labeling into their estimations.

C. Hedau et al.

Using the geometric information in a scene and the geometric representation of an object, it is possible to produce a detector for that object. The detector Hedau et al. built unifies contextual and geometric information to produce a probabilistic model of the scene. The locations of the walls and the floor in the image can refine the estimate of the 3D object. They show that it is possible to derive a 3D interpretation of the location of the object from a 2D image [7]. In addition, Hedau et al. also considered the challenge of recovering the spatial layout of indoor scenes given a monocular image. In most rooms, the distinct boundary that

marks the division between the floor and wall is partially or entirely occluded by furniture or objects in the room. Most algorithms used to identify the geometric context of the room rely on finding the ground-wall boundary. Rather than doing so, they employ a structured learning algorithm to find the parameters of their model based on global perspective cues [8]. The algorithm employed in our research currently relies on finding the ground-wall boundary to produce a depth map. As shown in the results, our model has deficiencies when it cannot find the ground-wall boundary.

D. Saxena et al.

Researchers have been able to recreate a sense of depth perception in computers with some success under certain circumstances. As challenging as optical illusions are to humans, they are even more difficult for a computer. Saxena et al. took a supervised learning approach, collecting a training set of monocular images (indoor and outdoor environments such as trees, buildings, and sidewalks) and their corresponding ground-truth depth maps [3]. Using a Markov Random Field that incorporates multiscale local and global image features, they were able to model the depths and the relations between depths at different points in the image [2]. Their approach combines monocular and stereo (triangulation) cues to estimate depth, showing improvements over utilizing monocular or stereo cues alone.

E. Eigen et al.

Other work has involved using deep network stacks to predict depth. Eigen et al. describe how they employed two deep network stacks: one makes a coarse global prediction, and the other refines the prediction locally. It is important to note that they applied a scale-invariant error to measure depth relations rather than absolute scale [4]. After training, their system achieved much success on both NYU Depth and KITTI without the need for superpixelation.

In the next section, we describe the methodology and implementation of our model as inspired by previous research. We begin with a brief overview of the system, followed by a detailed look at the approach we took. In the experiment section, we describe how we achieved the results shown in Fig 4 and identify further areas for improvement. Finally, we discuss how we can improve our program in the future.

III. METHODOLOGY

A. Overview

Contrary to computers, humans have a remarkable capability of perceiving depth, even when only one eye is involved. Various cues such as shading, perspective, size familiarity, and occlusion are important in depth perception. The most powerful form of depth perception a human uses is stereo disparity: each eye sends an image to the brain, which combines them to give the sense of 3D. However, some individuals are born without stereo vision. Their brain compensates for this by paying closer attention to the other visual cues present.

The approaches outlined previously represent successful research on the problem of depth perception given a single monocular image. The approach we explored uses a geometric representation of the image based on a box model.

B. Our Approach

Existing systems utilize various techniques to recover the depth information lost when the 3D world is portrayed on a 2D plane, with the most prominent focus being geometric information. Arguably the most difficult task is object recognition. A computer views an image as an array of pixels.
Therefore, it makes sense to search for patterns that resemble what we are trying to find. In most situations we would expect to find a horizontal row of similarly colored pixels that represents the floor/wall boundary. However, object occlusion can corrupt the algorithm's calculations unless it is accounted for. Specifically, we are interested in developing a best fit algorithm that can detect the boundaries of the room after the room has been analyzed. We want to find the boundaries of the room and calculate a line of best fit, that is, the line covering the most pixels of the same color within a certain threshold. The algorithm cannot accomplish this task alone; we must first remove the extraneous information from the scene.

C. Proposed System

For our algorithm to estimate the depth of an image, some preprocessing must be done to improve speed and accuracy. First, we apply a Sobel operator to identify the edges. Once the image's edges have been identified, we run another algorithm that identifies edges based specifically on color similarity. Outliers and objects deemed not to be lines are then removed by identifying regions of curvature. An outlier in this context is a pixel or group of pixels that does not conform to or represent an object and carries no meaning. For example, a part of the wall may have a marker stain or a nail hole in it; these will be identified by the algorithm, but they are not essential to the larger picture.

Following the successful completion of the outlier removal algorithm, we scan the image to check for connectivity. Given some pixel, we want to check whether that pixel is part of a larger group of pixels. For example, if there was a picture frame on the wall and our algorithm targeted the upper-left pixel of the frame, we want to check

whether there are more pixels around that pixel and whether they form any distinct shape. If there are no pixels, or not enough pixels, within a given radius, we remove them from the calculations. More specifically, we want to identify patterns of pixels that are round or not straight, as they would not be considered a boundary. While checking, each pixel is stored in a temporary array so that, if the algorithm decides the group is not part of the larger picture, it can be removed easily. Next, we apply our best fit algorithm, which generates candidate boundary lines each time it finds a pixel identified as a potential boundary. Each generation, the slope of the line is modified to see whether that generation fits better than the previous one. Once all boundaries have been identified, a gradient is applied to the image to show the gradual transition in depth at any point.

Figure 2. Pseudocode for Best Fit Algorithm

D. Implementation

The first step was to apply basic Sobel edge detection to identify the edges. Following this, we grouped pixels into distinct regions based on color similarity. Next, we removed the outliers from the calculation: utilizing a predetermined value, such as 10, we tell the program to remove any pixels that do not have a connectivity of 10 or greater. This greatly reduces the number of pixels that need to be scanned by the best fit algorithm, and the threshold can be modified to produce different results.

Finally, the most prominent feature of our program is the best fit algorithm. As shown in Fig 2, the best fit algorithm loops through each pixel of the image until it comes across one that has a connectivity greater than 10. In a 240-generation while loop, the slope of the line is changed and the generated pixels are overlaid on the pixels remaining after the outlier removal algorithm. The program keeps a running tally of how many generated pixels land on remaining pixels. The slope with the best fit is chosen, and that is where a room boundary is placed. As shown in Fig 4, the algorithm accurately finds the boundaries of the room when the horizontal axis is found.

E. Experiment

The experiment was conducted by capturing four static indoor images and letting our program evaluate them. In this controlled experiment, we took images that had a defined ground/wall boundary, as well as some that didn't. The program first identified the color variations in the image and shaded pixels magenta where the difference between adjacent pixels exceeded a predetermined threshold. This allowed the program to identify the boundaries between objects with relative certainty. Next, we applied an outlier removal algorithm that checked the connectivity of the magenta pixels to determine whether a marked region was an object or a boundary, such as the ground/wall border. The outlier removal algorithm scanned through the image sequentially until it found a magenta pixel. When it found one, it checked the connectivity by scanning the surrounding pixels and determining whether they were linear or curved. The initial threshold was thirty pixels: if those thirty pixels resembled a straight line, they were kept; otherwise, they were set back to the original color of the image. This narrowed down what the best fit algorithm had to process.
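To make the preprocessing concrete, the Python sketch below approximates the three stages described above: a Sobel edge response, thresholding to mark candidate boundary pixels (the magenta shading), and a connectivity test that keeps only line-like runs. The paper gives no implementation, so the function names, the grayscale input, and the reduction of the thirty-pixel linearity check to horizontal runs are our own assumptions.

    import numpy as np

    def sobel_magnitude(gray):
        # 3x3 Sobel kernels for horizontal and vertical gradients.
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
        ky = kx.T
        h, w = gray.shape
        mag = np.zeros((h, w))
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                patch = gray[y - 1:y + 2, x - 1:x + 2]
                mag[y, x] = np.hypot(np.sum(kx * patch), np.sum(ky * patch))
        return mag

    def mark_boundaries(gray, threshold=50.0):
        # Flag pixels whose edge response exceeds the threshold; these
        # correspond to the pixels the experiment shades magenta.
        # The threshold value is illustrative, not from the paper.
        return sobel_magnitude(gray) > threshold

    def remove_outliers(mask, min_run=30):
        # Keep only marked pixels belonging to a horizontal run of at
        # least min_run connected pixels. This is a simplification of
        # the paper's check that thirty surrounding pixels form a line.
        cleaned = np.zeros_like(mask)
        for y in range(mask.shape[0]):
            x = 0
            while x < mask.shape[1]:
                if mask[y, x]:
                    start = x
                    while x < mask.shape[1] and mask[y, x]:
                        x += 1
                    if x - start >= min_run:
                        cleaned[y, start:x] = True
                else:
                    x += 1
        return cleaned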
Next, our best fit algorithm was applied to calculate the most likely locations for walls; the excerpt of pseudocode in Fig 2 shows the core of the algorithm. The idea behind the best fit algorithm is that any magenta pixels remaining at this stage may mark a boundary. The algorithm begins on a magenta pixel and calculates 240 variations of lines that could pass through it. A simple counter variable keeps track of the line on which the most magenta pixels fall. We performed this calculation, and created the candidate lines, for every such pixel. Once the best fit algorithm identified a line, it removed that line's pixels from the next calculation so that further lines could be found. A sketch of this search is given below.
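As a rough reconstruction of the pseudocode in Fig 2 (the figure itself is not reproduced in this transcription), the slope search might look like the following; the slope range, the vote threshold, and the one-pixel erase tolerance are assumptions rather than values reported in the paper.

    import numpy as np

    def best_fit_line(mask, x0, y0, n_slopes=240):
        # Try n_slopes candidate slopes through (x0, y0) and return the
        # one whose rasterized line covers the most marked pixels.
        h, w = mask.shape
        best_slope, best_votes = 0.0, -1
        for slope in np.linspace(-3.0, 3.0, n_slopes):  # assumed range
            votes = 0
            for x in range(w):
                y = int(round(y0 + slope * (x - x0)))
                if 0 <= y < h and mask[y, x]:
                    votes += 1
            if votes > best_votes:
                best_slope, best_votes = slope, votes
        return best_slope, best_votes

    def extract_boundaries(mask, min_votes=30):
        # Repeatedly fit the best line through a remaining marked pixel,
        # record it, and erase its pixels so further lines can be found.
        mask = mask.copy()
        lines = []
        ys, xs = np.nonzero(mask)
        for x0, y0 in zip(xs, ys):
            if not mask[y0, x0]:
                continue  # already consumed by an earlier line
            slope, votes = best_fit_line(mask, x0, y0)
            if votes < min_votes:  # assumed acceptance threshold
                continue
            lines.append((x0, y0, slope))
            for x in range(mask.shape[1]):  # erase with 1 px tolerance
                y = int(round(y0 + slope * (x - x0)))
                for dy in (-1, 0, 1):
                    if 0 <= y + dy < mask.shape[0]:
                        mask[y + dy, x] = False
        return lines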

Once the boundaries of the floor and walls were determined, the box model was created. Next, we applied a texture gradient pattern to the box model to produce a depth map of the room. The color shifts gradually from the outer edges of the image towards the center, showing the progression in depth.

F. Results

The current program can, in most cases, identify the boundaries of the room and correctly produce a box model. Although the program is currently limited to the controlled experiment, further improvements can be made both to its accuracy and to the approach it uses to model depth. Please refer to Fig 3 and Fig 4 for the results of the experiment.

The limiting factor in our current method is the ability to define the ground-wall boundary. As shown in Fig 4, the algorithm appears to correctly identify where the ground meets the wall; however, we believe this can be improved further. The texture gradient did not apply correctly in this scenario, unlike in the previous examples displayed.

Figure 3. Comparison of common features shared between programs

Initially we wanted to assign each pixel, or group of pixels, a depth value stating how far from the camera it was. We conducted an initial experiment to measure the depths computed by our program at known distances from the camera, but soon found inaccuracies at different angles and depths. These results were therefore not included in this study, as further development is necessary to improve their accuracy.

IV. FUTURE WORK

Currently we are developing a method to predict edges more accurately, specifically by analyzing focus. Sobel edge detection is efficient; however, it has difficulties with objects of similar color. For example, a difficult case is distinguishing the depths of two white pieces of paper placed at different distances from the camera.

Beyond monocular cues such as texture and occlusion, the most prominent source of depth information is binocular vision. Our eyes converge more strongly the closer an object is and diverge as it moves farther away. When concentrating on a close object in the fovea, objects farther away appear duplicated. This can be demonstrated by holding one thumb close to your face and focusing on it while holding your other thumb at arm's length: the farther thumb appears doubled, and when you shift focus to the farther thumb, the closer one is doubled in your periphery.

In particular, the research discussed in this paper highlights an approach to producing a depth map using a box model generated from a single image. For future applications, we are developing a system in which the shift of an object between views will help us determine depth more accurately. Specifically, two cameras placed side by side will each capture an image focused on an object, and the difference in shift between objects in the foreground and background gives us information about their depth. A similar experiment will then be run comparing the results of our monocular approach to stereo vision.
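The planned two-camera comparison amounts to classical triangulation: for side-by-side cameras, an object's pixel shift (disparity) between the two views is inversely proportional to its depth. A minimal sketch of the relation, with illustrative numbers rather than measurements from our setup:

    def depth_from_shift(focal_px, baseline_m, shift_px):
        # Classical stereo triangulation: depth = f * B / d, so halving
        # the observed shift doubles the estimated depth.
        if shift_px <= 0:
            raise ValueError("object must shift between the two views")
        return focal_px * baseline_m / shift_px

    # Example: 700 px focal length, cameras 10 cm apart, 35 px shift
    # gives an estimated depth of 2.0 m.
    print(depth_from_shift(700, 0.10, 35))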
The depth of an object or room can also be inferred by evaluating duplicated edge lines and object variance. An object farther away appears to shift by a smaller amount when you alternate eyes than an object up close; at different depths in a room, an object's shift can likewise be compared across two different cameras. We would expect closer objects to shift more than objects farther away. We also expect object duplicates (the same object viewed at different angles) to become separated by a greater difference over time. A future version of this application will estimate the depth of objects in an environment directly from their relation to the focus.

V. CONCLUSION

The purpose of this program was to generate a box model of a room from the geometric cues present and apply a gradient to the image to show the progression of depth. As shown in Fig 4, the results of our program were successful in the controlled experiment. However, as discussed previously, there are limitations. The current application was developed to quickly estimate the depth of a room. Further research is needed to improve the accuracy of

this program and will incorporate the methods discussed in the future work.

Figure 4. Results of experiment

REFERENCES

[1] B. Liu, S. Gould, and D. Koller. Single image depth estimation from predicted semantic labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[2] A. Saxena, S. H. Chung, and A. Y. Ng. 3-D depth reconstruction from a single still image. International Journal of Computer Vision, 76(1):53-69, 2008.
[3] A. Saxena, M. Sun, and A. Y. Ng. Learning 3-D scene structure from a single still image. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.
[4] D. Eigen, C. Puhrsch, and R. Fergus. Depth map prediction from a single image using a multi-scale deep network. In NIPS, 2014.
[5] W. Zhuo, M. Salzmann, X. He, and M. Liu. Indoor scene structure analysis for single image depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[6] M. Liu, M. Salzmann, and X. He. Discrete-continuous depth estimation from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[7] V. Hedau, D. Hoiem, and D. Forsyth. Thinking inside the box: Using appearance models and context based on room geometry. In Proceedings of ECCV, 2010.
[8] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In Proceedings of ICCV, 2009.
