IDE-3D: Predicting Indoor Depth Utilizing Geometric and Monocular Cues
2016 International Conference on Computational Science and Computational Intelligence

IDE-3D: Predicting Indoor Depth Utilizing Geometric and Monocular Cues

Taylor Ripke, Department of Computer Science, Central Michigan University, Mount Pleasant, MI
Roger Lee, Department of Computer Science, Central Michigan University, Mount Pleasant, MI

Abstract—Depth estimation and spatial awareness given a single monocular image is a challenging task for a computer, as depth information is not retained when the 3D world is projected onto a 2D plane. Therefore, we must combine our prior knowledge with other monocular cues present in the image, such as occlusion, texture variations, and shadows, to understand the depth of the image. In this paper, we present IDE-3D (Indoor Depth Estimation 3D), a tool designed to generate a box model depth map of an indoor environment. The program combines a variety of input from the image, including 3D geometric shape estimation utilizing local and global scene structures, pixel analysis, and outlier removal, to produce a depth map of the image with acceptable results. We generate a box model of the room, apply our best fit algorithm to calculate the predicted depth of the room by analyzing the horizontal plane, and apply a depth-map gradient to it. The current application shows a successful implementation of our best fit algorithm in the controlled experiment by incorporating a box model and texture gradient approach. Future work will include estimating the same depth using an object's shift relative to the focus.

Keywords—Image Segmentation, Depth Perception

I. INTRODUCTION

Throughout the years, research in computer vision has expanded into numerous subfields, including object recognition, neural networks, and depth estimation. These areas provide the foundation for many applications used today; however, there is still much work to be done.
In this paper, we investigate an approach to estimating depth in a single, monocular image utilizing pixel and geometric analysis. Given several images, it is possible to accurately measure the depth of a scene: computers can mimic the triangulation and overlay done by humans to measure the depth of an object using two cameras. However, measuring depth given only a single image is a challenging task, as the binocular cues are absent. Therefore, it is important to pay attention to the monocular cues in the image, such as shading, perspective, size familiarity, and occlusion.

Figure 1. Computed depth map from our best fit algorithm using the box model approach.

For this project, we drew motivation from prior research in the field [5, 6, 7, 8] and want to build upon previous techniques and contribute another method to enhance depth prediction in monocular images. To do this, our project is divided into two segments. In the first segment, we generate a box model of a room utilizing our best fit algorithm, after removing the outliers and objects from the calculations. The approach relies on finding the ground/wall boundary, as shown in Fig. 1. Given a threshold, the boundaries are determined by ignoring everything that does not resemble a line. Our algorithm does not perform well where the boundary is not present, as discussed in the results. However, in cases where the algorithm can find the boundary, it separates the room into distinct regions where it applies the second phase of the algorithm. The current program is not designed to account for objects.
During the second phase of the algorithm, we apply a gradient technique to illustrate the depth of the room as accurately as possible. It does this by exploiting the geometry of the scene and finding similar patterns of pixels that form the boundaries of objects. As observed in Fig. 1, the green-to-blue texture on the right wall shows the progression of depth in the image. The wall in the back of the image is a solid blue, showing that it has been classified as being the same distance from the camera. Finally, the increasing progression of yellow on the ground also shows the progression of depth.

We test our program by providing it with various indoor images that we gathered ourselves. We show that, in its current state, the program effectively produces box models of the rooms and applies a gradient to show the progression of depth in the scene. Rather than applying a gradient to the image as a whole, we segmented it using the box model approach and applied the gradients to the surfaces produced by the algorithm.

In the next section, we discuss the previous work done in the field and the impact it has had on current research. A variety of techniques will be presented that illustrate numerous approaches that can be used to estimate depth in images. A current trend in research suggests a strong need to exploit the higher-level information in the scene, rather than focusing on local cues alone. The ability to understand what a group of pixels represents is more valuable than evaluating each pixel individually. However, it is important to consider all monocular cues present, as they provide crucial information about the image.

II. BACKGROUND AND RELATED WORK

As stated previously, estimating depth in a single image is a challenging task, as depth information is lost when the image is created. Regardless, humans have the capability to perceive depth in a monocular image using the information they have gained throughout their lifetime.
Therefore, it should not be unreasonable to think that one day computers may do the same. Perhaps one of the biggest problems in computer vision is effectively and efficiently recognizing and classifying objects. A computer may make the mistake of thinking that a person up close is taller than a skyscraper far away if the picture is taken at the right angle. However, it is possible to produce an accurate depth map utilizing other monocular cues present in the image. Previous techniques will be presented that show the numerous ways depth can be perceived in a single image.

Most approaches taken recently have focused on depth at the local scale [1, 2, 3, 4]. While their results have been successful, it is important to use high-level information from the global structure of the scene. Individual pixels and local information alone are not enough to determine the context of the image. For example, if a computer were shown an array of blue pixels, it would have a hard time identifying whether it is viewing the sky, the ocean, a river, or perhaps a blueberry. Rather, if we can recognize that the image was taken outside, we can infer that the blue is the sky if it is in the upper half of the image.

A. Zhuo et al.

Most approaches taken in the past have focused on local cues rather than exploiting the global structure of the image. Zhuo et al. developed a hierarchical representation of the scene, which combines the local depth with mid-level and global scene structures. They formulated single image depth estimation in a graphical model by encoding the interactions across different layers of their hierarchy. By doing so, they were able to produce detailed depth estimates and extract higher-level information from the scene [5]. After conducting their experiments, they found that the mid-level structures contributed the most to the final accuracy of their model. In the future they plan to use semantic labels as a part of the depth estimation calculations [5].
That should significantly increase the accuracy of the results: utilizing high-level information about the scene, they can classify different objects and recognize, for example, that the sky is far away.

B. Liu et al.

Utilizing a different approach, Liu et al. used a pool of images for which the depth is known to help calculate the depth in an unknown image. They treated the task as a discrete-continuous optimization problem, where the discrete variables represented the relationships between neighboring superpixels and the continuous variables encoded the depth of the superpixels. By performing inference in a graphical model utilizing particle belief propagation, they found a solution to the discrete-continuous optimization problem. The images with known depth are used to compute the unary potentials in the graphical model [6]. Similar to Zhuo et al., they plan to incorporate semantic labeling in their estimations.

C. Hedau et al.

Using the geometric information in a scene and the geometric representation of an object, it is possible to produce a detector for that object. The detector they built unifies contextual and geometric information to produce a probabilistic model of the scene. The locations of the walls and the floor in the image can refine the estimation for the 3D object. They show that it is possible to derive a 3D interpretation of the location of the object from a 2D image [7]. In addition, Hedau et al. also considered the challenge of recovering the spatial layout of indoor scenes given a monocular image. In most rooms, the distinct boundary that
marks the division between the floor and wall is partially or entirely occluded by furniture or objects in the room. Most algorithms used to identify the geometric context of the room rely on finding the ground/wall boundary. Instead, they employ a structured learning algorithm to find the parameters for their algorithm based on global perspective cues [8]. The algorithm employed in our research currently relies on finding the ground/wall boundary to produce a depth map. As shown in the results, our model has deficiencies when it cannot find the ground/wall boundary.

D. Saxena et al.

Researchers have been able to recreate a sense of depth perception in computers when analyzing an image, to a certain degree of success under certain circumstances. As challenging as optical illusions are to humans, they are even more difficult for a computer. Saxena et al. utilized a supervised training approach: they collected a training set of monocular images (indoor and outdoor environments containing trees, buildings, and sidewalks) and their corresponding ground-truth depth maps [3]. Using a Markov Random Field which incorporates multiscale local and global image features, they modeled the depths and the relation between depths at different points in the image [2]. Their approach combines monocular and stereo (triangulation) cues to estimate depth, showing improvements over utilizing only monocular or stereo cues.

E. Eigen et al.

Other work has involved using deep network stacks to predict depth. Eigen et al. describe how they employed one deep network stack to make a global prediction and another that refines the prediction locally. It is important to note that they applied a scale-invariant error to measure depth relations rather than the scale itself [4]. After training, their model achieved much success on both NYU Depth and KITTI without the need for superpixelation.
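The intuition behind a scale-invariant error is that a prediction that is wrong only by a single global scaling of the whole depth map should not be penalized. The following minimal sketch is in the spirit of Eigen et al.'s log-space formulation; the function name and the lambda weighting are illustrative, not taken from the paper:

```python
import math

def scale_invariant_error(pred, truth, lam=1.0):
    """Scale-invariant log error: with lam = 1, an error that is a
    single global scaling of the whole depth map costs nothing."""
    d = [math.log(p) - math.log(t) for p, t in zip(pred, truth)]
    n = len(d)
    return sum(di * di for di in d) / n - lam * (sum(d) ** 2) / (n * n)

# A prediction exactly twice the ground truth everywhere differs
# only by a global scale, so the error is zero (up to rounding).
truth = [1.0, 2.0, 4.0]
pred = [2.0, 4.0, 8.0]
print(scale_invariant_error(pred, truth))
```

With lam = 0 the expression reduces to an ordinary mean squared error in log space, so lam interpolates between scale-sensitive and fully scale-invariant behavior.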
In the next section, we describe the methodology and implementation of our model as inspired by previous research. We begin with a brief overview of the system, followed by a detailed look at the approach we took. In the experiment section, we describe how we achieved the results shown in Fig. 4 and further areas for improvement. Finally, we discuss how we can improve our program in the future.

III. METHODOLOGY

A. Overview

Contrary to computers, humans have a remarkable capability of perceiving depth, even if only one eye is involved. Various cues such as shading, perspective, size familiarity, and occlusion are important in depth perception. The most powerful form of depth perception a human uses is stereo disparity: each eye sends an image to the brain, which combines them to give the sense of 3D. However, some individuals are born without stereo vision. Instead, their brain compensates through active spatial creativity, paying closer attention to the other visual cues present.

The approaches outlined previously present successful research concerning the problem of depth perception given a single monocular image. The approach we explored uses a geometric representation of the image utilizing a box model.

B. Our Approach

Existing systems utilize various techniques to recover the depth information lost when the 3D world is portrayed on a 2D plane, with the most prominent focus being geometric information. Arguably the most difficult task is object recognition. A computer views an image as an array of pixels. Therefore, it makes logical sense to try to find patterns that resemble what we are looking for. We would expect, in most situations, to find a horizontal row of similarly colored pixels that represents the floor/wall boundary. However, object occlusion can corrupt the algorithm's calculations unless it is accounted for.
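As a toy illustration of that expectation, a row crossing the floor/wall boundary should contain a long run of near-identical pixels. The sketch below scores one image row by its longest such run; the function name, intensity representation, and threshold are assumptions for illustration, not the paper's implementation:

```python
def longest_similar_run(row, threshold=10):
    """Length of the longest run of adjacent pixels in one image row
    whose intensities differ by at most `threshold` from neighbor to
    neighbor. A long run is a candidate floor/wall boundary segment."""
    best = run = 1
    for prev, cur in zip(row, row[1:]):
        run = run + 1 if abs(cur - prev) <= threshold else 1
        best = max(best, run)
    return best

# A row with a long near-uniform stretch scores higher than a noisy one:
# the five values 50, 52, 51, 53, 50 form one run, broken by the jump to 200.
print(longest_similar_run([50, 52, 51, 53, 50, 200, 90]))  # -> 5
```

An occluding object shows up as exactly the kind of break seen at the jump to 200 above, which is why the outlier and occlusion handling described next is needed before fitting boundary lines.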
Specifically, we are interested in developing a best fit algorithm that can detect the boundaries of the room after the room has been analyzed. We want to find the boundaries of the room and calculate a line of best fit, that is, the line covered by the most pixels of the same color within a certain threshold. The algorithm cannot accomplish this task alone; we must first remove the extraneous information from the scene.

C. Proposed System

For our algorithm to estimate the depth of an image, some preprocessing must be done for maximum optimization and accuracy. First, we apply a Sobel operator to identify the edges. Once the image's edges have been identified, we run another algorithm that identifies edges based specifically on similar color. The outliers and objects deemed not to be lines are removed by identifying regions of curvature. An outlier in this context can be described as a pixel or group of pixels that do not conform to or represent an object and have no meaning. For example, a part of the wall may have a marker stain or a nail hole in it. These will be identified by the algorithm, but they are not essential to the larger picture.

Following the successful completion of the outlier removal algorithm, we scan the image to check for connectivity. Given some pixel, we want to check if that pixel is part of a larger group of pixels. For example, if there were a picture frame on the wall and our algorithm targeted the upper-left pixel of the frame, we want to check
if there are more pixels around that pixel and whether they form any distinct shape. If there are no pixels, or not enough pixels, within a given radius, then we remove them from the calculations. More specifically, we want to identify patterns of pixels that are round or not straight, as they would not be considered a boundary. While checking, each pixel is stored in a temporary array so that, if the algorithm decides the group is not part of the larger picture, it can all be removed easily.

Next, we apply our best fit algorithm, which generates boundary lines each time it finds a pixel identified as a potential boundary. Each generation, the slope of the line is modified to see if that generation is better than the previous one. Once all boundaries have been identified, a gradient is applied to the image to show the gradual transition in depth at any point.

Figure 2. Pseudocode for Best Fit Algorithm

D. Implementation

The first step was to apply basic Sobel edge detection to identify the edges. Next, we grouped pixels into distinct regions based on color similarity and removed the outliers from the calculation. Utilizing a predetermined value, such as 10, we can tell the program to remove any pixels that do not have a connection of 10 or greater. This greatly reduces the number of pixels that need to be scanned by the best fit algorithm, and the threshold can be modified to produce different results.

Finally, the most prominent feature of our program is the best fit algorithm. As shown in Fig. 2, the best fit algorithm loops through each pixel of the image until it comes across one that has a connectivity of greater than 10. In a 240-generation loop, the slope of the line is changed and the pixels generated are overlaid on top of the pixels remaining after the outlier removal algorithm. The program keeps a running tally of how many pixels lie on top of the others.
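The connectivity threshold described above can be sketched as a flood fill that discards small components. This is a minimal illustration under stated assumptions: the function name, the set-of-coordinates representation, and the choice of 8-connectivity are ours, since the paper does not specify them.

```python
from collections import deque

def remove_small_components(pixels, min_size=10):
    """Drop edge pixels whose 8-connected component has fewer than
    `min_size` members, mirroring the connection-of-10 threshold.
    `pixels` is a set of (x, y) coordinates flagged as edges."""
    remaining, kept = set(pixels), set()
    while remaining:
        # Flood-fill one component from an arbitrary seed pixel.
        seed = remaining.pop()
        component, frontier = {seed}, deque([seed])
        while frontier:
            x, y = frontier.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    n = (x + dx, y + dy)
                    if n in remaining:
                        remaining.discard(n)
                        component.add(n)
                        frontier.append(n)
        if len(component) >= min_size:
            kept |= component
    return kept

# A 12-pixel line survives; a 3-pixel speck (e.g. a nail hole) is removed.
line = {(x, 0) for x in range(12)}
speck = {(20, 20), (21, 20), (21, 21)}
print(remove_small_components(line | speck) == line)  # -> True
```

Because only surviving pixels feed the line search, raising `min_size` trades recall of short boundary fragments for fewer spurious candidates.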
The slope with the best fit is chosen, and that is where a room boundary is placed. As shown in Fig. 4, the algorithm accurately finds the boundaries of the room when the horizontal axis is found.

E. Experiment

The experiment was conducted by capturing four static indoor images and letting our program evaluate them. In this controlled experiment, we took images that had a defined ground/wall boundary, as well as some that didn't. The program first identified the color variations in the image and shaded pixels magenta where adjacent pixels exceeded a predetermined threshold. This allowed the program to identify the boundaries between objects with relative certainty.

Next, we applied an outlier removal algorithm that checked the connectivity of the magenta pixels to determine whether they represented an object or a boundary, such as the ground/wall border. The outlier removal algorithm scanned through the image sequentially until it found a magenta pixel. When it found one, it would check the connectivity by scanning the surrounding pixels and seeing whether they were linear or curved. The initial threshold was thirty pixels: if those thirty pixels resembled a straight line, they were kept. Otherwise, the pixels were set back to the original color of the image. This helped narrow down what the algorithm had to process.

Next, our best fit algorithm was applied to calculate the most likely spots for walls. The small excerpt of pseudocode in Fig. 2 shows the core process of the algorithm. The idea is that any magenta pixels remaining at this stage may mark a boundary. The algorithm begins on a magenta pixel and calculates 240 variations of lines that the pixel could lie upon. A simple counter variable is used to keep track of the line upon which most of the magenta pixels fall.
This was calculated for every pixel, and the lines were created. Once the best fit algorithm identified a line, it removed those pixels from the next calculation so that further lines could be found.
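The 240-generation slope search can be sketched as follows. Since the paper's pseudocode (Fig. 2) is not reproduced in the text, the details here are assumptions: we sweep 240 line orientations through a seed pixel and count boundary pixels within a small distance tolerance of each candidate line.

```python
import math

def best_fit_line(seed, boundary_pixels, generations=240, tol=1.5):
    """From a seed boundary pixel, try `generations` line orientations
    through it and keep the one covering the most boundary pixels.
    `boundary_pixels` is a set of (x, y) pixels left after outlier removal."""
    sx, sy = seed
    best_angle, best_count = None, -1
    for g in range(generations):
        angle = math.pi * g / generations            # orientations in [0, pi)
        nx, ny = -math.sin(angle), math.cos(angle)   # unit normal to the line
        # Running tally: pixels whose distance to the candidate line is small.
        count = sum(1 for (x, y) in boundary_pixels
                    if abs((x - sx) * nx + (y - sy) * ny) <= tol)
        if count > best_count:
            best_angle, best_count = angle, count
    return best_angle, best_count

# Pixels along a horizontal ground/wall boundary, plus one stray outlier.
pixels = {(x, 40) for x in range(30)} | {(10, 5)}
angle, covered = best_fit_line((0, 40), pixels)
print(round(math.degrees(angle)), covered)  # -> 0 30
```

Removing a found line's pixels from `boundary_pixels` and reseeding, as the text describes, would then let the same search recover the remaining wall boundaries one at a time.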
Once the boundaries of the floor and walls were determined, the box model was created. Next, we applied a texture gradient pattern to the box model to produce a depth map of the room. The color shifts gradually from the outer edges of the image towards the center, showing the progression in depth.

F. Results

The current program can, in most cases, identify the boundaries of the room and correctly produce a box model. Although the program is currently limited to the controlled experiment, further improvements can be made to increase its accuracy and the approach it uses to model depth. Please refer to Figs. 3 and 4 to view the results of the experiment. The limiting factor in our current method is the ability to define the ground/wall boundary. As shown in Fig. 4, the algorithm appears to correctly identify where the ground meets the wall; however, we believe that this can be improved further. The texture gradient did not apply correctly in this scenario, unlike in the previous examples displayed.

Figure 3. Comparison of common features shared between programs

Initially, we wanted to compute a depth map that assigned each pixel, or group of pixels, a depth value indicating how far from the camera it was. We conducted an initial experiment to measure the depth at certain distances from the camera as computed by our program, but soon found inaccuracies at different angles and depths. These results were therefore not included in this study, as further development is necessary to improve their accuracy.

IV. FUTURE WORK

Currently, we are developing a method to more accurately predict edges, specifically by analyzing focus. The Sobel edge detection algorithm is efficient; however, it has difficulties with objects of similar color. A distinctly difficult example is examining the depth of two white pieces of paper at different distances from the camera.
Beyond monocular cues such as texture and occlusion, the most prominent source of depth information is binocular vision. Our eyes converge more strongly on nearby objects and relax toward parallel for distant ones. When concentrating on a nearby object in the fovea, objects farther away appear duplicated. This can be demonstrated by focusing on a thumb held close to your nose while holding your other thumb at arm's length away: the further thumb appears duplicated. When we shift focus to the further thumb, the closer one is duplicated in our periphery.

In particular, the research discussed in this paper highlights an approach to producing a depth map using a box model generated from a single image. For future applications, we are developing a system where the shift of an object between views will help us determine depth more accurately. Specifically, two cameras side by side will take images focused on an object, and the difference in the shift between objects in the foreground and background gives us information on their depth. A similar experiment will be run comparing the results of our monocular approach to stereo vision.

The depth of an object or room can also be inferred by evaluating multiple duplicate edge lines and object variance. An object farther away appears to move a smaller amount when alternating eyes than an object up close. At different depths in a room, an object's shift can be compared via two different cameras. We would expect closer objects to shift more relative to objects farther away. We also expect object duplicates (the same object viewed at different angles) to become separated by a greater difference over time. A future version of this application will estimate depth information of objects in an environment directly related to the focus.

V. CONCLUSION

The purpose of this program was to generate a box model of the room given the geometric cues present and apply a gradient to the image to show the progression of depth. As shown in Fig. 4, the results of our program were successful in the controlled experiment. However, as discussed previously, there are limitations. The current application was developed to quickly estimate the depth of a room. Further research is needed to improve the accuracy of
this program and will incorporate the methods discussed in the future work.

Figure 4. Results of experiment

REFERENCES

[1] B. Liu, S. Gould, and D. Koller. Single image depth estimation from predicted semantic labels. In Proceedings of CVPR, 2010.
[2] A. Saxena, S. H. Chung, and A. Y. Ng. 3-D depth reconstruction from a single still image. International Journal of Computer Vision, 76(1):53-69, 2008.
[3] A. Saxena, M. Sun, and A. Y. Ng. Learning 3-D scene structure from a single still image. In Proceedings of ICCV, 2007.
[4] D. Eigen, C. Puhrsch, and R. Fergus. Depth map prediction from a single image using a multi-scale deep network. In NIPS, 2014.
[5] W. Zhuo, M. Salzmann, X. He, and M. Liu. Indoor scene structure analysis for single image depth estimation. In Proceedings of CVPR, 2015.
[6] M. Liu, M. Salzmann, and X. He. Discrete-continuous depth estimation from a single image. In Proceedings of CVPR, 2014.
[7] V. Hedau, D. Hoiem, and D. Forsyth. Thinking inside the box: Using appearance models and context based on room geometry. In Proceedings of ECCV, 2010.
[8] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In Proceedings of ICCV, 2009.
More informationStructured Completion Predictors Applied to Image Segmentation
Structured Completion Predictors Applied to Image Segmentation Dmitriy Brezhnev, Raphael-Joel Lim, Anirudh Venkatesh December 16, 2011 Abstract Multi-image segmentation makes use of global and local features
More information3D Shape Analysis with Multi-view Convolutional Networks. Evangelos Kalogerakis
3D Shape Analysis with Multi-view Convolutional Networks Evangelos Kalogerakis 3D model repositories [3D Warehouse - video] 3D geometry acquisition [KinectFusion - video] 3D shapes come in various flavors
More informationReal-Time Human Detection using Relational Depth Similarity Features
Real-Time Human Detection using Relational Depth Similarity Features Sho Ikemura, Hironobu Fujiyoshi Dept. of Computer Science, Chubu University. Matsumoto 1200, Kasugai, Aichi, 487-8501 Japan. si@vision.cs.chubu.ac.jp,
More informationRange Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation
Obviously, this is a very slow process and not suitable for dynamic scenes. To speed things up, we can use a laser that projects a vertical line of light onto the scene. This laser rotates around its vertical
More informationLecture 14: Computer Vision
CS/b: Artificial Intelligence II Prof. Olga Veksler Lecture : Computer Vision D shape from Images Stereo Reconstruction Many Slides are from Steve Seitz (UW), S. Narasimhan Outline Cues for D shape perception
More informationCOMP 102: Computers and Computing
COMP 102: Computers and Computing Lecture 23: Computer Vision Instructor: Kaleem Siddiqi (siddiqi@cim.mcgill.ca) Class web page: www.cim.mcgill.ca/~siddiqi/102.html What is computer vision? Broadly speaking,
More informationlecture 10 - depth from blur, binocular stereo
This lecture carries forward some of the topics from early in the course, namely defocus blur and binocular disparity. The main emphasis here will be on the information these cues carry about depth, rather
More informationHuman Upper Body Pose Estimation in Static Images
1. Research Team Human Upper Body Pose Estimation in Static Images Project Leader: Graduate Students: Prof. Isaac Cohen, Computer Science Mun Wai Lee 2. Statement of Project Goals This goal of this project
More informationObject Detection by 3D Aspectlets and Occlusion Reasoning
Object Detection by 3D Aspectlets and Occlusion Reasoning Yu Xiang University of Michigan Silvio Savarese Stanford University In the 4th International IEEE Workshop on 3D Representation and Recognition
More informationEE795: Computer Vision and Intelligent Systems
EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational
More informationSeparating Objects and Clutter in Indoor Scenes
Separating Objects and Clutter in Indoor Scenes Salman H. Khan School of Computer Science & Software Engineering, The University of Western Australia Co-authors: Xuming He, Mohammed Bennamoun, Ferdous
More informationBinocular cues to depth PSY 310 Greg Francis. Lecture 21. Depth perception
Binocular cues to depth PSY 310 Greg Francis Lecture 21 How to find the hidden word. Depth perception You can see depth in static images with just one eye (monocular) Pictorial cues However, motion and
More informationComplex Sensors: Cameras, Visual Sensing. The Robotics Primer (Ch. 9) ECE 497: Introduction to Mobile Robotics -Visual Sensors
Complex Sensors: Cameras, Visual Sensing The Robotics Primer (Ch. 9) Bring your laptop and robot everyday DO NOT unplug the network cables from the desktop computers or the walls Tuesday s Quiz is on Visual
More informationCan Similar Scenes help Surface Layout Estimation?
Can Similar Scenes help Surface Layout Estimation? Santosh K. Divvala, Alexei A. Efros, Martial Hebert Robotics Institute, Carnegie Mellon University. {santosh,efros,hebert}@cs.cmu.edu Abstract We describe
More informationLecture 10: Semantic Segmentation and Clustering
Lecture 10: Semantic Segmentation and Clustering Vineet Kosaraju, Davy Ragland, Adrien Truong, Effie Nehoran, Maneekwan Toyungyernsub Department of Computer Science Stanford University Stanford, CA 94305
More informationDetecting motion by means of 2D and 3D information
Detecting motion by means of 2D and 3D information Federico Tombari Stefano Mattoccia Luigi Di Stefano Fabio Tonelli Department of Electronics Computer Science and Systems (DEIS) Viale Risorgimento 2,
More informationDeep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks
Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin
More informationAutomatic Photo Popup
Automatic Photo Popup Derek Hoiem Alexei A. Efros Martial Hebert Carnegie Mellon University What Is Automatic Photo Popup Introduction Creating 3D models from images is a complex process Time-consuming
More informationDecomposing a Scene into Geometric and Semantically Consistent Regions
Decomposing a Scene into Geometric and Semantically Consistent Regions Stephen Gould sgould@stanford.edu Richard Fulton rafulton@cs.stanford.edu Daphne Koller koller@cs.stanford.edu IEEE International
More informationarxiv: v1 [cs.cv] 31 Mar 2016
Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.
More informationSupport surfaces prediction for indoor scene understanding
2013 IEEE International Conference on Computer Vision Support surfaces prediction for indoor scene understanding Anonymous ICCV submission Paper ID 1506 Abstract In this paper, we present an approach to
More informationPredicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus Presented by: Rex Ying and Charles Qi Input: A Single RGB Image Estimate
More informationCS 534: Computer Vision Segmentation and Perceptual Grouping
CS 534: Computer Vision Segmentation and Perceptual Grouping Spring 2005 Ahmed Elgammal Dept of Computer Science CS 534 Segmentation - 1 Where are we? Image Formation Human vision Cameras Geometric Camera
More informationComputational Foundations of Cognitive Science
Computational Foundations of Cognitive Science Lecture 16: Models of Object Recognition Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk February 23, 2010 Frank Keller Computational
More informationThink-Pair-Share. What visual or physiological cues help us to perceive 3D shape and depth?
Think-Pair-Share What visual or physiological cues help us to perceive 3D shape and depth? [Figure from Prados & Faugeras 2006] Shading Focus/defocus Images from same point of view, different camera parameters
More informationDeep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia
Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky
More informationMulti-Class Segmentation with Relative Location Prior
Multi-Class Segmentation with Relative Location Prior Stephen Gould, Jim Rodgers, David Cohen, Gal Elidan, Daphne Koller Department of Computer Science, Stanford University International Journal of Computer
More informationDefinition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos
Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos Sung Chun Lee, Chang Huang, and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA sungchun@usc.edu,
More informationVisuelle Perzeption für Mensch- Maschine Schnittstellen
Visuelle Perzeption für Mensch- Maschine Schnittstellen Vorlesung, WS 2009 Prof. Dr. Rainer Stiefelhagen Dr. Edgar Seemann Institut für Anthropomatik Universität Karlsruhe (TH) http://cvhci.ira.uka.de
More informationTRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK
TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept.
More information12/3/2009. What is Computer Vision? Applications. Application: Assisted driving Pedestrian and car detection. Application: Improving online search
Introduction to Artificial Intelligence V22.0472-001 Fall 2009 Lecture 26: Computer Vision Rob Fergus Dept of Computer Science, Courant Institute, NYU Slides from Andrew Zisserman What is Computer Vision?
More informationCRF Based Point Cloud Segmentation Jonathan Nation
CRF Based Point Cloud Segmentation Jonathan Nation jsnation@stanford.edu 1. INTRODUCTION The goal of the project is to use the recently proposed fully connected conditional random field (CRF) model to
More informationViewpoint Invariant Features from Single Images Using 3D Geometry
Viewpoint Invariant Features from Single Images Using 3D Geometry Yanpeng Cao and John McDonald Department of Computer Science National University of Ireland, Maynooth, Ireland {y.cao,johnmcd}@cs.nuim.ie
More informationMethods for Automatically Modeling and Representing As-built Building Information Models
NSF GRANT # CMMI-0856558 NSF PROGRAM NAME: Automating the Creation of As-built Building Information Models Methods for Automatically Modeling and Representing As-built Building Information Models Daniel
More informationShadows in the graphics pipeline
Shadows in the graphics pipeline Steve Marschner Cornell University CS 569 Spring 2008, 19 February There are a number of visual cues that help let the viewer know about the 3D relationships between objects
More informationLecture 6: Edge Detection
#1 Lecture 6: Edge Detection Saad J Bedros sbedros@umn.edu Review From Last Lecture Options for Image Representation Introduced the concept of different representation or transformation Fourier Transform
More information(Refer Slide Time 00:17) Welcome to the course on Digital Image Processing. (Refer Slide Time 00:22)
Digital Image Processing Prof. P. K. Biswas Department of Electronics and Electrical Communications Engineering Indian Institute of Technology, Kharagpur Module Number 01 Lecture Number 02 Application
More informationLearning visual odometry with a convolutional network
Learning visual odometry with a convolutional network Kishore Konda 1, Roland Memisevic 2 1 Goethe University Frankfurt 2 University of Montreal konda.kishorereddy@gmail.com, roland.memisevic@gmail.com
More informationCS 231A Computer Vision (Winter 2014) Problem Set 3
CS 231A Computer Vision (Winter 2014) Problem Set 3 Due: Feb. 18 th, 2015 (11:59pm) 1 Single Object Recognition Via SIFT (45 points) In his 2004 SIFT paper, David Lowe demonstrates impressive object recognition
More informationCombining PGMs and Discriminative Models for Upper Body Pose Detection
Combining PGMs and Discriminative Models for Upper Body Pose Detection Gedas Bertasius May 30, 2014 1 Introduction In this project, I utilized probabilistic graphical models together with discriminative
More informationProject 4 Results. Representation. Data. Learning. Zachary, Hung-I, Paul, Emanuel. SIFT and HoG are popular and successful.
Project 4 Results Representation SIFT and HoG are popular and successful. Data Hugely varying results from hard mining. Learning Non-linear classifier usually better. Zachary, Hung-I, Paul, Emanuel Project
More informationCost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling
[DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji
More informationAdvance Shadow Edge Detection and Removal (ASEDR)
International Journal of Computational Intelligence Research ISSN 0973-1873 Volume 13, Number 2 (2017), pp. 253-259 Research India Publications http://www.ripublication.com Advance Shadow Edge Detection
More informationAutomatic Tracking of Moving Objects in Video for Surveillance Applications
Automatic Tracking of Moving Objects in Video for Surveillance Applications Manjunath Narayana Committee: Dr. Donna Haverkamp (Chair) Dr. Arvin Agah Dr. James Miller Department of Electrical Engineering
More informationStereo: Disparity and Matching
CS 4495 Computer Vision Aaron Bobick School of Interactive Computing Administrivia PS2 is out. But I was late. So we pushed the due date to Wed Sept 24 th, 11:55pm. There is still *no* grace period. To
More informationComputer and Machine Vision
Computer and Machine Vision Lecture Week 4 Part-2 February 5, 2014 Sam Siewert Outline of Week 4 Practical Methods for Dealing with Camera Streams, Frame by Frame and De-coding/Re-encoding for Analysis
More informationMiniature faking. In close-up photo, the depth of field is limited.
Miniature faking In close-up photo, the depth of field is limited. http://en.wikipedia.org/wiki/file:jodhpur_tilt_shift.jpg Miniature faking Miniature faking http://en.wikipedia.org/wiki/file:oregon_state_beavers_tilt-shift_miniature_greg_keene.jpg
More informationLSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University
LSTM and its variants for visual recognition Xiaodan Liang xdliang328@gmail.com Sun Yat-sen University Outline Context Modelling with CNN LSTM and its Variants LSTM Architecture Variants Application in
More informationRemoving Shadows from Images
Removing Shadows from Images Zeinab Sadeghipour Kermani School of Computing Science Simon Fraser University Burnaby, BC, V5A 1S6 Mark S. Drew School of Computing Science Simon Fraser University Burnaby,
More informationRoom Reconstruction from a Single Spherical Image by Higher-order Energy Minimization
Room Reconstruction from a Single Spherical Image by Higher-order Energy Minimization Kosuke Fukano, Yoshihiko Mochizuki, Satoshi Iizuka, Edgar Simo-Serra, Akihiro Sugimoto, and Hiroshi Ishikawa Waseda
More informationCOS Lecture 10 Autonomous Robot Navigation
COS 495 - Lecture 10 Autonomous Robot Navigation Instructor: Chris Clark Semester: Fall 2011 1 Figures courtesy of Siegwart & Nourbakhsh Control Structure Prior Knowledge Operator Commands Localization
More informationLEARNING BOUNDARIES WITH COLOR AND DEPTH. Zhaoyin Jia, Andrew Gallagher, Tsuhan Chen
LEARNING BOUNDARIES WITH COLOR AND DEPTH Zhaoyin Jia, Andrew Gallagher, Tsuhan Chen School of Electrical and Computer Engineering, Cornell University ABSTRACT To enable high-level understanding of a scene,
More informationOCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE
OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE Wenju He, Marc Jäger, and Olaf Hellwich Berlin University of Technology FR3-1, Franklinstr. 28, 10587 Berlin, Germany {wenjuhe, jaeger,
More informationSingle-view 3D Reconstruction
Single-view 3D Reconstruction 10/12/17 Computational Photography Derek Hoiem, University of Illinois Some slides from Alyosha Efros, Steve Seitz Notes about Project 4 (Image-based Lighting) You can work
More informationOutdoor Scene Reconstruction from Multiple Image Sequences Captured by a Hand-held Video Camera
Outdoor Scene Reconstruction from Multiple Image Sequences Captured by a Hand-held Video Camera Tomokazu Sato, Masayuki Kanbara and Naokazu Yokoya Graduate School of Information Science, Nara Institute
More informationImportant concepts in binocular depth vision: Corresponding and non-corresponding points. Depth Perception 1. Depth Perception Part II
Depth Perception Part II Depth Perception 1 Binocular Cues to Depth Depth Information Oculomotor Visual Accomodation Convergence Binocular Monocular Static Cues Motion Parallax Perspective Size Interposition
More informationIntroduction to 3D Concepts
PART I Introduction to 3D Concepts Chapter 1 Scene... 3 Chapter 2 Rendering: OpenGL (OGL) and Adobe Ray Tracer (ART)...19 1 CHAPTER 1 Scene s0010 1.1. The 3D Scene p0010 A typical 3D scene has several
More informationCS4495/6495 Introduction to Computer Vision
CS4495/6495 Introduction to Computer Vision 9C-L1 3D perception Some slides by Kelsey Hawkins Motivation Why do animals, people & robots need vision? To detect and recognize objects/landmarks Is that a
More informationOverview. Related Work Tensor Voting in 2-D Tensor Voting in 3-D Tensor Voting in N-D Application to Vision Problems Stereo Visual Motion
Overview Related Work Tensor Voting in 2-D Tensor Voting in 3-D Tensor Voting in N-D Application to Vision Problems Stereo Visual Motion Binary-Space-Partitioned Images 3-D Surface Extraction from Medical
More informationCombining Monocular and Stereo Depth Cues
Combining Monocular and Stereo Depth Cues Fraser Cameron December 16, 2005 Abstract A lot of work has been done extracting depth from image sequences, and relatively less has been done using only single
More informationFocusing Attention on Visual Features that Matter
TSAI, KUIPERS: FOCUSING ATTENTION ON VISUAL FEATURES THAT MATTER 1 Focusing Attention on Visual Features that Matter Grace Tsai gstsai@umich.edu Benjamin Kuipers kuipers@umich.edu Electrical Engineering
More informationGaze interaction (2): models and technologies
Gaze interaction (2): models and technologies Corso di Interazione uomo-macchina II Prof. Giuseppe Boccignone Dipartimento di Scienze dell Informazione Università di Milano boccignone@dsi.unimi.it http://homes.dsi.unimi.it/~boccignone/l
More informationPerspective and vanishing points
Last lecture when I discussed defocus blur and disparities, I said very little about neural computation. Instead I discussed how blur and disparity are related to each other and to depth in particular,
More informationAutomatic Colorization of Grayscale Images
Automatic Colorization of Grayscale Images Austin Sousa Rasoul Kabirzadeh Patrick Blaes Department of Electrical Engineering, Stanford University 1 Introduction ere exists a wealth of photographic images,
More informationBasic distinctions. Definitions. Epstein (1965) familiar size experiment. Distance, depth, and 3D shape cues. Distance, depth, and 3D shape cues
Distance, depth, and 3D shape cues Pictorial depth cues: familiar size, relative size, brightness, occlusion, shading and shadows, aerial/ atmospheric perspective, linear perspective, height within image,
More information