Exploiting Depth Camera for 3D Spatial Relationship Interpretation
Jun Ye, Kien A. Hua
Data Systems Group, University of Central Florida
Mar 1, 2013
Outline
- Motivation, definition, and background
- Define 3D directional relationships
- 3D scene reconstruction
- Compute 3D directional relationships between two objects
- Performance study
Definition
A spatial relationship specifies how an object is located with respect to other objects.
Applications: robotic spatial reasoning, scene understanding, video surveillance, spatial queries.
Spatial relationships
- Topological relationships: coincide, intersect, touch externally, touch internally, contains, inside, outside, disjoint, etc.
- Metric relationships
  - Distance relationships: at, nearby, far away from, etc.
  - Directional relationships: east, west, north, south, northeast, northwest, southeast, southwest, etc.
Limitation of 2D spatial representation
Consider two situations: sitting on a chair (contact) versus standing in front of a chair (no contact).
Ambiguity: a 2D spatial representation cannot distinguish these two spatial relations.
Extension from 2D to 3D
- Device: Kinect sensor (depth sensor), RGB image + depth image
- Spatial direction set: define a new set of 3D directional relationships
- Algorithm for computing the new relationships: the new method leverages the 3D object model
Application background
We investigate the 3D spatial relationship representation as a component of our Live Video DataBase Management System (LVDBMS) [MMSJ 12]. The LVDBMS is a general-purpose framework for managing and processing video data for surveillance and analytical applications. The system allows automatic monitoring and management of a network of live cameras. The user specifies a monitoring task by formulating a query that describes a spatiotemporal event, expressed as a combination of logical, spatial, and temporal operators. When the specified event occurs, an action associated with the query is triggered. The LVDBMS treats a camera as a special class of storage and processes queries against the live video feed as a new category of database.
Define 3D directional relationships: 2D to 3D extension
2D spatial direction is defined in a plane; 3D spatial direction is defined in space by introducing two more directions, above and below.
By combining the original eight 2D directions with A (above) and B (below), 26 primitive 3D directions are defined. They fit into a 3x3x3 cube.
Define 3D directional relationships: a complete set of 26 directions in 3D space
There are 27 cells in the 3x3x3 cube; the reference object occupies the center cell, and each of the remaining 26 cells corresponds to one primitive 3D direction:

  Above layer:   ANW  AN  ANE  |  AW   A    AE  |  ASW  AS  ASE
  Middle layer:  NW   N   NE   |  W   (ref)  E  |  SW   S   SE
  Below layer:   BNW  BN  BNE  |  BW   B    BE  |  BSW  BS  BSE

e.g. ASE means southeast and above.
3D reconstruction: 3D point cloud
- Pixel in the RGB image: 3 channels, 24 bits, (i, j, r, g, b)
- Pixel in the depth image: 1 channel, 16 bits, (i, j, z)
- Point in the 3D point cloud: (x, y, z, r, g, b), where

    x = (i - u0) * z / ax,   y = (j - v0) * z / ay

  (u0, v0): principal point; ax, ay: focal lengths.
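The back-projection above can be sketched as follows (a minimal NumPy version; the function name and the uint16 depth convention are assumptions, and real Kinect streams would additionally need depth-to-RGB registration and unit conversion):

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, u0, v0, ax, ay):
    """Back-project a depth image into an (N, 6) array of (x, y, z, r, g, b).
    (u0, v0) is the principal point; (ax, ay) are the focal lengths in pixels."""
    h, w = depth.shape
    j, i = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    z = depth.astype(np.float64)
    x = (i - u0) * z / ax        # the back-projection formulas from the slide
    y = (j - v0) * z / ay
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3).astype(np.float64)
    valid = pts[:, 2] > 0        # drop pixels with no depth reading
    return np.hstack([pts[valid], colors[valid]])
```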
3D reconstruction: compute the camera pose
- Estimate the floor plane with the RANSAC algorithm
- Compute the camera pose from the normal of the floor plane
- Rotate the 3D point cloud according to the camera pose
- The 3D point cloud is then aligned to the world coordinate system
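The two steps above can be sketched as follows, assuming the world up axis is +y (function names, iteration count, and inlier threshold are illustrative, not the authors' implementation):

```python
import numpy as np

def fit_floor_plane_ransac(points, n_iters=200, inlier_thresh=0.02, rng=None):
    """Estimate the dominant plane (assumed to be the floor) by RANSAC.
    Returns a unit normal n and offset d with n . p + d ~ 0 for inliers."""
    rng = rng or np.random.default_rng(0)
    best_inliers, best_model = 0, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:          # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        d = -n.dot(p0)
        inliers = np.sum(np.abs(points @ n + d) < inlier_thresh)
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (n, d)
    return best_model

def rotation_aligning(n, up=np.array([0.0, 1.0, 0.0])):
    """Rotation matrix mapping the floor normal n onto the world up axis
    (Rodrigues' rotation formula)."""
    n = n / np.linalg.norm(n)
    v = np.cross(n, up)
    c = n.dot(up)
    if np.linalg.norm(v) < 1e-9:
        # n is (anti-)parallel to up: identity, or a 180-degree flip about x
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1 + c)
```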
3D reconstruction: extract objects
The user specifies the reference object and the target object.
- Stationary objects (e.g. chair, table) are manually specified and segmented: mean-shift segmentation, background pixels (floor, walls, etc.) filtered using the RGB image, object extracted using the depth image
- Mobile objects (mostly human beings) are automatically detected and tracked using the Kinect SDK
3D reconstruction: fill up the model by heuristics
A single view captures only half of the scene: the 3D point cloud is incomplete because back-side points are missing. The thickness of objects may therefore be represented incorrectly, which leads to incorrect spatial relationships (e.g. side by side or behind?).
We fill up the 3D model with a heuristic (Manhattan assumption): assume a back-side point has the same depth as the point above it, i.e. a vertical projection from the top view.
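The vertical-projection heuristic could be sketched like this (a hypothetical column-filling version: it bins points by their top-view (x, z) cell and fills each occupied column, which only approximates the per-pixel rule on the slide):

```python
from collections import defaultdict
import numpy as np

def fill_back_side(points, cell=0.02, step=0.02):
    """Heuristic model completion: assume the occluded interior of an object
    shares the top-view footprint of the points above it, and fill every
    occupied (x, z) column between its highest and lowest observed point."""
    keys = np.floor(points[:, [0, 2]] / cell).astype(int)
    cols = defaultdict(list)
    for k, y in zip(map(tuple, keys), points[:, 1]):
        cols[k].append(y)
    filled = [points]
    for (kx, kz), ys in cols.items():
        lo, hi = min(ys), max(ys)
        ys_new = np.arange(lo + step, hi, step)      # interior of the column
        if len(ys_new):
            x = (kx + 0.5) * cell                    # cell-center coordinates
            z = (kz + 0.5) * cell
            filled.append(np.column_stack([np.full(len(ys_new), x), ys_new,
                                           np.full(len(ys_new), z)]))
    return np.vstack(filled)
```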
Compute 3D directions: sample the sphere
The idea is adopted from ray tracing (a computer graphics technique for real-time rendering). 840 rays originate from the reference point and go into space in different directions. Each ray represents a particular direction <θ, φ> in the polar coordinate system; θ and φ are discretized into 21 and 40 sample points, respectively: θ ∈ [0, π], φ ∈ [0, 2π), step = π/20.
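The sampling grid can be written down directly (a minimal sketch; the y-up axis convention and function name are assumptions):

```python
import numpy as np

def sample_sphere_directions(n_theta=21, n_phi=40):
    """Generate the 21 x 40 = 840 unit ray directions: theta in [0, pi]
    (inclusive of both poles, step pi/20) and phi in [0, 2*pi)."""
    thetas = np.linspace(0.0, np.pi, n_theta)
    phis = np.arange(n_phi) * (2.0 * np.pi / n_phi)
    t, p = np.meshgrid(thetas, phis, indexing="ij")
    dirs = np.stack([np.sin(t) * np.cos(p),   # spherical -> Cartesian, y up
                     np.cos(t),
                     np.sin(t) * np.sin(p)], axis=-1)
    return dirs.reshape(-1, 3)
```

Note that at θ = 0 and θ = π all φ values collapse to the same pole direction, which is exactly the density imbalance the geodesic-sphere idea in the future-work slide is meant to fix.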
Compute 3D directions
Each ray travels through space until it hits the target object (or part of it), producing a result of 0 or 1, with 1 indicating a hit. This yields an 840-dimensional feature vector.
Advantage: since the rays sample the entire space around the reference object, the method automatically captures both the position and the shape of the target object, so it can handle objects with complex shapes.
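A point cloud has no continuous surface to intersect, so a hit can be approximated by a distance-to-ray test (a sketch; the tolerance is an assumed parameter, not a value from the slides):

```python
import numpy as np

def ray_hits(origin, direction, points, tol=0.03):
    """Return 1 if the ray from `origin` along unit vector `direction`
    passes within `tol` of any point of the target cloud, else 0."""
    v = points - origin
    t = v @ direction                       # projection of each point on the ray
    ahead = t > 0                           # ignore points behind the origin
    perp = v[ahead] - np.outer(t[ahead], direction)
    return int(np.any(np.linalg.norm(perp, axis=1) < tol))
```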
Compute 3D directions: mapping to 26 directions
Convert the 840-D feature vector to the 26 primitive directions, in accordance with the 3x3x3 cube holding them: neighboring rays are clustered into primitive directions by quantizing θ and φ into 5 and 8 sub-divisions, respectively.
Compute 3D directions: mapping to 26 directions (cont'd)
Each primitive direction covers 25 rays (5 consecutive θ values and 5 consecutive φ values). The rays have different significance (rays through the center of the region are more significant), so the 25 rays are weighted by a 5x5 Gaussian filter

  g(x, y) = (1/A) · (1/(2πσ²)) · e^(−(x² + y²)/(2σ²)),

where A normalizes the filter weights to sum to 1. The weighted sum over the 25 rays is

  result = Σ_{x=−2..2} Σ_{y=−2..2} f(x, y) · g(x, y),  where f(x, y) = 1 if the ray hits the object and 0 otherwise.

The 840-D boolean feature vector is thereby converted into a 26-D float feature vector.
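A sketch of this aggregation, under assumptions about details the slides leave open: θ bands of 5 consecutive rows overlapping by one (since 21 = 4·5 + 1), and the top and bottom bands pooled over φ into the single directions A and B:

```python
import numpy as np

def gaussian_weights(size=5, sigma=1.0):
    """size x size Gaussian filter, normalized to sum to 1 (the 1/A factor)."""
    r = size // 2
    x, y = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return g / g.sum()

def aggregate_to_26d(hits, sigma=1.0):
    """Collapse the 840-D boolean hit vector (21 theta x 40 phi) into a 26-D
    float vector via Gaussian-weighted sums over 5x5 ray windows."""
    H = np.asarray(hits, dtype=float).reshape(21, 40)
    W = gaussian_weights(5, sigma)
    sums = np.empty((5, 8))
    for bt in range(5):                 # 5 theta bands, rows overlap by one
        for bp in range(8):             # 8 phi sectors of 5 columns each
            block = H[bt * 4: bt * 4 + 5, bp * 5: bp * 5 + 5]
            sums[bt, bp] = (block * W).sum()
    # Top band -> A, bottom band -> B (pooled over phi); the three middle
    # bands keep their 8 compass sectors each: 1 + 8 + 8 + 8 + 1 = 26.
    feat = [sums[0].sum(), *sums[1], *sums[2], *sums[3], sums[4].sum()]
    return [float(v) for v in feat]
```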
Compute 3D directions: Octree decomposition
From which point of the reference object do we generate rays? We cannot use all points — too computationally intensive: even a small object can have more than 50,000 points in the point cloud. Octree coding is used to reduce the model while preserving enough structural information.
An Octree is a data structure in which each node has 8 child nodes; it recursively subdivides a cube into 8 sub-cubicles, so n levels of Octree decomposition partition the space into 8^n sub-cubicles. Starting from the bounding box of the object, we subdivide it and remove the sub-cubicles that do not overlap with the object.
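The pruned subdivision can be sketched with a flat grid at the final level (an illustrative shortcut: a real Octree recurses level by level and prunes empty nodes early, but the surviving cells are the same):

```python
import numpy as np

def octree_cells(points, levels=2):
    """Subdivide the object's bounding box `levels` times and keep only the
    sub-cubicles containing at least one point. Returns the cell centers,
    which serve as the landmark ray origins."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    n = 2 ** levels                        # n cells per axis, n^3 in total
    size = (hi - lo) / n
    size[size == 0] = 1e-9                 # guard degenerate (flat) boxes
    idx = np.floor((points - lo) / size).astype(int)
    idx = np.clip(idx, 0, n - 1)           # points on the max face of the box
    occupied = np.unique(idx, axis=0)      # drop empty sub-cubicles
    return lo + (occupied + 0.5) * size
```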
Compute 3D directions between two objects
Centroid-based method:
- Generate 840 rays from the centroid of the reference object
- Check whether each ray intersects the point cloud of the target object
- Compute the 26-D float features
- Sort the results in descending order and output those greater than zero
- Simple and fast; useful when the target object is significantly larger than the reference object
Landmark-based method:
- Perform 2 levels of Octree decomposition on the bounding box of the reference object
- Repeat the centroid-based procedure at the centroid of each sub-cubicle
- Accumulate the results over all sub-cubicles
- Robust to spatial relationships between complex objects, but computationally heavy
Performance study
We perform the experimental study on both:
- A real indoor environment (an office scene): objects include chair, table, etc.; 640x480 RGB and depth images
- A publicly available RGB-D dataset (http://www.cs.washington.edu/rgbd-dataset/): contains 300 common household objects (bowls, cups, dishes, computers, etc.); 640x480 synchronized RGB and depth image sequences captured from 8 different scenes
Performance study: demonstration in real scene
Demonstration in an indoor environment (four example views, panels (a)-(d)).
Performance study: experiments on the RGB-D dataset
43 representative images are selected from the full image sequences. Since no ground truth (GT) is provided, we labeled the images ourselves: 214 pairs of objects are labeled from the 43 images, with 364 spatial relationships between them. There can be multiple spatial relationships between a pair of objects; we annotate the most salient one as the major relationship and all of them (including the major relationship) as candidate relationships. Labeling the GT perfectly is extremely challenging: since a spatial relationship is a fuzzy concept, we label it in accordance with human perception. Experiments are performed on both major relationships and candidate relationships.
Performance study: experiments on the RGB-D dataset
Performance evaluation based on the major relationship, for both the centroid-based and the landmark-based method: 89% of the object pairs are correctly detected with respect to the major relationship, showing that our algorithm matches human perception.
Performance study: experiments on the RGB-D dataset
Performance evaluation based on all candidate relationships; recall and precision are used as the performance metrics:

  recall = (number of correctly detected relationships) / (total number of relationships in the GT)
  precision = (number of correctly detected relationships) / (number of detected relationships)

Comparing the centroid-based and the landmark-based method, the landmark-based method significantly increases recall without sacrificing much precision.
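With relationships encoded as (reference, target, direction) tuples, the two metrics are one-liners (a sketch; the tuple encoding is an assumption of this example):

```python
def recall_precision(detected, ground_truth):
    """Set-based recall and precision over labeled relationship tuples."""
    correct = len(detected & ground_truth)
    return correct / len(ground_truth), correct / len(detected)
```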
Performance study: experiments on the RGB-D dataset
Performance in terms of speed: the elapsed time includes only the 3D object modeling and the spatial relationship detection. Hardware: Intel i7 quad-core CPU and 24 GB RAM. The landmark-based method repeats the same process n times (n = number of sub-cubicles), so it is roughly n times slower than the centroid-based method. GPU techniques (e.g. OpenCL, CUDA) can be exploited to accelerate the algorithm.
Conclusions
- We define a new set of 3D directional spatial relationships
- We introduce an efficient algorithm to compute these 3D spatial relationships
- An extensive experimental study on a real scene and a public dataset demonstrates the effectiveness of our algorithm
Discussion and future work
- Geodesic sphere: ray generation in the polar coordinate system concentrates rays near the two poles of the sphere; a geodesic sphere divides the surface of a sphere evenly
- GPU implementation: OpenCL, CUDA, etc.; fully parallel, with each OpenCL kernel processing one Octree sub-cubicle
- Better alternatives to the heuristics: multiple Kinects; one moving Kinect (KinectFusion [SIGGRAPH 2011]); better heuristics
- Push the Octree model further: represent the 3D model by an Octree rather than a point cloud; segment on the Octree; compute spatial relationships entirely on the Octree
Thank you.