OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE
Wenju He, Marc Jäger, and Olaf Hellwich
Berlin University of Technology, FR3-1, Franklinstr. 28, Berlin, Germany
{wenjuhe, jaeger, hellwich}@fpk.tu-berlin.de

ABSTRACT

Occlusion occurs when several objects overlap or interfere with one another in an image. The phenomenon is prevalent in high-resolution Synthetic Aperture Radar (SAR) images of urban areas. Geometric content, which enables occlusion analysis, is partially observable in high-resolution SAR images. Estimating occlusion boundaries helps to discriminate different objects and localize their extents. An occlusion boundary map also yields an efficient figure/ground segmentation, which is promising for further object analysis. This paper applies a hierarchical framework [1] to extract occlusion boundaries between different objects, e.g. buildings and trees. The framework uses Conditional Random Fields to reason jointly about boundaries and segments.

Key words: SAR; urban; occlusion; boundary.

1. INTRODUCTION

A Synthetic Aperture Radar (SAR) image is a projection of the scattering reflections of a 3D scene into a slant-range representation. Object extents, i.e. geometric information, are usually missing in SAR images. Speckle, the SAR imaging mechanism, and the geographical configuration of objects make SAR image analysis very difficult. In contrast to optical images, SAR images do not allow objects to be reconstructed directly. However, geometric information is partially observable in high-resolution SAR images, so applications in urban environments are promising, e.g. in combination with interferometric SAR data, which provide height information. Occlusion is a common phenomenon in optical images due to the projection of the 3D scene onto the 2D image plane. Occlusion reasoning is an important aspect of intrinsic 3D understanding from a single image.
This effect is handled in [1] by extracting potential occlusion boundaries, which define a figure/ground labeling. The algorithm can naturally be adjusted to strengthen the consistency of objects of interest. SAR images are occluded in a different way. The propagation of electromagnetic waves in urban areas is complicated by the complex geometric configuration of man-made structures and their surroundings. Multiple reflections occur among objects, and waves obstructed by one object cannot reach adjacent objects. Scatterers located at the bottom of a building may fall behind scatterers at the top in the image. It is therefore difficult to discriminate neighboring objects in urban SAR images; the boundaries between different objects are usually occluded. For example, buildings and trees are sometimes situated together and have similar characteristics along their boundaries. Estimating occlusion boundaries helps to discriminate different objects and localize their extents, which is very important for scene understanding from SAR images. An occlusion boundary map also corresponds to a foreground segmentation, which is promising for object analysis despite the constraints of the SAR imaging mechanism. This paper studies the occlusion between different objects in high-resolution SAR images of urban areas. For instance, we estimate that buildings occlude trees and shadow, trees occlude grass, and so on. We adopt an iterative strategy [1] that exploits boundary strength and region characteristics coherently to solve this difficult problem, integrating occlusion boundary estimation with segmentation. An initial segmentation is obtained by applying the watershed method to polarimetric amplitude data. The boundaries of the generated segments are potential occlusion boundaries. Weak boundaries that are unlikely to be occlusions can be removed, and small regions can be grouped if they have the same surface type.
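The initial watershed over-segmentation step described above can be sketched as follows. This is a minimal illustration using SciPy's `watershed_ift` on a synthetic log-amplitude image, not the authors' implementation; the gradient relief and the marker-seeding heuristic (low-gradient basins, 20th-percentile threshold) are our own assumptions.

```python
import numpy as np
from scipy import ndimage as ndi

def initial_segmentation(span_db):
    """Over-segment a log-amplitude (span) image with the watershed transform.

    span_db: 2D float array, log-scaled total power of the polarimetric data.
    Returns an integer label image; the borders between labels form the
    initial hypothesis of the occlusion boundaries.
    """
    # Gradient magnitude of the amplitude image acts as the watershed relief.
    gy, gx = np.gradient(span_db)
    grad = np.hypot(gx, gy)
    relief = np.round(
        255.0 * (grad - grad.min()) / (np.ptp(grad) + 1e-12)
    ).astype(np.uint8)
    # Seed one marker per connected low-gradient region (basin bottoms);
    # the percentile threshold is an illustrative choice, not the paper's.
    markers, _ = ndi.label(relief <= np.percentile(relief, 20))
    return ndi.watershed_ift(relief, markers.astype(np.int32))

rng = np.random.default_rng(0)
labels = initial_segmentation(np.log1p(rng.random((64, 64))))
```

On real data the thousands of resulting labels correspond to the "several thousand regions" the paper starts from.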
Many effective features are adopted in this paper, which help to characterize boundaries and regions efficiently. The boundary and region likelihoods are integrated into a Conditional Random Field (CRF) framework, which models the interaction of boundaries, junctions and regions. CRF inference outputs the occlusion boundary map. Our goal is to find the boundaries and the occlusion relationships. The recovered occlusion boundary map shows the major occlusions in a SAR image and is therefore helpful for 3D scene understanding from a single high-resolution SAR image. An accurate occlusion boundary map also defines a high-quality segmentation: the segmentation formed by the boundaries gives an efficient figure/ground segmentation for further object analysis.

Proc. of 4th Int. Workshop on Science and Applications of SAR Polarimetry and Polarimetric Interferometry, PolInSAR 2009, January 2009, Frascati, Italy (ESA SP-668, April 2009)

2. ALGORITHM

Occlusion boundary analysis and image segmentation are integrated and interleaved in the algorithm [1]. Segmentation provides the initial boundaries and regions. We gradually estimate occlusion boundaries by iteratively removing weak boundaries and running inference on the new segmentation. The growing segments provide better spatial support for feature extraction. After several iterations we obtain an occlusion boundary map. Each iteration consists of three steps: (1) compute multiple features for boundaries and regions; (2) infer confidences for boundaries and regions; and (3) compute a hierarchical segmentation by iteratively removing boundaries whose strength falls below a given threshold, merging regions to form a new segmentation. Each time a weak boundary is about to be removed, the boundary likelihoods of the enlarged region are re-estimated. The new segmentation serves as the initial segmentation for the next iteration, where it enables reasonably complex feature extraction. Our estimation framework consists of three iterations. The first iteration is a minimum merging step using unary likelihood estimation: boundaries with the smallest occlusion likelihood are eliminated. In the second iteration, we use a CRF model to integrate the unary likelihood with the conditional dependency of a boundary on its preceding boundaries. In the third iteration, the CRF model is extended to model the surface evidence on both sides of each boundary. In each iteration we apply the three steps above to obtain fewer boundaries, which are increasingly likely to be occlusion boundaries. The new probabilistic boundary map is thresholded to give the initial segmentation for the next iteration.
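The threshold-and-merge step (3) can be sketched with a toy region-adjacency example. Here the occlusion likelihoods are a fixed table standing in for the classifier/CRF output, the regions are merged with a union-find structure, and all names and numbers are illustrative rather than taken from the paper's implementation.

```python
# Sketch of one iteration's step (3): boundaries below the threshold are
# removed and the two regions they separate are merged.

class DisjointSet:
    """Minimal union-find over region indices 0..n-1."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def run_iteration(boundaries, likelihood, threshold, n_regions):
    """boundaries: list of (region_i, region_j) pairs.
    likelihood: dict mapping each boundary to its occlusion probability
    (in the paper this comes from the classifiers / CRF inference).
    Returns the surviving boundaries and the merged region labels."""
    ds = DisjointSet(n_regions)
    for b in boundaries:
        if likelihood[b] < threshold:   # weak boundary: merge its two sides
            ds.union(*b)
    kept = [b for b in boundaries if ds.find(b[0]) != ds.find(b[1])]
    labels = [ds.find(r) for r in range(n_regions)]
    return kept, labels

# Toy example: 4 regions in a chain; 0.08 is the paper's first threshold.
bounds = [(0, 1), (1, 2), (2, 3)]
like = {(0, 1): 0.05, (1, 2): 0.6, (2, 3): 0.15}
kept, labels = run_iteration(bounds, like, 0.08, 4)
```

Running the loop three times with rising thresholds, re-estimating `likelihood` on the merged regions each time, mirrors the three-iteration framework.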
The third iteration produces the final occlusion likelihoods and boundaries.

2.1. Minimum merging

At the beginning, we apply the watershed segmentation method to divide the image into small regions, which provide an initial hypothesis of the occlusion boundaries. An example is shown in Fig. 1(b). Watershed segmentation generates an over-segmentation with several thousand regions from the intensity gradients of the polarimetric SAR data. These regions provide nearly true boundaries and are conservative estimates of the occlusion boundaries. Most of the boundaries are smooth, which facilitates efficient junction analysis. We extract features for all boundaries and use a boundary classifier to estimate boundary likelihoods. The likelihoods are thresholded to provide a new hypothesis of the occlusion boundaries and a new segmentation. The boundaries, together with the segmentation, are the input of the second iteration.

2.2. CRF model

Both boundaries and regions indicate whether an occlusion boundary exists. On the one hand, the initial boundary map contains a large number of edges, and occlusion boundaries tend to be strong edges; we calculate strength, length and other features for boundaries. On the other hand, the initial segmentation contains many small regions, and regions with the same surface label are usually not occluded. Occlusion estimation therefore benefits from integrating boundaries and regions. In the second and third iterations we use a CRF to model the interaction of adjacent boundaries and the surfaces on both sides. The CRF performs inference over boundaries and junctions, modeling boundary strength and enforcing closure and boundary consistency. The model is defined as

P(labels | data) = (1/Z) ∏_{j=1..N_j} φ_j ∏_{e=1..N_e} γ_e    (1)

where φ_j denotes a junction factor, γ_e a surface factor, N_j the number of junctions, N_e the number of boundaries, and 1/Z the normalization term. The factor graph of the CRF consists of junction factors and surface factors. The junction factor models the strength and continuity of boundaries, i.e.
the likelihood of the label of each boundary given the data, conditioned on its preceding boundaries, if any. The junction factor consists of a unary boundary likelihood and a conditional continuity likelihood. The surface factor models the likelihood of a boundary conditioned on the region types on each side. A boundary between two regions assigned the same surface label is less likely to be an occlusion and thus receives a low occlusion likelihood. We learn to detect whether a boundary between two regions is likely to exist due to occlusion. The CRF model achieves a joint inference over the two factors: confidences for boundaries and surfaces are computed simultaneously, which makes them more stable. It enforces the boundary consistency that the left side is the object, i.e. that the left side occludes the right side. The surface evidence map also helps to guarantee the consistency of object boundaries, and the model can improve the surface estimation at the same time. CRF inference gives the occlusion likelihood of each boundary; boundaries with low likelihood are removed, yielding a new probabilistic boundary map. Given a labeling of the boundaries and excluding the surface factor, the CRF model decomposes into a single likelihood term for each boundary. This property allows us to learn the boundary likelihood and the conditional likelihood of the junction factor using boosted decision trees, which are able to perform feature selection and give probabilistic results. A boundary classifier and a boundary continuity classifier are trained to generate the potentials in the junction factor. Sum-product belief propagation is used for inference. The CRF outputs the occlusion likelihoods of the boundaries, and boundaries with low likelihoods are removed. The surface evidence maps used in the model are computed from a smaller set of low-level features by the algorithm in [2]. The maps indicate 5 surface types: layover, shadow, tree, grass and an unknown class. The unknown class covers regions of meter-resolution SAR images that are hard to interpret visually. An example of a surface map is shown in Fig. 1(d). The maps allow us to infer boundaries between different object types and to penalize non-occlusion boundaries. They help to enforce consistency between region labels and boundary labels.

2.3. Feature extraction

Table 1. Features extracted for a boundary.

Region features:
R1. Polarimetric entropy, anisotropy and α differences
R2. Sublook coherence and entropy differences
R3. Optimized coherence difference
R4. HH, VV and HV: amplitude differences
R5. Span image: amplitude difference
R6. Span histogram: Kullback-Leibler (KL) divergence
R7. Log span histogram: KL divergence
R8. Filter bank responses of span: differences
R9. Filter bank responses of log span: differences
R10. Texton histogram of span: KL divergence
R11. Texton histogram of log span: KL divergence
R12. HOG of span: KL divergence
R13. HOG of log span: KL divergence
R14. Dense SIFT of log span: KL divergence
R15. Area: area of region on each side, area ratio
R16. Lines: difference of line pixels
R17. Parallel lines: percentage difference
R18. Position: differences of bounding box coordinates
R19. Alignment: horizontal and vertical overlaps

Boundary features:
B1. Strength: average Pb
B2. Length: length / (perimeter of smaller side)
B3. Smoothness: length / (endpoint distance)
B4. Orientation: directed orientation
B5. Continuity: angle difference at each junction

Surface features:
S1. Surface evidences: confidences of each side
S2. Surface evidences: differences of S1

We extract a rich set of features for the regions in a segmentation. The region features are used to generate boundary features. They include polarimetry, amplitude, texture, shape and other types. We believe that comprehensive feature extraction better characterizes the different objects in the images, and more robust features are expected for the evolving regions. Besides these low-level features, we also use surface evidence maps as additional cues. We extract 204 features for each region. Polarimetric SAR data reveal more scattering physics than a single-channel image, so polarimetric decomposition provides informative indicators of the main scattering processes in a region. We extract polarimetric entropy, anisotropy and the α angle. Sub-aperture coherence, entropy and optimized coherence are also helpful; e.g. the most coherent scatterers are targets formed by buildings, alone or together with the ground. The amplitude of polarimetric SAR data is the most important information for discriminating different objects, since all derived products of polarimetric SAR data, e.g. coherence, are strongly influenced by intensity, i.e. reflection strength. The distribution of SAR amplitude data can be modeled by the K distribution, the log-normal distribution, and so on. For simplicity, we use features extracted from the log span image of the polarimetric SAR data; the log features are very effective for SAR image segmentation. For the span and log span images, we use a filter bank [3] to generate texton histograms. The histogram of oriented gradients (HOG) [4] is another effective feature, and we also apply the scale-invariant feature transform (SIFT) descriptor [5] to SAR images. Furthermore, we represent the area, small lines generated by a line detector [2], position and bounding box as additional region features.
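Many of the region-comparison features above (R6-R14 in Tab. 1) are KL divergences between per-side histograms. A minimal sketch follows; the paper does not specify the binning or whether the divergence is symmetrised, so both choices here are our assumptions.

```python
import numpy as np

def sym_kl(p, q, eps=1e-10):
    """Symmetrised Kullback-Leibler divergence between two histograms.
    eps avoids log(0) for empty bins."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def histogram_feature(values_left, values_right, bins=16, rng=(0.0, 1.0)):
    """One boundary feature: divergence between the histograms of some
    per-pixel quantity (span, log span, texton or HOG responses) computed
    on the regions to either side of the boundary."""
    h_l, _ = np.histogram(values_left, bins=bins, range=rng)
    h_r, _ = np.histogram(values_right, bins=bins, range=rng)
    return sym_kl(h_l, h_r)

# Identical per-side distributions give ~0; disjoint ones give a large value.
f_same = histogram_feature(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
f_diff = histogram_feature(np.full(100, 0.1), np.full(100, 0.9))
```

A large divergence suggests the two sides belong to different objects, which is exactly the signal the boundary classifier looks for.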
Boundary features are used to learn occlusions. Occlusion boundaries often have strong amplitude gradients, so the distances between the features of two neighboring regions are effective boundary features. The probabilistic boundary map produced from the polarimetric amplitude image by the Pb algorithm [6] provides an important cue of boundary strength; an example of a Pb map is shown in Fig. 1(c). We use the mean Pb along the boundary pixels as a boundary feature. We also extract boundary length, smoothness, orientation and alignment continuity [1]. The 88 extracted boundary features are listed in Tab. 1. We expect boundary reasoning to benefit from effective features. We calculate continuity features to describe the conditional dependency of a boundary on its preceding one. The continuity features are the concatenation of the boundary features of two adjacent boundaries, plus the relative angle between them.

3. EXPERIMENTS

3.1. Dataset

The polarimetric SAR data of Copenhagen acquired by EMISAR are used in the experiments. We extract 98 images from the data and generate ground truth occlusion boundaries for 41 of them. We use 31 for training and 10 to evaluate the estimation accuracies. The ground truth contains object labels for each region. To generate it, we first segment an image into thousands of regions and manually group them into object regions. Then we manually label the occlusion types of adjacent regions.

Figure 1. (a) A polarimetric SAR image, (b) watershed segmentation (4915 segments), (c) probabilistic boundaries, (d) surface evidences.

Figure 2. Precision-recall curve for classifying whether a boundary is an occlusion boundary in the first iteration.

3.2. Inference

For a test image, we apply the classifiers and the models to estimate the occlusion boundaries. The image is initially over-segmented by watershed. In the first iteration, we extract features for the boundaries and apply the first boundary classifier; weak boundaries are removed and a new segmentation is formed. In the second iteration, we extract boundary features and continuity features, and inference over the junction factor terms gives boundary probabilities. We perform inference over the full model in the third iteration to obtain the final occlusion likelihoods. Fig. 3 and Fig. 4 show two examples of occlusion boundary estimation and the corresponding segmentations. Fig. 3(f) shows the segmentation result of the second iteration, which contains more segments and is slightly more accurate than the final segmentation shown in Fig. 3(e) in terms of small objects. Nonetheless, the final segmentation contains 290 fewer segments, which reduces the computational burden in further applications. This demonstrates the effectiveness of the joint inference over junctions and surface evidence in the CRF model.

3.3. Training

We train three boundary classifiers and two boundary continuity classifiers over the three iterations using a logistic regression version of AdaBoost [1]. In each iteration, the classifiers are trained on the segmentation result of the previous iteration. The trained classifiers are then applied to the training data.
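The boosted boundary classifier can be approximated as follows: AdaBoost over one-feature decision stumps (the paper's version [1] boosts decision trees), with a logistic link turning the boosted score into a boundary probability. The data, hyperparameters and function names are illustrative, not the authors'.

```python
import numpy as np

def train_stump(X, y, w):
    """Exhaustively pick the weighted-error-minimising threshold stump."""
    best = (0, 0.0, 1, np.inf)          # (feature, threshold, polarity, error)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) > 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, t, pol, err)
    return best

def adaboost_train(X, y, rounds=5):
    """y in {-1, +1}; returns a list of weighted stumps."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        f, t, pol, err = train_stump(X, y, w)
        err = float(np.clip(err, 1e-10, 1 - 1e-10))
        alpha = 0.5 * np.log((1 - err) / err)   # stump weight
        pred = np.where(pol * (X[:, f] - t) > 0, 1, -1)
        w = w * np.exp(-alpha * y * pred)       # up-weight hard examples
        w /= w.sum()
        model.append((f, t, pol, alpha))
    return model

def predict_proba(model, X):
    """Squash the boosted score through a sigmoid -> occlusion probability."""
    score = sum(a * np.where(p * (X[:, f] - t) > 0, 1, -1)
                for f, t, p, a in model)
    return 1.0 / (1.0 + np.exp(-2.0 * score))

# Toy boundary feature: a single column, separable around 0.5.
X = np.array([[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]])
y = np.array([-1, -1, -1, 1, 1, 1])
proba = predict_proba(adaboost_train(X, y), X)
```

The probabilistic output is what makes boosting a natural fit here: the boundary likelihoods feed directly into the junction-factor potentials and the thresholding step.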
We transfer the ground truth from the previous iteration to the current one in order to train the new classifiers on the new regions. In the transfer process, we label each region with the object that has the most pixels in the region, and then label the occlusion types between regions. In the three iterations, we set the thresholds for removing weak boundaries to 0.08, 0.12 and 0.2, respectively. Setting the thresholds is a trade-off between retaining more segments and obtaining smoother, more sensible objects. In the second iteration, we restrict the CRF model to the junction factor; the CRF model is extended to the full set of factors in the third iteration. We impose a penalty (exp(-0.3)) for the lack of a boundary between different surface classes, for shadow occluding other classes, for grass occluding layover or tree, and for the unknown class occluding layover or tree.

3.4. Evaluation

Table 2. Overall segmentation accuracy (BSS) and average number of segments.

Method               BSS       Segments
Normalized cuts      42.48%    400
Ours, iteration 2              830
Ours, iteration 3              582

The algorithm is evaluated by measuring the accuracy of the boundary classification and of the final segmentation. Fig. 2 shows the precision-recall curve for detecting whether an initial boundary is an occlusion boundary; boundaries are weighted by length in computing precision and recall. We measure the overall segmentation accuracy in terms of the best spatial support (BSS) score [7]. For each ground truth region, the BSS is the maximum overlap score across all segments; it measures how well the best segment covers the region. The segmentation accuracy is shown in Tab. 2. The algorithm is comparable to Normalized cuts, which segments each image into 400 segments. In the Normalized cuts segmentation, only the log span of the polarimetric SAR data is used as the feature, and the Euclidean distance is used to construct the distance matrix.

Figure 3. An example of a boundary result: (a) original image, with RGB colors representing the HH, VV and HV channels, (b) ground truth occlusion boundaries, (c) estimated occlusion boundaries, (d) probabilistic boundaries, (e) segmentation defined by the boundaries (629 segments), (f) segmentation result of iteration 2 (919 segments).

Figure 4. Another example of a boundary result: (a) original image, (b) ground truth occlusion boundaries, (c) estimated occlusion boundaries, (d) probabilistic boundaries, (e) segmentation defined by the boundaries (594 segments), (f) segmentation result of iteration 2 (860 segments).

4. CONCLUSIONS

This paper extracts occlusion boundaries from a high-resolution SAR image of urban areas. Segmentation and boundary estimation are integrated in the framework. An iterative strategy is adopted to estimate occlusion likelihoods, which are then thresholded to generate occlusion boundaries and segmentations. The growing regions provide better spatial support, which helps to determine whether a boundary is caused by occlusion. The algorithm jointly reasons about the boundaries and surfaces that influence occlusions in SAR images. The promising boundary extraction and segmentation results are applicable to further tasks, e.g. object detection. The occlusion boundary map is a probabilistic output that can be integrated into statistical geometric models for urban scene analysis using SAR data. Occlusion boundaries will play an important role in urban scene understanding using SAR images.

REFERENCES

[1] Hoiem, D., Stein, A.N., Efros, A.A. & Hebert, M. (2007). Recovering Occlusion Boundaries from a Single Image.
In International Conference on Computer Vision.
[2] Hoiem, D., Efros, A.A. & Hebert, M. (2007). Recovering Surface Layout from an Image. International Journal of Computer Vision, 75(1).
[3] Varma, M. & Zisserman, A. (2005). A Statistical Approach to Texture Classification from Single Images. International Journal of Computer Vision, 62(1-2).
[4] Dalal, N. & Triggs, B. (2005). Histograms of Oriented Gradients for Human Detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2.
[5] Lowe, D.G. (2004). Distinctive Image Features from Scale-invariant Keypoints. International Journal of Computer Vision, 2(60).
[6] Martin, D.R., Fowlkes, C.C. & Malik, J. (2003). Learning to Detect Natural Image Boundaries using Brightness and Texture. In Advances in Neural Information Processing Systems 15 (NIPS).
[7] Malisiewicz, T. & Efros, A. (2007). Improving Spatial Support for Objects via Multiple Segmentations. In British Machine Vision Conference.
CS229 Project Final Report Detection of a Single Hand Shape in the Foreground of Still Images Toan Tran (dtoan@stanford.edu) 1. Introduction This paper is about an image detection system that can detect
More informationAnalysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009
Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context
More informationDeformable Part Models
CS 1674: Intro to Computer Vision Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 9, 2016 Today: Object category detection Window-based approaches: Last time: Viola-Jones
More informationObject Detection by 3D Aspectlets and Occlusion Reasoning
Object Detection by 3D Aspectlets and Occlusion Reasoning Yu Xiang University of Michigan Silvio Savarese Stanford University In the 4th International IEEE Workshop on 3D Representation and Recognition
More informationCLASSIFICATION OF EARTH TERRAIN COVERS USING THE MODIFIED FOUR- COMPONENT SCATTERING POWER DECOMPOSITION,
CLASSIFICATION OF EARTH TERRAIN COVERS USING THE MODIFIED FOUR- COMPONENT SCATTERING POWER DECOMPOSITION, Boularbah Souissi (1), Mounira Ouarzeddine (1),, Aichouche Belhadj-Aissa (1) USTHB, F.E.I, BP N
More informationCS 231A Computer Vision (Fall 2012) Problem Set 3
CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest
More informationDetecting and Segmenting Humans in Crowded Scenes
Detecting and Segmenting Humans in Crowded Scenes Mikel D. Rodriguez University of Central Florida 4000 Central Florida Blvd Orlando, Florida, 32816 mikel@cs.ucf.edu Mubarak Shah University of Central
More information3D Spatial Layout Propagation in a Video Sequence
3D Spatial Layout Propagation in a Video Sequence Alejandro Rituerto 1, Roberto Manduchi 2, Ana C. Murillo 1 and J. J. Guerrero 1 arituerto@unizar.es, manduchi@soe.ucsc.edu, acm@unizar.es, and josechu.guerrero@unizar.es
More informationStructured Models in. Dan Huttenlocher. June 2010
Structured Models in Computer Vision i Dan Huttenlocher June 2010 Structured Models Problems where output variables are mutually dependent or constrained E.g., spatial or temporal relations Such dependencies
More informationData-driven Depth Inference from a Single Still Image
Data-driven Depth Inference from a Single Still Image Kyunghee Kim Computer Science Department Stanford University kyunghee.kim@stanford.edu Abstract Given an indoor image, how to recover its depth information
More informationLarge-Scale Traffic Sign Recognition based on Local Features and Color Segmentation
Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,
More informationEstimating Human Pose in Images. Navraj Singh December 11, 2009
Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks
More informationA Hierarchical Compositional System for Rapid Object Detection
A Hierarchical Compositional System for Rapid Object Detection Long Zhu and Alan Yuille Department of Statistics University of California at Los Angeles Los Angeles, CA 90095 {lzhu,yuille}@stat.ucla.edu
More informationhttps://en.wikipedia.org/wiki/the_dress Recap: Viola-Jones sliding window detector Fast detection through two mechanisms Quickly eliminate unlikely windows Use features that are fast to compute Viola
More informationarxiv: v1 [cs.cv] 28 Sep 2018
Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,
More informationSea Turtle Identification by Matching Their Scale Patterns
Sea Turtle Identification by Matching Their Scale Patterns Technical Report Rajmadhan Ekambaram and Rangachar Kasturi Department of Computer Science and Engineering, University of South Florida Abstract
More informationCS 223B Computer Vision Problem Set 3
CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.
More informationViewpoint Invariant Features from Single Images Using 3D Geometry
Viewpoint Invariant Features from Single Images Using 3D Geometry Yanpeng Cao and John McDonald Department of Computer Science National University of Ireland, Maynooth, Ireland {y.cao,johnmcd}@cs.nuim.ie
More informationSupervised texture detection in images
Supervised texture detection in images Branislav Mičušík and Allan Hanbury Pattern Recognition and Image Processing Group, Institute of Computer Aided Automation, Vienna University of Technology Favoritenstraße
More informationSelection of Scale-Invariant Parts for Object Class Recognition
Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract
More informationPart III: Affinity Functions for Image Segmentation
Part III: Affinity Functions for Image Segmentation Charless Fowlkes joint work with David Martin and Jitendra Malik at University of California at Berkeley 1 Q: What measurements should we use for constructing
More informationSegmentation as Selective Search for Object Recognition in ILSVRC2011
Segmentation as Selective Search for Object Recognition in ILSVRC2011 Koen van de Sande Jasper Uijlings Arnold Smeulders Theo Gevers Nicu Sebe Cees Snoek University of Amsterdam, University of Trento ILSVRC2011
More informationCorrecting User Guided Image Segmentation
Correcting User Guided Image Segmentation Garrett Bernstein (gsb29) Karen Ho (ksh33) Advanced Machine Learning: CS 6780 Abstract We tackle the problem of segmenting an image into planes given user input.
More informationBSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy
BSB663 Image Processing Pinar Duygulu Slides are adapted from Selim Aksoy Image matching Image matching is a fundamental aspect of many problems in computer vision. Object or scene recognition Solving
More informationCS 558: Computer Vision 13 th Set of Notes
CS 558: Computer Vision 13 th Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview Context and Spatial Layout
More informationClosing the Loop in Scene Interpretation
Closing the Loop in Scene Interpretation Derek Hoiem Beckman Institute University of Illinois dhoiem@uiuc.edu Alexei A. Efros Robotics Institute Carnegie Mellon University efros@cs.cmu.edu Martial Hebert
More informationMULTI ORIENTATION PERFORMANCE OF FEATURE EXTRACTION FOR HUMAN HEAD RECOGNITION
MULTI ORIENTATION PERFORMANCE OF FEATURE EXTRACTION FOR HUMAN HEAD RECOGNITION Panca Mudjirahardjo, Rahmadwati, Nanang Sulistiyanto and R. Arief Setyawan Department of Electrical Engineering, Faculty of
More informationLocal invariant features
Local invariant features Tuesday, Oct 28 Kristen Grauman UT-Austin Today Some more Pset 2 results Pset 2 returned, pick up solutions Pset 3 is posted, due 11/11 Local invariant features Detection of interest
More information2D Image Processing Feature Descriptors
2D Image Processing Feature Descriptors Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Overview
More informationLocal Image Features
Local Image Features Computer Vision CS 143, Brown Read Szeliski 4.1 James Hays Acknowledgment: Many slides from Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial This section: correspondence and alignment
More informationSTRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering
STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE Nan Hu Stanford University Electrical Engineering nanhu@stanford.edu ABSTRACT Learning 3-D scene structure from a single still
More informationSeparating Objects and Clutter in Indoor Scenes
Separating Objects and Clutter in Indoor Scenes Salman H. Khan School of Computer Science & Software Engineering, The University of Western Australia Co-authors: Xuming He, Mohammed Bennamoun, Ferdous
More informationCS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching
Stereo Matching Fundamental matrix Let p be a point in left image, p in right image l l Epipolar relation p maps to epipolar line l p maps to epipolar line l p p Epipolar mapping described by a 3x3 matrix
More informationComputer Vision for HCI. Topics of This Lecture
Computer Vision for HCI Interest Points Topics of This Lecture Local Invariant Features Motivation Requirements, Invariances Keypoint Localization Features from Accelerated Segment Test (FAST) Harris Shi-Tomasi
More informationRecognizing Apples by Piecing Together the Segmentation Puzzle
Recognizing Apples by Piecing Together the Segmentation Puzzle Kyle Wilshusen 1 and Stephen Nuske 2 Abstract This paper presents a system that can provide yield estimates in apple orchards. This is done
More informationEdges and Binary Images
CS 699: Intro to Computer Vision Edges and Binary Images Prof. Adriana Kovashka University of Pittsburgh September 5, 205 Plan for today Edge detection Binary image analysis Homework Due on 9/22, :59pm
More informationRadar Target Identification Using Spatial Matched Filters. L.M. Novak, G.J. Owirka, and C.M. Netishen MIT Lincoln Laboratory
Radar Target Identification Using Spatial Matched Filters L.M. Novak, G.J. Owirka, and C.M. Netishen MIT Lincoln Laboratory Abstract The application of spatial matched filter classifiers to the synthetic
More informationRecovering Intrinsic Images from a Single Image
Recovering Intrinsic Images from a Single Image Marshall F Tappen William T Freeman Edward H Adelson MIT Artificial Intelligence Laboratory Cambridge, MA 02139 mtappen@ai.mit.edu, wtf@ai.mit.edu, adelson@ai.mit.edu
More informationStructured Completion Predictors Applied to Image Segmentation
Structured Completion Predictors Applied to Image Segmentation Dmitriy Brezhnev, Raphael-Joel Lim, Anirudh Venkatesh December 16, 2011 Abstract Multi-image segmentation makes use of global and local features
More informationHistogram and watershed based segmentation of color images
Histogram and watershed based segmentation of color images O. Lezoray H. Cardot LUSAC EA 2607 IUT Saint-Lô, 120 rue de l'exode, 50000 Saint-Lô, FRANCE Abstract A novel method for color image segmentation
More informationEye Detection by Haar wavelets and cascaded Support Vector Machine
Eye Detection by Haar wavelets and cascaded Support Vector Machine Vishal Agrawal B.Tech 4th Year Guide: Simant Dubey / Amitabha Mukherjee Dept of Computer Science and Engineering IIT Kanpur - 208 016
More informationApplying Synthetic Images to Learning Grasping Orientation from Single Monocular Images
Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images 1 Introduction - Steve Chuang and Eric Shan - Determining object orientation in images is a well-established topic
More informationSummarization of Egocentric Moving Videos for Generating Walking Route Guidance
Summarization of Egocentric Moving Videos for Generating Walking Route Guidance Masaya Okamoto and Keiji Yanai Department of Informatics, The University of Electro-Communications 1-5-1 Chofugaoka, Chofu-shi,
More informationColour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation
ÖGAI Journal 24/1 11 Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation Michael Bleyer, Margrit Gelautz, Christoph Rhemann Vienna University of Technology
More informationChapter 9 Object Tracking an Overview
Chapter 9 Object Tracking an Overview The output of the background subtraction algorithm, described in the previous chapter, is a classification (segmentation) of pixels into foreground pixels (those belonging
More informationExploiting the High Dimensionality of Polarimetric Interferometric Synthetic Aperture Radar Observations
Exploiting the High Dimensionality of Polarimetric Interferometric Synthetic Aperture Radar Observations Robert Riley rriley@sandia.gov R. Derek West rdwest@sandia.gov SAND2017 11133 C This work was supported
More informationScene Matching on Imagery
Scene Matching on Imagery There are a plethora of algorithms in existence for automatic scene matching, each with particular strengths and weaknesses SAR scenic matching for interferometry applications
More informationSegmentation of Images
Segmentation of Images SEGMENTATION If an image has been preprocessed appropriately to remove noise and artifacts, segmentation is often the key step in interpreting the image. Image segmentation is a
More informationBoundaries and Sketches
Boundaries and Sketches Szeliski 4.2 Computer Vision James Hays Many slides from Michael Maire, Jitendra Malek Today s lecture Segmentation vs Boundary Detection Why boundaries / Grouping? Recap: Canny
More informationMotion Estimation. There are three main types (or applications) of motion estimation:
Members: D91922016 朱威達 R93922010 林聖凱 R93922044 謝俊瑋 Motion Estimation There are three main types (or applications) of motion estimation: Parametric motion (image alignment) The main idea of parametric motion
More informationC. Premsai 1, Prof. A. Kavya 2 School of Computer Science, School of Computer Science Engineering, Engineering VIT Chennai, VIT Chennai
Traffic Sign Detection Via Graph-Based Ranking and Segmentation Algorithm C. Premsai 1, Prof. A. Kavya 2 School of Computer Science, School of Computer Science Engineering, Engineering VIT Chennai, VIT
More informationCHAPTER 3. Single-view Geometry. 1. Consequences of Projection
CHAPTER 3 Single-view Geometry When we open an eye or take a photograph, we see only a flattened, two-dimensional projection of the physical underlying scene. The consequences are numerous and startling.
More informationTexton Clustering for Local Classification using Scene-Context Scale
Texton Clustering for Local Classification using Scene-Context Scale Yousun Kang Tokyo Polytechnic University Atsugi, Kanakawa, Japan 243-0297 Email: yskang@cs.t-kougei.ac.jp Sugimoto Akihiro National
More informationDevelopment in Object Detection. Junyuan Lin May 4th
Development in Object Detection Junyuan Lin May 4th Line of Research [1] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection, CVPR 2005. HOG Feature template [2] P. Felzenszwalb,
More informationEpithelial rosette detection in microscopic images
Epithelial rosette detection in microscopic images Kun Liu,3, Sandra Ernst 2,3, Virginie Lecaudey 2,3 and Olaf Ronneberger,3 Department of Computer Science 2 Department of Developmental Biology 3 BIOSS
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationTopics to be Covered in the Rest of the Semester. CSci 4968 and 6270 Computational Vision Lecture 15 Overview of Remainder of the Semester
Topics to be Covered in the Rest of the Semester CSci 4968 and 6270 Computational Vision Lecture 15 Overview of Remainder of the Semester Charles Stewart Department of Computer Science Rensselaer Polytechnic
More information