ROBUST ROAD DETECTION FROM A SINGLE IMAGE USING ROAD SHAPE PRIOR. Zhen He, Tao Wu, Zhipeng Xiao, Hangen He

College of Mechatronics and Automation, National University of Defense Technology, Changsha, Hunan, P. R. China

ABSTRACT

Many road detection algorithms require pre-learned information, which may be unreliable because the road scene is usually unpredictable. Single-image-based road detection techniques (i.e., techniques that use no pre-learned information) can overcome this problem, but their robustness needs improving. To achieve robust road detection from a single image, this paper proposes a general road shape prior that enforces the detected region to be road-shaped. The prior is encoded into a graph-cut segmentation framework, where the training data is automatically generated from a predicted road region of the current image. By iteratively performing the graph-cut segmentation, an accurate road region is obtained. Quantitative and qualitative experiments on the challenging SUN Database validate the robustness and efficiency of our method. We believe that the road shape prior can also be used to improve many other road detection algorithms.

Index Terms: Road detection, shape prior, graph cuts

1. INTRODUCTION

Vision-based road detection is of high relevance for autonomous driving and pedestrian detection. Detecting the road from an on-board road image is very challenging due to the diversity of road shapes (e.g., straight, left/right curve, and winding), road materials (e.g., concrete, soil, and asphalt), and backgrounds (e.g., street, forest, and mountain), and due to the noise induced by varying illumination, different weather, object occlusion, etc., as shown in Fig. 1. With information learned off-line from some specific road scenes, many proposed segmentation algorithms [1, 2, 3, 4] perform well on those road scenes but poorly on others.
However, off-line learning requires manually labeled training data, which is time-consuming to produce, and the road scene is usually unknown, especially on moving vehicles. Hence, road detection from a single image (i.e., without any pre-learned information) is needed to overcome these problems, and it is correspondingly more challenging.

There has been some work on single-image-based road detection. In [5], Kong et al. use a vanishing-point-constrained edge detection technique to acquire straight road boundaries. This method may be inapplicable when there is no legible vanishing point (e.g., on an ascent) or when the road boundaries are curved. To deal with complex road shapes, Alvarez et al. [6] use a likelihood-based region growing method to label pixels. It is in fact a local method and is not robust to noise, which may cause overgrowth or undergrowth. More recently, in [7], general information learned off-line from other road scenes is combined with information learned on-line from the current image to enhance the detection robustness. However, in its on-line learning procedure, the training data is taken from a fixed region (i.e., the central-bottom of the image), so the road patterns may not be well learned.

Fig. 1. Examples of different roads with: (a) different shapes, (b) different materials, (c) different backgrounds, (d) different illumination, (e) different weather, and (f) object occlusion.

Aiming to use prior knowledge to improve the robustness of single-image-based road detection, we propose a general road shape prior that enforces the detected road region to be road-shaped. The prior is encoded into a graph-cut segmentation framework, where the labels used to model the road are automatically generated from the current image.
The final detected road region is acquired by iteratively performing graph cuts: the road region detected in one iteration becomes the predicted region of the next iteration, which is used for prior encoding and label generation. The predicted region is initialized as a semicircle at the central-bottom of the image.

The road shape prior is based on the observation that, in on-board road images, the varied road regions obey some shape constraints that make them look like roads, and thus the detected road regions should also satisfy these constraints. Unlike previous methods that use limited templates, which can hardly cover varied road shapes and require matching, our approach benefits from a general road shape prior that is more natural and can be encoded in a unified graph-cut framework. Unlike previous on-line learning methods that generate labels from the current image in one shot, it progressively generates a better batch of labels by iterative graph cuts and thus builds a better model.

An outline of the paper follows. In Sec. 2 we demonstrate how the basic graph-cut segmentation can be used for road detection, in Sec. 3 we explain how to incorporate the road shape prior into the graph-cut segmentation framework, and in Sec. 4 we detail the iterative graph cuts. Experimental results and discussion are presented in Sec. 5, while conclusions are given in Sec. 6.

ICIP 2013

2. ROAD DETECTION USING BASIC GRAPH-CUT SEGMENTATION

Road detection can be formulated as a binary labeling problem that assigns a label (road or background) to each pixel. A popular approach to such problems is the efficient graph-cut segmentation [8]. Let $y_i \in \{0 \text{ (background)}, 1 \text{ (road)}\}$ be the label of pixel $i$ of the image, $P$ be the set of all pixels, and $N$ be the set of all neighboring pixel pairs. Then $y = \{y_i \mid i \in P\}$ is the collection of all
label assignments and defines a segmentation. The basic graph cut optimizes $y$ by minimizing the energy function:

$$E(y) = \sum_{i \in P} D_i(y_i) + \lambda \sum_{(i,j) \in N} V_{i,j}(y_i, y_j) \qquad (1)$$

where $D_i$ is the data term penalizing the assignments that do not fit pixel $i$, and $V_{i,j}$ is the smoothness term penalizing different assignments for neighboring pixels $i$ and $j$. Let $I_i$ be the feature vector of pixel $i$. Then $D_i$ is typically defined as the negative log-likelihood of a label $y_i$ being assigned to pixel $i$ and takes the form:

$$D_i(1) = -\ln \Pr(I_i \mid R), \qquad D_i(0) = -\ln \Pr(I_i \mid B) \qquad (2)$$

where $\Pr(I_i \mid R)$ is the road model and $\Pr(I_i \mid B)$ is the background model. The smoothness term $V_{i,j}$ follows the conventional form:

$$V_{i,j}(y_i, y_j) = [y_i \neq y_j] \cdot \exp\left(-\frac{\lVert I_i - I_j \rVert^2}{2\beta}\right) \cdot \frac{1}{\mathrm{dist}(i,j)} \qquad (3)$$

where $[\cdot]$ is 1 if $y_i \neq y_j$ and 0 otherwise, $\beta$ denotes the mean squared difference between feature vectors over all neighboring pixels, and $\mathrm{dist}(i,j)$ is the Euclidean distance between pixels $i$ and $j$. The parameter $\lambda > 0$ in Eq. (1) weights the relative importance of $D_i$ and $V_{i,j}$.

Compared with other local methods, the basic graph cut mainly benefits from $V_{i,j}$, which encodes neighboring consistency, and from its efficient global optimization. However, it is still a bottom-up approach and is unable to capture the rich properties of the road region. Thus, we propose a general road shape prior and incorporate it into the basic graph-cut segmentation framework, as shown in Sec. 3.

3. INCORPORATING THE ROAD SHAPE PRIOR

As there is a wide variety of road shapes in on-board road images (see Fig. 1), it is hard to describe them with limited templates. However, as shown in Fig. 2, for a segmented image we can easily tell whether the foreground (white region) looks like a road. Hence, there may be some shape constraints that make a region look like a road, which we call the road shape prior.

Fig. 2. Examples of some segmented images, panels (a) to (h). The white regions in (b), (d), (f), and (h) look more like roads than those in (a), (c), (e), and (g).
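
The basic energy of Eqs. (1) to (3) in Sec. 2 can be evaluated directly for a given labeling. The following is a minimal sketch under stated assumptions, not the authors' implementation: the feature is a scalar per pixel, the neighborhood is the 8-neighborhood, the data costs `d0` and `d1` are supplied by the caller, and the names `neighbor_pairs`, `basic_energy`, and the default `lam=1.0` are hypothetical placeholders. It only evaluates the energy; minimizing it requires a graph-cut solver.

```python
import numpy as np

# One offset per unordered neighbor pair of the 8-neighborhood.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def neighbor_pairs(arr, dr, dc):
    """Return views (a, b) such that b is a shifted by (dr, dc), dr >= 0."""
    h, w = arr.shape
    a = arr[0:h - dr, max(0, -dc):w - max(0, dc)]
    b = arr[dr:h, max(0, dc):w - max(0, -dc)]
    return a, b


def basic_energy(labels, d0, d1, feature, lam=1.0):
    """Evaluate E(y) = sum D_i + lam * sum V_ij for a binary labeling.

    labels  : (H, W) array, 1 = road, 0 = background
    d0, d1  : (H, W) data costs of assigning label 0 / label 1
    feature : (H, W) scalar feature image (assumption: scalar feature)
    lam     : weight of the smoothness term (placeholder value)
    """
    labels = np.asarray(labels, dtype=bool)
    feature = np.asarray(feature, dtype=float)

    # Data term: pick the cost of the label actually assigned.
    data = np.where(labels, d1, d0).sum()

    # Squared feature differences for every neighboring pair.
    sq = []
    for dr, dc in OFFSETS:
        a, b = neighbor_pairs(feature, dr, dc)
        sq.append((a - b) ** 2)

    # beta: mean squared difference over all neighboring pairs
    # (guarded so that a constant image does not divide by zero).
    beta = max(np.mean(np.concatenate([s.ravel() for s in sq])), 1e-12)

    # Smoothness term: only pairs with different labels contribute.
    smooth = 0.0
    for (dr, dc), s in zip(OFFSETS, sq):
        ya, yb = neighbor_pairs(labels, dr, dc)
        weight = np.exp(-s / (2.0 * beta)) / np.hypot(dr, dc)
        smooth += np.sum((ya != yb) * weight)

    return data + lam * smooth
```

A labeling that is constant over the image has zero smoothness cost, so its energy reduces to the sum of the data costs.
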
Here we use a shrinking constraint and a consistency constraint to describe such a prior. The former is based on the observation that, along the road axis (i.e., the centerline of the road region), the road region shrinks in perspective from near to far. The latter states that the region between the two sides of the road should consistently belong to the road, and vice versa.

Consider Fig. 3. Let $l$ be the centerline of the white region $R$, and let pixel $i$ be an arbitrary point. Translate $l$ horizontally to $l'$ and draw a horizontal line $h$, both of which pass through the center of $i$. Let $p$ be an arbitrary pixel on $l'$ below $i$, and let $q$ be an arbitrary pixel on $h$ between $l$ and $l'$ (or on $l$). The shrinking constraint requires that $\forall i,\ i \in R \Rightarrow p \in R$, while the consistency constraint requires that $\forall i,\ (i \in R) \wedge (i \notin l) \Rightarrow q \in R$. Thus, in Fig. 3, the shrinking constraint is satisfied in (a) and (b), the consistency constraint is satisfied in (a) and (c), whereas (d) satisfies neither of the constraints. Our criterion is that a region is road-shaped if and only if it satisfies both constraints. This criterion is consistent with our intuition (e.g., see Table 1). To improve the detection robustness, we enforce the detected road region to be road-shaped by encoding both constraints into a graph-cut framework.

Fig. 3. Illustration of the shrinking constraint and the consistency constraint. In (a) and (b), $\forall i,\ i \in R \Rightarrow p \in R$, so the shrinking constraint is satisfied, while in (a) and (c), $\forall i,\ (i \in R) \wedge (i \notin l) \Rightarrow q \in R$, so the consistency constraint is satisfied. In (d), $\exists i,\ (i \in R) \wedge (p \notin R)$, and $\exists i,\ (i \in R) \wedge (i \notin l) \wedge (q \notin R)$, so neither of the constraints is satisfied.

Table 1. Comparison between our criterion and the intuition. The road-shaped regions in Fig. 2 are also road-like, and vice versa. Columns: images (a) to (h) of Fig. 2. Rows: shrinking constraint, consistency constraint, road-shaped, road-like. [The cell entries are missing from this copy.]
Inspired by [9], we define a shrinking constraint term $S_{i,p}$ and a consistency constraint term $C_{i,q}$ as follows:

$$S_{i,p}(y_i, y_p) = \begin{cases} \infty, & \text{if } y_i = 1 \text{ and } y_p = 0 \\ 0, & \text{otherwise} \end{cases} \qquad C_{i,q}(y_i, y_q) = \begin{cases} \infty, & \text{if } y_i = 1 \text{ and } y_q = 0 \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

Note that $S_{i,p}$ is defined on pixel pairs $(i, p)$, while $C_{i,q}$ is defined on pixel pairs $(i, q)$ (see Fig. 3). Both terms penalize the assignments that violate the corresponding constraint by assigning them infinite cost.

In fact, it is enough to place $S_{i,p}$ and $C_{i,q}$ only between neighboring pixels. For $S_{i,p}$, if every pixel pair $(i, p)$ satisfies the shrinking constraint, then every neighboring pair $(i, p)$ also satisfies it; conversely, if some pixel pair violates the shrinking constraint, then at least one neighboring pair $(i, p)$ violates it as well. The same argument holds for $C_{i,q}$. We use a standard 8-neighborhood system (see Fig. 4) for generality. For every pixel $i$ that is not on $l$, it is easy to find a neighboring pixel $q$ whose center lies exactly on $h$. For a pixel $i$, however, $l'$ may not pass exactly through the center of any neighboring pixel of $i$. In this case, the neighboring pixel $p$ that lies closest to $l'$ is chosen (see Fig. 4).

Fig. 4. Illustration of choosing a pixel $p$ neighboring $i$ from a standard 8-neighborhood system. (a) A standard 8-neighborhood system, where 8 neighbors are connected to each pixel $i$. (b) A neighboring pixel of $i$ (i.e., $p_4$) is chosen as it lies closest to $l'$ among all possible neighbors (i.e., $p_1$ to $p_5$, which are not above $i$).
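
To make the road-shaped criterion of Sec. 3 concrete, the sketch below tests a binary mask against both constraints. It is an illustration under simplifying assumptions, not the authors' implementation: the road axis is given as one integer column per image row (`axis_cols`, a hypothetical input), the translated axis $l'$ is followed by shifting every row so that its axis pixel lands in a common column, and only adjacent pixel pairs are checked, in line with the neighboring-pair argument above. Pairs whose second pixel falls outside the image are ignored.

```python
import numpy as np


def align_to_axis(mask, axis_cols):
    """Shift every row so that its axis pixel lands in column w - 1.

    Returns the shifted mask and a validity map marking which cells
    of the shifted array correspond to real image pixels.
    """
    h, w = mask.shape
    aligned = np.zeros((h, 2 * w - 1), dtype=bool)
    valid = np.zeros_like(aligned)
    for r in range(h):
        start = (w - 1) - axis_cols[r]  # where image column 0 lands
        aligned[r, start:start + w] = mask[r]
        valid[r, start:start + w] = True
    return aligned, valid


def satisfies_shrinking(mask, axis_cols):
    """i in R implies that the pixel just below i along l' is in R."""
    aligned, valid = align_to_axis(np.asarray(mask, dtype=bool), axis_cols)
    violated = aligned[:-1] & valid[1:] & ~aligned[1:]
    return not violated.any()


def satisfies_consistency(mask, axis_cols):
    """i in R and off the axis implies its neighbor toward the axis is in R."""
    mask = np.asarray(mask, dtype=bool)
    aligned, _ = align_to_axis(mask, axis_cols)
    mid = mask.shape[1] - 1  # column of the axis after alignment
    right = aligned[:, mid + 1:] & ~aligned[:, mid:-1]
    left = aligned[:, :mid] & ~aligned[:, 1:mid + 1]
    return not (right.any() or left.any())


def is_road_shaped(mask, axis_cols):
    """Road-shaped if and only if both constraints hold."""
    return (satisfies_shrinking(mask, axis_cols)
            and satisfies_consistency(mask, axis_cols))
```

For a trapezoid that widens toward the bottom of the image around a vertical axis, both checks pass; flipping it upside down breaks the shrinking constraint, and punching a hole between the axis and a road pixel breaks the consistency constraint.
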

With the constraint terms $S_{i,p}$ and $C_{i,q}$, $E(y)$ becomes:

$$E(y) = \sum_{i \in P} D_i(y_i) + \lambda \sum_{(i,j) \in N} V_{i,j}(y_i, y_j) + \sum_{(i,p) \in N} S_{i,p}(y_i, y_p) + \sum_{(i,q) \in N} C_{i,q}(y_i, y_q) \qquad (5)$$

As the road region is less changeable than the background and the road feature is more self-similar, we build only the road model $\Pr(I_i \mid R)$ and redefine the data term $D_i$ in Eq. (5) as:

$$D_i(y_i) = \begin{cases} 1 - y_i, & \text{if } \Pr(I_i \mid R) \ge \gamma \\ y_i, & \text{otherwise} \end{cases} \qquad (6)$$

where the threshold $\gamma = \gamma_0 \cdot \max_i \Pr(I_i \mid R)$, and $\gamma_0 \in [0, 1]$ is a fixed value. $D_i$ penalizes the assignments that do not fit the data. In Eq. (5), $V_{i,j}(y_i, y_j)$ takes the same form as in Eq. (3), and we fix the weighting parameter $\lambda$ to [...]. According to [10], $E$ is submodular since every term in Eq. (5) is submodular. As $E$ is also a quadratic pseudo-Boolean function, it can be exactly minimized via graph cuts.

Note that both the shrinking and the consistency constraints depend on the road axis, which is determined by the road region, and the road labels should also be generated from the road region. However, the road region is unknown before detection. Thus, as detailed in Sec. 4, we first initialize a predicted road region and then iteratively use graph cuts to update the predicted region until it converges.

Fig. 5. Illustration of our iterative graph cuts for road detection.

4. ROAD DETECTION BY ITERATIVE GRAPH CUTS

Our iterative graph-cut algorithm is illustrated in Fig. 5. For an arbitrary test road image, we compute the illuminant invariant, ranging from 0 to 255, as its feature, as described in [6]. As the real road region is unknown, we first initialize a road region $R_p$ (see (d), a semicircle at the central-bottom of the image). To acquire a reliable labeling region $L$ (see (c)), we leave a margin between it and the background region $B_p$; the width of this margin depends on $S_{R_p}$, where $S_R$ denotes the area of $R$ (the expression is garbled in this copy: "2 ( SRp SRp /2)"). Then, a road model (f) is built with the training data uniformly labeled on $L$, and the normalized likelihood image (see (e)) can be computed from the feature using (f). Next, we obtain the data terms (see (i)) according to Eq. (6).
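
The road model and the thresholded data term can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the road model is a normalized 256-bin histogram of the feature values inside the labeling region (the text does not specify the form of the model) and that Eq. (6) reads: label 0 costs 1 where the likelihood reaches the threshold, and label 1 costs 1 elsewhere. The function names and the default `gamma0=0.1` are hypothetical.

```python
import numpy as np


def road_likelihood(feature, label_region, bins=256):
    """Per-pixel road likelihood from a histogram road model.

    feature      : (H, W) integer feature image with values in [0, bins)
    label_region : (H, W) boolean mask of the labeling region L
    The histogram model is an assumption made for this sketch.
    """
    samples = feature[label_region].ravel()
    hist = np.bincount(samples, minlength=bins).astype(float)
    hist /= hist.sum()
    return hist[feature]  # look up Pr(I_i | R) for every pixel


def data_terms(likelihood, gamma0=0.1):
    """Return (d0, d1): the costs of labeling each pixel 0 or 1.

    gamma = gamma0 * max likelihood. Where the likelihood reaches gamma,
    D_i(y_i) = 1 - y_i (background is penalized); elsewhere D_i(y_i) = y_i
    (road is penalized).
    """
    gamma = gamma0 * likelihood.max()
    road_like = likelihood >= gamma
    d0 = np.where(road_like, 1.0, 0.0)
    d1 = np.where(road_like, 0.0, 1.0)
    return d0, d1
```

Raising `gamma0` shrinks the set of pixels treated as road-like, which is how a sweep over $\gamma_0$ trades true positives against false positives.
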
The smaller the energy, the more likely the road class. The smoothness terms (see (h)) can be computed directly from the feature using Eq. (3). On the other hand, we get the road axis from (d) and extend it to the image border (see (g)). Then the road shape prior, i.e., the shrinking constraint (j) and the consistency constraint (k), can be added. Finally, we minimize $E(y)$ in Eq. (5) via graph cuts. If the segmented road region $R_d$ (see (n)) converges, i.e., $R_d$ satisfies $\varepsilon(R_d) < \varepsilon_0$, where $\varepsilon(\cdot)$ is the convergence error and $\varepsilon_0$ is a constant, a final detected result (m) is obtained. Otherwise, $R_d$ becomes $R_p$ of the next cycle. Here we set $\varepsilon_0$ to $10^{-3}$ and define $\varepsilon(R_d) = (S_{R_d \cup R_p} - S_{R_d \cap R_p}) / S_{R_p \cup B_p}$.

An example of the iterative graph cuts is shown in Fig. 6. Through iteration, a better batch of road labels is progressively generated, so a better road model can be built. Moreover, the iteration favors our road shape prior, which benefits robust road detection.

Fig. 6. An example of the iterative graph cuts when $\gamma_0$ (see Eq. (6)) is set to 0.1. (a) A test image with its illuminant invariant and smoothness energy, both of which are computed in one shot, and the ground truth. (b) An illustration of the iteration. A better road model (column 2) is gradually built using the labeled data taken from $L$ (column 1). As the predicted region approaches the real road region, the road axis also becomes more accurate (column 4), and a better road shape prior can be enforced on the likelihood image (column 3) to get a better result (columns 5, 6). The detected road region $R_d$ converges in the 4th iteration, where $\varepsilon(R_d) = [...] < 10^{-3}$.

5. EXPERIMENTAL RESULTS AND DISCUSSION

To validate our method, three different experiments are conducted on the SUN Database [11], where 500 of the 868 still on-board road images are randomly selected as the test set and manually labeled as ground truth (no training set is needed in our algorithm). All the images are resized to [...] for testing. The parameter $\gamma_0$ of Eq.
(6) is fixed to 0.1 by default. We use the gco-v3.0 software [12, 13, 10] to implement the graph-cut optimization.

Evaluating the performance of iteration. First, we fix the number of iterations to 4 for efficiency, as 95.9% of the segmented results converge after 4 iterations (see Table 2). Then, the iteration performance is evaluated by four pixel-wise measures, namely the recall $RC = TP/(TP + FN)$, the precision $PC = TP/(TP + FP)$, the F-measure $F = 2/(1/RC + 1/PC)$, and the quality $Q = TP/(TP + FP + FN)$, together with the running time on a 2 GHz computer with 4.0 GB RAM. As shown in Table 3, each of the four measures reaches its highest value after 4 iterations,
which take only 98 ms of running time and thus can achieve real-time detection. Some qualitative results can be seen in Fig. 7.

Table 2. The ratio of converged segmented results (η) as the number of iterations increases. η reaches 95.9% after the 4th iteration.

Iter.    0     1     2     3     4     5     6
η (%)    0.0   0.3   56.2  79.4  95.9  97.9  99.0

Table 3. Quantitative results of the iteration performance. Columns: Iter., RC (%), PC (%), F (%), Q (%), Time (ms). [The entries are missing from this copy except for the end of the last row: 78.8, 98 ms.]

Comparing with the basic graph-cut segmentation. To show that our road shape prior can help improve the performance of the basic graph-cut segmentation, we compare our method with BGCv1 (basic graph cuts with only data terms) and BGCv2 (basic graph cuts with both data and smoothness terms). Quantitative results are provided using ROC curves, which represent the trade-off between the true positive rate $TP/(TP + FN)$ and the false positive rate $FP/(FP + TN)$. The data terms for the basic graph cuts are obtained by Eq. (6) using the training data taken from a fixed region (the same as our initialized labeling region). By changing $\gamma_0$ from 0 to 1, we obtain the ROC curves and the corresponding AUC (area under the curve) (see Fig. 8). Our method outperforms BGCv1 and BGCv2 with the highest AUC (95.4%). Qualitative results in Fig. 9 show that the detected road regions are enforced to be road-shaped with our method.

Comparing with state-of-the-art algorithms. We compare our method with two state-of-the-art single-image-based road detection algorithms: the vanishing point (VP) based method [5] and the likelihood-based region growing (LRG) method [6]. Quantitative results (see Table 4) show that our method outperforms both VP and LRG on the measures RC, PC, F, and Q. Qualitative results in Fig. 9 show that VP fails when there is no legible vanishing point (rows 5, 6) or when the road boundaries are curved (row 2), and that LRG fails due to overgrowth (rows 1, 3, 5) or undergrowth (rows 2, 4).
Yet, our method performs the best on the first five instances, benefiting from the global optimization under a general road shape constraint.

Discussion. While our method enforces such a shape prior on the road segmentation task, the prior can also degrade performance; e.g., a fork road may not satisfy our road shape prior (see, e.g., row 6 of Fig. 9). Besides, our method fails when the illuminant-invariant feature lacks discriminability (e.g., in the last instance of Fig. 9, the illuminant-invariant features of the road and of its surroundings are similar). Thus, engineering better features and exploring more useful road priors (e.g., a more general shape prior that can handle more complex road shapes such as forks or crossings) are interesting directions for our future work.

Table 4. Quantitative results of our method compared with the state-of-the-art single-image road detection algorithms VP [5] and LRG [6]. Columns: RC (%), PC (%), F (%), Q (%). [Most entries are missing from this copy; the surviving fragments of each row are reproduced as extracted.]

VP     75.9 ± ± ± ± 22.9
LRG    89. ± ± ± ± 6.6
Ours   9.2 ± 8.6 ± 9.3 ± ± 6.9

Fig. 7. Qualitative results of the iteration performance.

Fig. 8. ROC curves and the corresponding AUC of our method compared with the basic graph-cut methods BGCv1 and BGCv2.

6. CONCLUSIONS

In this paper, we proposed a general road shape prior that enforces the detected region to be road-shaped by encoding the prior into a graph-cut framework, where the training data is automatically generated from a predicted road region of the current image. An accurate road region is obtained by iteratively performing the graph cuts. Experimental results validate the robustness and efficiency of our method.

Acknowledgment. The presented research work is supported by the National Natural Science Foundation of China (Grant No. [...]).

Fig. 9. Comparisons of our method with the basic graph-cut methods (BGCv1, BGCv2) and two state-of-the-art methods (VP, LRG). Each row shows a different instance. White and black regions in columns 2 to 7 denote roads and backgrounds, respectively.
Column 1: test images. Columns 2 to 6: detection results obtained by VP [5], BGCv1, BGCv2, LRG [6], and our method, respectively. Column 7: manually labeled ground truth. Note that the same feature (i.e., the illuminant invariant) is used in the last four methods.
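
The pixel-wise measures of Sec. 5 and the convergence error of Sec. 4 both reduce to counting pixels in overlaps of binary masks. The sketch below is illustrative only and is not the authors' code. It assumes that both masks contain at least one road pixel, and that the denominator of the convergence error, $S_{R_p \cup B_p}$, equals the total number of image pixels; the function names are hypothetical. The F-measure is computed as $2TP/(2TP + FP + FN)$, which is algebraically the same as $2/(1/RC + 1/PC)$.

```python
import numpy as np


def pixel_measures(pred, truth):
    """Recall, precision, F-measure, and quality of a binary road mask.

    Assumes pred and truth each contain at least one road pixel,
    so that no denominator is zero.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # road pixels detected as road
    fp = np.sum(pred & ~truth)   # background detected as road
    fn = np.sum(~pred & truth)   # road pixels that were missed
    return {
        "RC": tp / (tp + fn),
        "PC": tp / (tp + fp),
        "F": 2 * tp / (2 * tp + fp + fn),
        "Q": tp / (tp + fp + fn),
    }


def convergence_error(detected, predicted):
    """Area of the symmetric difference divided by the image area.

    Assumption: R_p and B_p together cover the whole image, so the
    denominator is the total pixel count.
    """
    detected = np.asarray(detected, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    union = np.sum(detected | predicted)
    inter = np.sum(detected & predicted)
    return (union - inter) / detected.size
```

In an iterative loop, `convergence_error(R_d, R_p)` would be compared against the threshold $\varepsilon_0$ after each graph cut to decide whether to stop.
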

7. REFERENCES

[1] J.M. Alvarez, T. Gevers, and A.M. López, "Vision-based road detection using road models," in Image Processing (ICIP), IEEE International Conference on. IEEE, 2009.
[2] Q. Huang, M. Han, B. Wu, and S. Ioffe, "A hierarchical conditional random field model for labeling and segmenting images of street scenes," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
[3] P. Sturgess, L. Ladicky, N. Crook, and P.H.S. Torr, "Scalable cascade inference for semantic image segmentation," in BMVC, 2012.
[4] G. Floros and B. Leibe, "Joint 2D-3D temporally consistent semantic segmentation of street scenes," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
[5] H. Kong, J.Y. Audibert, and J. Ponce, "Vanishing point detection for road detection," in Computer Vision and Pattern Recognition (CVPR), 2009 IEEE Conference on. IEEE, 2009.
[6] J.M. Alvarez and A.M. Lopez, "Road detection based on illuminant invariance," Intelligent Transportation Systems, IEEE Transactions on, vol. 12, no. 1, 2011.
[7] J.M. Alvarez, T. Gevers, Y. LeCun, and A.M. Lopez, "Road scene segmentation from a single image," in Proceedings of the 12th European Conference on Computer Vision. Springer, 2012.
[8] Y. Boykov and M.P. Jolly, "Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images," in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on. IEEE, 2001, vol. 1.
[9] O. Veksler, "Star shape prior for graph-cut image segmentation," in Proceedings of the 10th European Conference on Computer Vision: Part III. Springer, 2008.
[10] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 2, 2004.
[11] J. Xiao, J. Hays, K.A. Ehinger, A. Oliva, and A. Torralba, "SUN database: Large-scale scene recognition from abbey to zoo," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
[12] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 23, no. 11, 2001.
[13] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 9, 2004.
