
1498 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 28, NO. 3, MARCH 2019

DeepCrack: Learning Hierarchical Convolutional Features for Crack Detection

Qin Zou, Member, IEEE, Zheng Zhang, Qingquan Li, Xianbao Qi, Qian Wang, and Song Wang, Senior Member, IEEE

Abstract: Cracks are typical line structures that are of interest in many computer-vision applications. In practice, many cracks, e.g., pavement cracks, show poor continuity and low contrast, which bring great challenges to image-based crack detection using low-level features. In this paper, we propose DeepCrack, an end-to-end trainable deep convolutional neural network for automatic crack detection that learns high-level features for crack representation. In this method, multi-scale deep convolutional features learned at hierarchical convolutional stages are fused together to capture the line structures. More detailed representations are made in larger-scale feature maps and more holistic representations are made in smaller-scale feature maps. We build the DeepCrack net on the encoder-decoder architecture of SegNet and pairwise fuse the convolutional features generated in the encoder network and in the decoder network at the same scale. We train the DeepCrack net on one crack dataset and evaluate it on three others. The experimental results demonstrate that DeepCrack achieves an F-measure over 0.87 on the three challenging datasets on average and outperforms the current state-of-the-art methods.

Index Terms: Line detection, edge detection, contour grouping, crack detection, convolutional neural network.

Manuscript received March 1, 2018; revised September 15, 2018; accepted October 25, 2018. Date of publication October 31, 2018; date of current version November 21, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant 61872277, Grant 61301277, and Grant 91546106, in part by the National Key Research and Development Program of China under Grant 2016YFB0502203, and in part by the Hubei Provincial Natural Science Foundation under Grant 2018CFB482. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yonggang Shi. (Corresponding author: Qingquan Li.)

Q. Zou, Z. Zhang, and Q. Wang are with the School of Computer Science, Wuhan University, Wuhan 430072, China (e-mail: qzou@whu.edu.cn; zhangzheng@whu.edu.cn; qianwang@whu.edu.cn).

Q. Li is with the Shenzhen Key Laboratory of Spatial Smart Sensing and Service, Shenzhen University, Shenzhen 518060, China (e-mail: liqq@szu.edu.cn).

X. Qi is with the Shenzhen Research Institute of Big Data, Shenzhen 518172, China (e-mail: qixianbao@gmail.com).

S. Wang is with the Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29200 USA (e-mail: songwang@cec.sc.edu).

Digital Object Identifier 10.1109/TIP.2018.2878966

I. INTRODUCTION

CRACKS are common defects that can be found on the surfaces of various types of physical structures, e.g., the road pavement [1], [2], the wall of nuclear power plants [3], the ceiling of tunnels [4], etc. Repairing cracks is important for preventing damage from spreading and for keeping engineering infrastructure safe. For example, a crack on the highway pavement can easily become a hole in just one rainy night, which is then hazardous for high-speed vehicles. A country like China or the US has over 100,000 km of highway to be tested and maintained periodically. Automatic testing methods are greatly desired to improve testing efficiency and reduce cost. Cracks are among the most common defects, and fixing a crack before it deteriorates can greatly reduce the cost of maintenance. To date, fully automatic crack detection against a noisy background remains a challenge.

As a crack is visually a linear/curvilinear structure, crack detection can be formulated as line detection, which is a fundamental problem in computer vision [5]–[7]. In visual perception, a crack can be characterized from two perspectives. From a global perspective, it looks like a one-pixel-wide edge in the image, as it is thin and often shows an intensity jump relative to the background.
From a local perspective, it is a line object that has a certain width. Accordingly, crack detection methods can be roughly divided into two categories: edge-detection based ones and image-segmentation based ones. In the ideal case, if a crack has good continuity and high contrast, then traditional edge detection and image segmentation methods can detect it with high accuracy. In practice, however, cracks constantly suffer from noise in the background, leading to poor continuity and low contrast. For example, in the pavement image shown in Fig. 1(a), impulse noise introduced by the grain-like pavement texture breaks the crack and undermines its continuity, while the shadow reduces the contrast between the crack and the background. In addition, the direction of exposure may also affect the imaging quality of the crack. These complications commonly degrade the performance of traditional crack detection methods based on low-level features.

1057-7149 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

In recent years, deep convolutional neural networks (DCNNs) have demonstrated state-of-the-art, human-competitive, and sometimes better-than-human performance in solving many computer vision problems, e.g., image classification [8], object detection [9], image segmentation [10], [11], etc. For line detection, DCNN-based methods have also been proposed for tasks such as edge detection [12], [13], contour detection [14], [15], boundary segmentation [16], [17], and so on. These deep architectures build high-level features from low-level primitives by hierarchically convolving the sensory inputs. In particular, when using deep learning for edge detection, it has been observed that the convolutional features become coarser and coarser in the convolving-pooling pipeline, and

the detailed features in larger-scale layers and the abstracted features in the smaller-scale layers can be fused together to improve the performance of edge detection [13], [18], [19]. When using deep learning for image segmentation, for example SegNet [20], the convolutional features in the decoder network have been found to be useful for improving the performance of semantic image segmentation, and the indexing of pooling positions can further improve the accuracy of boundary localization.

Fig. 1. A real example of crack detection using DeepCrack. The bottom row shows the feature maps generated by convolutional feature fusion at different scales in the DeepCrack net (for the image patch denoted by the rectangle in the input image).

Inspired by these observations, we propose to fuse the convolutional features in both the encoder and decoder networks, and construct a new DeepCrack network for crack detection. We build DeepCrack on the encoder-decoder architecture proposed in SegNet [20]. In SegNet, each convolution stage in the encoder network corresponds to a convolution stage in the decoder network at the same scale. In DeepCrack, we first pairwise fuse the convolutional features of the encoder network and decoder network at each scale, which produces a single-scale fused feature map, and then combine the fused feature maps at all scales into a multi-scale fusion map for crack detection. An example is shown in Fig. 1, where the bottom row shows the fused feature maps at different scales. The sparse features in smaller scales and the continuous features in larger scales are fused to obtain better crack-detection performance.

The contributions of this work are three-fold:

Our main contribution is the design of a new neural network architecture for crack detection. This new network makes full use of the information in the encoder and decoder networks, and builds a trainable end-to-end network for crack detection.
In the proposed network, a convolutional layer of the encoder network and a convolutional layer of the decoder network at the same scale are fused to compute the training loss at the corresponding scale. The fusion of hierarchical convolutional features is found to be very effective for inferring the cracks from the image background.

Four datasets are constructed for performance evaluation, where one dataset containing 260 pavement images is used for training the network, and the three others are used for testing. Of the three test datasets, two are pavement image datasets and one is a stone surface image dataset. The ground-truth cracks are manually labeled by human experts, and the datasets are shared with the community to promote research on crack detection.

Extensive experiments are conducted and the results demonstrate the effectiveness of the proposed method.

The rest of this paper is organized as follows. Section II briefly reviews the related work. Section III describes the deep neural network architecture for crack detection. Section IV demonstrates the effectiveness of the proposed method by experiments. Finally, Section V concludes the paper.

II. RELATED WORK

A. Line Detection

Line detection is a fundamental problem in computer vision. In a broad sense, line detection includes edge/contour detection and line object detection. When edges and contours can be built and perceived on the gradient, detecting them can be treated as line object detection or line grouping in the gradient map. In the past several decades, research in edge and contour detection has experienced three main stages. The first stage is featured by computing the first-order or second-order gradients on the pixel intensity, where a representative of this stage is the Canny edge detector [21].

In the second stage, edge detection and contour grouping are featured by energy minimization methods and middle-level feature learning algorithms. The global Pb [22] is a representative learning method for edge detection, and sketch token [23] and the structure edge detector [24] have promoted the learning ability to a peak in this stage. For contour detection, the ratio contour [25], level set [26], [27] and untangling cycles [28] are among the representatives; they model the line clutters with a graph and minimize an energy function to infer the contour. In the third stage, the detection of edges and contours is featured by deep learning, e.g., the deep learning methods for edge detection [12], [13], [29], contour detection [14], [15], and boundary segmentation [11], [16], [17]. In [29], line sections are predicted from image patches under a deep learning framework, and a multi-scale version was constructed for edge detection. In [30], DCNN feature abstraction and neighbor search are combined to handle edge detection and line object extraction. In [12], the edge is detected by a deep convolutional network in an end-to-end manner. The convolutional features in multiple convolutional stages are found to be useful for improving the edge detection results. Similarly, in [13], richer convolutional features generated by a fully convolutional network are fused to further improve the performance.

A number of line object detection methods have also been developed for different application purposes. In [31], a path-voting based method was proposed for wire-line detection from vessel X-ray images. The minimal paths were calculated on image patches and aggregated to construct a line probability map. In [32], road network extraction from satellite images was studied by regression learning and optimization. In [33], an edge detector was built on CNNs and used to provide information for semantic image segmentation.
The convolutional features in different scales were also investigated for some other applications, e.g., video segmentation [19] and symmetry detection [34].

B. Crack Detection

Under normal illuminance, a crack is generally darker than the background. Therefore, image thresholding is a straightforward way to detect cracks. For example, in [35], the threshold value was determined by examining the difference between the cracks and their neighboring non-crack pixels. In [36], the threshold value was calculated in a heuristic way. However, pavement shadows and uneven illumination undermine the robustness of thresholding-based methods.

As a crack is thin and displays as an edge, many methods stemming from edge detection and wavelet transformation have been developed for crack detection [37]–[40]. However, the edge information is easily tangled by heavy noise.

As a branch of energy minimization methods, minimal path searching has also been studied for crack detection. In [35] and [41], seed-growing methods built on minimal path searching were proposed for pavement crack detection. In [42], minimal path searching was performed in a path-voting way. In [43], minimal path searching was used to track cracks in complex backgrounds. The main limitation of these minimal-path-based methods is that the seed points for path tracking must be set in advance.

Machine learning based methods have also been investigated for crack detection. In [2], a deep convolutional neural network was used to classify image patches into crack blocks and non-crack ones. In [4], the detection of bridge cracks was studied by using a modified active contour model and a greedy search-based support vector machine. In [3], fully convolutional neural networks were studied to infer cracks in nuclear power plants using multi-view images. Many other methods were also proposed for crack detection, e.g., the saliency detection method [44], and the structure analysis methods using the minimal spanning tree [45] and the random structure forest [46].
Generally, deep learning based methods produce better results than traditional methods. However, there is still a lack of investigation into end-to-end trainable CNN models for robust crack detection.

III. DEEPCRACK NETWORK

In this section, we first introduce the architecture of DeepCrack, then the design of the loss function, and finally the differences between DeepCrack and other deep convolutional networks.

A. Network Architecture

The DeepCrack network is built on the SegNet network [20]. SegNet is a deep convolutional encoder-decoder architecture designed for pixel-wise semantic segmentation, which contains an encoder network and a corresponding decoder network. The encoder network is inspired by the convolutional layers in the VGG16 network [47], and consists of 13 convolutional layers and 5 down-sampling pooling layers. The decoder network also has 13 convolutional layers, and each decoder layer has a corresponding layer in the encoder network. Thus, the encoder network is almost symmetric to the decoder network. The only difference is that the first encoder layer, i.e., the first convolution operation, produces a multi-channel feature map, whereas the corresponding last decoder layer, i.e., the last convolution operation, produces a c-channel feature map, with c the number of classes in the image segmentation task. After each convolution operation, a batch-normalization step is applied to the feature maps.

The max-pooling operation with a stride larger than 1 reduces the scale of the feature maps while achieving translation invariance over small spatial shifts, but the sub-sampling causes a loss of spatial resolution, which may bias the boundaries. To avoid losing the detailed representation, max-pooling indices are used to capture and record the boundary information in the encoder feature maps when sub-sampling is performed. Then, in the decoder network, the corresponding decoder layer uses the max-pooling indices to perform non-linear up-sampling. This up-sampling step produces sparse feature maps.
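To make the role of the max-pooling indices concrete, the following is a minimal pure-Python sketch rather than the SegNet or Caffe implementation. It assumes a single-channel 2-D map with even height and width, a 2×2 window, and stride 2; the names `max_pool_with_indices` and `max_unpool` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]
IndexGrid = List[List[Tuple[int, int]]]


def max_pool_with_indices(x: Grid) -> Tuple[Grid, IndexGrid]:
    """2x2, stride-2 max pooling that also records where each maximum came from."""
    pooled: Grid = []
    indices: IndexGrid = []
    for r in range(0, len(x) - 1, 2):
        row_vals, row_idx = [], []
        for c in range(0, len(x[0]) - 1, 2):
            # Collect (value, position) pairs for the four pixels in the window.
            window = [(x[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            row_vals.append(value)
            row_idx.append(position)
        pooled.append(row_vals)
        indices.append(row_idx)
    return pooled, indices


def max_unpool(pooled: Grid, indices: IndexGrid, height: int, width: int) -> Grid:
    """Place each pooled value back at its recorded position; all else stays zero."""
    out = [[0.0] * width for _ in range(height)]
    for vals, idxs in zip(pooled, indices):
        for value, (r, c) in zip(vals, idxs):
            out[r][c] = value
    return out
```

Only one position per window is non-zero after `max_unpool`, which is why the up-sampled map is sparse but keeps each maximum at its original location.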
However, compared with continuous and dense feature maps, the sparse feature maps locate region boundaries more precisely. Meanwhile, due to the nature of hierarchical learning of deep convolutional neural networks, multi-scale convolutional

features can be learnt in the form of increasingly larger receptive fields in the down-sampled layers. The fusion of multi-scale convolutional features has proved useful for improving the performance of line detectors [13], [16], [18].

Fig. 2. An illustration of the DeepCrack network. The feature maps of the encoder network and decoder network are pairwise connected and fused at each convolution stage, which produces fused maps of different scales. At each scale, the pixel-wise prediction loss is calculated by a skip-layer fusion procedure, independently. Meanwhile, the fused maps at all scales are concatenated and fused to produce a multi-scale fusion map, which is the output of the DeepCrack network. This output is a crack probability map for crack detection.

In this work, we consider the scale changes caused by both the pooling operation and the up-sampling operation, and build DeepCrack on SegNet's encoder-decoder architecture. In SegNet, there exist five different scales, which correspond to the 5 down-sampling pooling layers. In order to utilize both sparse and continuous feature maps at each scale, DeepCrack conducts a skip-layer fusion to connect the encoder network and decoder network. As illustrated in Fig. 2, the convolutional layer before the pooling layer at each scale in the encoder network is concatenated to the last convolutional layer at the corresponding scale in the decoder network.

The skip-layer fusion handles the concatenated convolutional features with a sequence of operations. Figure 3 illustrates the skip-layer fusion in detail. First, the feature maps from the encoder network and decoder network are concatenated, followed by a 1×1 conv layer which reduces the multi-channel feature maps to 1 channel. Then, in order to calculate the pixel-wise prediction loss at each scale, a deconv layer is added to up-sample the feature map and a crop layer is used to crop the up-sampling result to the size of the input image.
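The sequence of operations in the skip-layer fusion (concatenate, 1×1 conv, up-sample, crop) can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the Caffe layers used in the paper: feature maps are lists of 2-D channels, the 1×1 conv weights are passed in by hand, and nearest-neighbor replication stands in for the learned deconv layer. All function names are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def conv1x1(channels: List[Map2D], weights: List[float], bias: float = 0.0) -> Map2D:
    """Per-pixel weighted sum over channels: a 1x1 convolution with one output channel."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[bias + sum(wt * ch[r][c] for wt, ch in zip(weights, channels))
             for c in range(w)] for r in range(h)]


def upsample_nearest(fmap: Map2D, factor: int) -> Map2D:
    """Assumed stand-in for the deconv layer: replicate each value factor x factor times."""
    return [[fmap[r // factor][c // factor]
             for c in range(len(fmap[0]) * factor)]
            for r in range(len(fmap) * factor)]


def crop(fmap: Map2D, height: int, width: int) -> Map2D:
    """Keep the top-left height x width region so the map matches the input size."""
    return [row[:width] for row in fmap[:height]]


def skip_layer_fusion(encoder: List[Map2D], decoder: List[Map2D],
                      weights: List[float], factor: int,
                      out_h: int, out_w: int) -> Map2D:
    concatenated = encoder + decoder        # channel-wise concatenation
    fused = conv1x1(concatenated, weights)  # reduce to a single channel
    return crop(upsample_nearest(fused, factor), out_h, out_w)
```

For example, fusing one 2×2 encoder channel with one 2×2 decoder channel at `factor=2` and cropping to 3×3 yields a single-channel 3×3 map that can be compared pixel by pixel with a label map of the same size.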
After these operations, we obtain prediction maps at each scale with the same size as the ground-truth crack maps. The prediction maps generated at the five different scales are further concatenated, and

a 1×1 conv layer is added to fuse the outputs at all scales. Finally, we obtain the prediction maps at each skip-layer fusion and at the overall fused layer.

Fig. 3. An illustration of skip-layer fusion at scale K. In each scale, the last conv layer in the encoder network and the last conv layer in the decoder network are concatenated, followed by a 1×1 conv layer with 1-channel output. Then, a deconv layer is used to up-sample the feature map. After being cropped to the size of the label map, the output is passed to a sigmoid cross-entropy layer to calculate the loss.

B. Loss Function

Given a training data set containing $N$ images as $S = \{(X_n, Y_n), n = 1, \ldots, N\}$, where $X_n = \{x_i^{(n)}, i = 1, \ldots, I\}$ denotes the raw input image, $Y_n = \{y_i^{(n)}, i = 1, \ldots, I, \; y_i^{(n)} \in \{0, 1\}\}$ denotes the ground-truth crack label map corresponding to $X_n$, and $I$ denotes the number of pixels in every image, our goal is to train the network to produce prediction maps approaching the ground truth. In the encoder-decoder architecture, let $K$ be the number of convolution stages; then at stage $k$, the feature map generated by the skip-layer fusion can be formulated as $F^{(k)} = \{f_i^{(k)}, i = 1, \ldots, I\}$, where $k = 1, \ldots, K$. Further, the multi-scale fusion map can be defined as $F^{fuse} = \{f_i^{fuse}, i = 1, \ldots, I\}$.

Different from semantic segmentation on Pascal VOC, there are only two classes in crack detection, which can be seen as a binary classification problem. We adopt a cross-entropy loss to measure the prediction error. Generally, the ground-truth crack pixels form a minority class in the crack image, which makes this an imbalanced classification or segmentation problem. Some works [12], [13] deal with this problem by assigning larger weights to the minority class. However, in crack detection, we find that assigning larger weights to the cracks results in more false positives. Thus, we define the pixel-wise prediction loss as
Thus, we defne the pxel-wse predcton loss as { log(1 P(F ; W)), f y = 0, l(f ; W) = (1) log(p(f ; W)), otherwse, where F s the output feature map of the network n pxel, W s the set of standard parameters n the network layers, and P(F) s the standard sgmod functon, whch transforms the feature map nto a crack probablty map. Then, the total loss can be formulated as I K L(W) = ( l(f (k) ; W) + l(f fuse ; W)). (2) =1 k=1 C. Comparson Wth Other Archtectures The proposed DeepCrack has two man dfferences wth the orgnal SegNet [20]. Frst, the orgnal SegNet has no connecton between the convolutonal features n the encoder network and decoder network, whch would cause sparse outputs. In DeepCrack, skp-layer fuson s appled to connect the encoder network and decoder network. Second, the orgnal SegNet s desgned for semantc segmentaton, whch sets up a softmax loss layer to measure the predcton error n each object channel. Whle n the DeepCrack network, the output s a 1-channel predcton map that ndcates the probablty of each pxel belongng to the crack by usng a cross-entropy loss. DeepCrack s also qute dfferent wth U-Net [11]. U-Net performs skp-layer fuson by copyng convoluton layers n an early stage as a part of a correspondng later stage n the man network, whch results n a sole loss. DeepCrack performs skp-layer fuson at each stage ndependently and assgns t a loss, whch leads to multple losses, and to effectve capturng nformaton of thn objects at each scale. Compared wth DeepEdge [29], DeepContour [14] and N 4 -Felds [30] whch perform convoluton on mage patches, DeepCrack performs convoluton on the whole mage and generates results n an end-to-end manner. We also compare the DeepCrack network wth two endto-end deep edge detecton archtectures,.e., HED [12] and RCF [13]. Both HED and RCF have ther man archtectures bult on VGG16, whch s smlar to the encoder network n DeepCrack. 
Besides the lack of the pool5 layer, RCF changes the stride of the pool4 layer to 1 and uses the atrous algorithm to fill the holes. In the five convolution stages, HED connects the last convolution layers in each scale to produce the fused

prediction map, while RCF first connects all convolution layers in each scale and then fuses the multi-scale feature maps. Unlike HED and RCF, which do not have a corresponding decoder network, the proposed DeepCrack pairwise fuses convolutional features in the encoder network and decoder network at the same scale. Because they lack the sparse, non-linearly up-sampled features of a decoder network, the feature maps generated by HED and RCF are often continuous and dense, which leads to inaccurate localization and prediction errors. We will illustrate this point in the experiments.

IV. EXPERIMENTS AND RESULTS

In this section, we first introduce the experimental settings, and then report the crack detection results obtained by DeepCrack and the comparison methods. Finally, we investigate the performance of DeepCrack under different settings.

A. Experimental Settings

1) Implementation Details: We implement our network using the publicly available Caffe [48], which is well known in this community. In our network, batch normalization is used after each convolutional layer in both the encoder and decoder networks, which is known to speed up convergence in the training process. The weights of the conv layers in the entire network are initialized by the msra method and the biases are initialized to 0. The up-sampling operation in the decoder network uses the pooling indices stored in the max-pooling layers, while the up-sampling in the skip-layer fusion is conducted by bilinear interpolation. In training, the initial global learning rate is set to 1e-5 and is divided by 10 after every 10k iterations. The momentum and weight decay are set to 0.9 and 0.0005, respectively. Stochastic gradient descent (SGD) is employed to update the network parameters with a mini-batch size of 2 in each iteration. We train the network for 100k iterations in total. All experiments in this paper are performed using a single GeForce GTX TITAN-X GPU.
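The training schedule described above can be written down as a short sketch. The step schedule follows the stated numbers (initial rate 1e-5, divided by 10 every 10k iterations); the update rule in `sgd_step` is one common formulation of SGD with momentum and L2 weight decay, assumed here for illustration since the text does not spell it out. Both function names are hypothetical.

```python
from typing import Tuple


def learning_rate(iteration: int, base_lr: float = 1e-5,
                  step: int = 10_000, gamma: float = 0.1) -> float:
    """Step schedule: multiply the base rate by gamma once every `step` iterations."""
    return base_lr * gamma ** (iteration // step)


def sgd_step(weight: float, grad: float, velocity: float, lr: float,
             momentum: float = 0.9, weight_decay: float = 0.0005) -> Tuple[float, float]:
    """One scalar SGD update with momentum and L2 weight decay (assumed form)."""
    velocity = momentum * velocity + lr * (grad + weight_decay * weight)
    return weight - velocity, velocity
```

Under this schedule the rate is 1e-5 for iterations 0 to 9,999, 1e-6 for the next 10k iterations, and so on through the 100k training iterations.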
2) Datasets¹: Four crack datasets are used in this study, in which the pavement crack dataset CrackTree260 is used for training the deep networks, and the other three are used for testing. The images in the test datasets share the same size of 512×512. The ground-truth cracks are annotated by four persons using a specialized labeling tool.

¹ https://sites.google.com/site/qinzoucn

CrackTree260: It contains 260 road pavement images and is an expansion of the dataset used in [45]. These pavement images are captured by an area-array camera under visible-light illumination. We use all 260 images for training. Data augmentation has been performed to enlarge the size of the training set. We rotate the images by 9 different angles (from 0 to 90 degrees at an interval of 10), flip the image in the vertical and horizontal directions at each angle, and crop 5 subimages (4 at the corners and 1 in the center) of size 512×512 from each flipped image. After augmentation, we get a training set of 35,100 images in total.

CRKWH100: It contains 100 road pavement images captured by a line-array camera under visible-light illumination. The line-array camera captures the pavement at a ground sampling distance of 1 millimeter.

CrackLS315: It contains 315 road pavement images captured under laser illumination. These images are also captured by a line-array camera, at the same ground sampling distance.

Stone331: It contains 331 images of stone surfaces. When cutting the stone, cracks may occur on the cutting surface. These images are captured by an area-array camera under visible-light illumination. We produce a mask for the area of each stone surface in the image, so that the performance evaluation can be restricted to the stone surface.

3) Evaluation Metrics: For each image, Precision and Recall can be calculated by comparing the detected cracks against the human-annotated ground truth. Then, the F-measure ($\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$) can be computed as an overall metric for performance evaluation.
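As a rough illustration of these metrics, the sketch below computes Precision, Recall and F-measure from two sets of pixel coordinates, using the 2-pixel tolerance that this section applies to true positives. Several choices are assumptions made for the example and are not taken from the paper: the distance is Euclidean, Recall is computed symmetrically (a ground-truth pixel counts as recovered if a detection lies within the tolerance), and the function names are hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def _near(p: Pixel, others: Set[Pixel], tol: int) -> bool:
    """True if some pixel in `others` lies within Euclidean distance `tol` of p."""
    r, c = p
    return any((r + dr, c + dc) in others
               for dr in range(-tol, tol + 1)
               for dc in range(-tol, tol + 1)
               if dr * dr + dc * dc <= tol * tol)


def precision_recall_f(detected: Set[Pixel], truth: Set[Pixel],
                       tol: int = 2) -> Tuple[float, float, float]:
    """Tolerance-based Precision, Recall and F-measure for one image."""
    matched_det = sum(1 for p in detected if _near(p, truth, tol))
    matched_gt = sum(1 for p in truth if _near(p, detected, tol))
    precision = matched_det / len(detected) if detected else 0.0
    recall = matched_gt / len(truth) if truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

With a horizontal ground-truth crack of 10 pixels and a detection of 5 pixels running one row below it plus one stray pixel far away, the sketch gives Precision 5/6 and Recall 6/10.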
Specifically, three different F-measure-based metrics are employed in the evaluation: the best F-measure on the data set for a fixed threshold (ODS), the aggregate F-measure on the data set for the best threshold on each image (OIS), and the average precision (AP), which is equivalent to the area under the precision-recall curve [24]. Considering that cracks have a certain width, a detected crack pixel is still taken as a true positive if it is no more than 2 pixels away from the human-annotated crack curves.

4) Comparison Methods: We compare the performance of DeepCrack with current state-of-the-art methods. Among these methods, CrackTree is a traditional low-level feature based method, and the other ones are deep learning based methods.

HED [12]: It fuses multi-scale convolutional features by using the last convolutional feature map at each stage in VGG16. We train HED on CrackTree260.

RCF [13]: It fuses multi-scale convolutional features by using all convolutional feature maps at each stage in VGG16. We train RCF on CrackTree260.

SegNet [20]: It achieves end-to-end learning and segmentation by sequentially using an encoder network and a decoder network. We train SegNet on CrackTree260.

SRN [34]: It was originally designed for end-to-end object symmetry detection and uses a feature fusion strategy similar to that of HED. We train SRN on CrackTree260.

U-Net [11]: It performs skip-layer fusion for end-to-end boundary segmentation and formulates the training target with one single loss. We train U-Net on CrackTree260.

SE [24]: It learns edges and line structures using random decision forests. We train SE on CrackTree260 using 8 decision trees and the default parameters released by [24].

CrackTree [45]: It is a method specifically designed for pavement crack detection. The edge-length threshold for graph construction is 10, and the tree-pruning threshold is 50, for all test images.

CrackForest [46]: It uses the SE architecture to generate the crack map and post-processes the crack map to obtain the final crack.

DeepCrack: DeepCrack is trained on CrackTree260.

Note that the results generated by RCF, HED, SRN and SE are thick crack maps, as shown in Fig. 4, which need to be post-processed. As in these methods, we employ the standard non-maximum suppression (NMS) [24] to thin the soft crack maps and use the post-processed results in the comparison. The results generated by DeepCrack are already thin crack maps and can be evaluated directly.

1504 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 28, NO. 3, MARCH 2019

Fig. 4. Crack maps produced by different methods. Note that DeepCrack, SegNet [20] and U-Net [11] produce thin crack maps, and RCF [13], SRN [34], HED [12] and SE [24] produce thick crack maps.

Fig. 5. Precision-Recall curves on the three test datasets. (a) CRKWH100. (b) CrackLS315. (c) Stone331.

B. Overall Performance

Figure 5 shows the precision-recall curves of nine methods on the three test datasets, six of which are deep-learning-based. A small rectangle is plotted at the position corresponding to the best F-measure of each curve. As the CrackTree and CrackForest methods produce hard crack curves, each is marked as a single point (denoted by a triangle) on the chart using its average precision and recall values.

1) CRKWH100: It can be seen from Fig. 5(a) that DeepCrack holds the curve closest to the upper-right corner of the chart and achieves the best precision and recall values, as denoted by the best F-measure rectangle. The performances of RCF, SRN and HED are very close. SegNet shows the lowest performance among these deep learning methods, which indicates that combining convolutional features at different scales is an effective way to improve crack-detection performance. Note that the deep learning based methods achieve a significant performance boost over the low-level feature based methods: CrackTree, CrackForest and SE. Table I shows the quantitative results of the comparison methods. The best result is achieved by DeepCrack, with an ODS F-measure value of 0.9095. Compared to RCF, SRN, HED and U-Net, this is an improvement of 4.74%, 4.93%, 6.92% and 6.36% on ODS, respectively.
Although SegNet performs worse than the other deep learning methods, it still achieves an ODS value of 0.8184.
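For readers who want to reproduce ODS and OIS figures like the ones quoted above, the following is a minimal sketch. It assumes that true-positive, false-positive and false-negative counts have already been collected per image and per threshold, and that counts are aggregated over images before the F-measure is computed, as is common in edge-detection benchmarks. The aggregation rule, the function names, and the toy numbers in `DEMO_COUNTS` are assumptions for illustration, not the authors' evaluation code or data.

```python
def f_from_counts(tp, fp, fn):
    """F-measure from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def _sum_counts(rows):
    """Element-wise sum of (tp, fp, fn) tuples."""
    return tuple(sum(column) for column in zip(*rows))


def ods(counts):
    """Best dataset-level F-measure over a single threshold shared by all images."""
    n_thresholds = len(counts[0])
    return max(
        f_from_counts(*_sum_counts([image[t] for image in counts]))
        for t in range(n_thresholds)
    )


def ois(counts):
    """Dataset-level F-measure when each image uses its own best threshold."""
    best_per_image = [max(image, key=lambda c: f_from_counts(*c)) for image in counts]
    return f_from_counts(*_sum_counts(best_per_image))


# Hypothetical toy data: counts[image][threshold] = (tp, fp, fn).
DEMO_COUNTS = [
    [(8, 2, 2), (9, 9, 1)],
    [(2, 0, 8), (6, 2, 4)],
]

if __name__ == "__main__":
    print("ODS:", round(ods(DEMO_COUNTS), 4))
    print("OIS:", round(ois(DEMO_COUNTS), 4))
```

Because OIS may pick a different threshold for every image, it is never lower than ODS under this aggregation rule.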

TABLE I. QUANTITATIVE EVALUATION OF DIFFERENT METHODS ON THE THREE TEST DATASETS

Among the methods, none of the three low-level feature based methods (CrackTree, CrackForest and SE) achieves an ODS value over 0.7; SE reaches 0.6888, and CrackTree obtains the lowest ODS value, 0.6269.

2) CrackLS315: Images in this dataset are captured under laser illumination, which makes them differ more from the training images than those in CRKWH100 do. The precision-recall curves are shown in Fig. 5(b). DeepCrack achieves the best performance on CrackLS315. HED, SRN, RCF and SegNet all show commendable results, with RCF performing better than HED, SRN and SegNet. It can be observed from Table I that the ODS of DeepCrack reaches 0.8449, which outperforms all compared methods. RCF ranks second with an ODS value of 0.7878. The ODS values of HED, SRN, SegNet and U-Net are 8.16%, 9.00%, 8.39% and 17.31% lower than that of DeepCrack, respectively. Compared with CrackTree, DeepCrack obtains an improvement of 20.20% in terms of ODS. The performance of SE declines surprisingly on this dataset, with an ODS value of only 0.4586.

3) Stone331: As Fig. 5(c) shows, DeepCrack outperforms the other comparison methods, with an ODS value of 0.8559. Surprisingly, the second rank is achieved by SegNet with an ODS value of 0.7938, a slight improvement of 0.52% over RCF. The other deep learning methods, such as HED, SRN, and U-Net, obtain better performance than the traditional low-level feature based methods.

We also make visual comparisons of the results. In Fig. 6, crack-detection results on six typical input images are given for the proposed method and the comparison methods. In the first two columns, the input images selected from CRKWH100 contain shadows and obvious noise. DeepCrack can still generate a crack map very close to the ground truth.
In the middle two columns, two images are selected from CrackLS315: one contains tiny cracks, and the other contains cracks embedded in the road lane, which can hardly be observed without careful inspection. It can be seen that all these methods can detect the tiny crack. However, except for DeepCrack, the methods produce many false detections. For the stone surface images in the last two columns, DeepCrack obtains crack-detection results close to the ground truth, while the comparison methods suffer from many false positives.

Among the deep models, RCF, HED and SRN produce thick crack maps, while DeepCrack, SegNet and U-Net produce thin crack maps, as illustrated by Fig. 4. One possible reason is that the backbone networks of DeepCrack, SegNet and U-Net contain an almost symmetrical decoder network corresponding to the encoder network. The decoder network explicitly up-samples the feature maps stage by stage, resulting in an output with the same size as the input, which can help recover thin crack structures, as imposed by the ground truth. The backbone networks of RCF, HED and SRN do not contain a decoder network and are thus less capable of producing thin cracks. In Fig. 6, DeepCrack, SegNet and U-Net are observed to suppress more background artifacts than HED, RCF and SRN, which indicates that the decoder network can also improve the precision of crack prediction. As DeepCrack fuses low-level and high-level features in the convolution stages of different scales, it can further improve the precision of crack extraction and the robustness of background-artifact suppression. U-Net also adopts skip layers, but it applies a single loss on the final prediction, which makes it hard to converge and prone to producing incomplete predictions.

In summary, better results can be achieved by DeepCrack, which fuses the multi-scale convolutional features in both the encoder and decoder networks. More results of the proposed method are shown in Fig. 11.

C. Constructing DeepCrack With Different Scales

In the previous part, the experimental results demonstrate that DeepCrack has notable advantages over the other compared methods in crack detection. In this part, we study the effect of fusing the multi-scale convolutional features. Specifically, we want to know how important each scale is in the multi-scale fusion architecture. Each time, we remove the connection of one of the five scales and re-train the modified model with the same parameter settings. We repeat this modification five times and obtain five incomplete multi-scale DeepCrack models. Finally, we test these models on the three test datasets above. It can be seen from Fig. 7 that removing the skip-layer connection of any scale results in decreased performance. This indicates that each scale contributes to improving the final results. Meanwhile, the connection at scale one is observed to contribute significantly to the final result, because scale one has the same resolution as the input image and holds most of the crack details. For further exploration, we set different weights on different scales to test how they influence the performance of DeepCrack.

Fig. 6. Comparison of results obtained by different methods on six sample images (from left to right) selected from CRKWH100, CrackLS315 and Stone331, respectively (with two images from each dataset). Note that the results of HED, SRN, RCF and SE have been post-processed by NMS. The ground-truth cracks have been highlighted in blue.

Fig. 7. Comparison of DeepCrack with its modified versions obtained by removing the information from one convolution scale.

TABLE II. PERFORMANCE OF DEEPCRACK BY SETTING DIFFERENT WEIGHTS TO THE LOSS AT DIFFERENT SCALES

Fig. 8. DeepCrack results with and without pre-trained model. Note that (ft) denotes the version with fine-tuning.

For a clear presentation, we rewrite the loss function Eq. (2) as:

$$L(W) = \sum_{i=1}^{I}\left(\sum_{k=1}^{K} \alpha^{(k)}\, l\big(F_i^{(k)}; W\big) + l\big(F_i^{fuse}; W\big)\right), \tag{3}$$

where $\alpha^{(k)}$ denotes the weight placed on scale $k$ ($1 \le k \le 5$). Since it is very difficult to find an optimal parameter setting, we choose several representative settings to explore the influence of different weights on the scales. The settings of $\alpha^{(k)}$ and the corresponding results are listed in Table II. Four equal-ratio series are used, with ratios of 1/3, 1/2, 1, and 2, respectively. In the first two cases, larger weights are set to smaller scales, while in the last case, larger weights are set to larger scales. In Table II we can see that the ratio of 1/3 produces lower results than the ratio of 1/2, and the ratio of 1/2 produces lower results than the ratio of 1. This indicates that the information fused at larger scales does contribute to the final results. When larger weights are given to larger scales, a ratio of 2 brings no performance improvement and instead leads to slightly lower performance than the standard version, i.e., a ratio of 1. This indicates that scale one has a dominant influence on predicting the cracks, and that setting larger weights on other scales does not improve the performance.

D. Training DeepCrack With and Without Pre-Trained Model

In this part, we examine experimentally whether it is better to train DeepCrack from a pre-trained model or from scratch. In fact, crack images differ greatly from natural images, especially the natural images of ImageNet and PASCAL.
Natural images are often colorful and contain visually recognizable objects, while crack images are always grayscale and often contain heavy impulse noise. We compare the results of DeepCrack trained from scratch with those of DeepCrack fine-tuned from a SegNet model pre-trained on PASCAL VOC2012. The results are plotted in Fig. 8. They show that the model trained from scratch obtains better performance than the one trained from the pre-trained model on all three test datasets. A possible reason is that the pre-trained model is well fitted to natural-image segmentation and is therefore very difficult to fine-tune for crack detection.

TABLE III. QUANTITATIVE EVALUATION OF DEEPCRACK BY USING DIFFERENT SETTINGS

Fig. 9. Different loss weights on the crack and the background.

Fig. 10. Detection of bright cracks using DeepCrack.

Fig. 11. More results of DeepCrack on the three test datasets. Full resolution results can be accessed at our website.

Fig. 12. Sample edge-detection results on BSDS500.

E. Influence of Incorrect Labels and Upsampling Strategies

We also conduct several experiments to explore the sensitivity of DeepCrack to noisy ground-truth crack labeling. First, for each image, we randomly remove 20% of the ground-truth crack pixels, or add 20% of noisy ground-truth crack pixels, and name the retrained models DeepCrack-reduce (20%) and DeepCrack-noise (20%), respectively. Second, for each image, we randomly shift the crack labeling left, right, up or down by 4 and 6 pixels, and name the retrained models DeepCrack-bias (4 pixels) and DeepCrack-bias (6 pixels), respectively. The test results are presented in Table III.

From Table III we can see that, on CRKWH100 and CrackLS315, removing 20% of the ground-truth pixels or adding 20% noisy crack labels has very little influence on DeepCrack's performance. On Stone331, adding noisy crack labels has little effect, but removing ground-truth crack labels leads to a decline in performance. The results show that DeepCrack is generally not sensitive to noisy crack labels, and is less sensitive on CRKWH100 and CrackLS315 than on Stone331. The reason may be that DeepCrack is trained on pavement images and is therefore more robust in handling pavement images than stone images.

From Table III we can also see that shifting the crack labels by 4 pixels leads to largely decreased performance on all three datasets, and shifting by 6 pixels leads to even worse results. This indicates that the proposed method is sensitive to spatial bias in the ground truth.

To explore the influence of the max-pooling indices used in the upsampling operation, we replace the max-pooling indices with bilinear interpolation for upsampling the feature maps. We retrain the model and predict cracks on the three test datasets. As shown in Table III, upsampling with bilinear interpolation results in a decline in performance compared to the case with max-pooling indices (the last row of Table III). This indicates that the max-pooling indices help locate the crack pixels in the upsampling procedure.

F. Different Weights on the Crack and Non-Crack Background

In Section III-B, we formulate the loss function by assigning the same weight to the crack label and the background, although their distributions are imbalanced. We show the advantage of this setting with experiments. Specifically, we re-define the weighted loss function as Eq. (4),

$$l(F_i; W) = \begin{cases} \dfrac{2\alpha}{\alpha+\beta}\,\log\big(1 - P(F_i; W)\big), & \text{if } y_i = 0,\\[2ex] \dfrac{2\beta}{\alpha+\beta}\,\log\big(P(F_i; W)\big), & \text{otherwise,} \end{cases} \tag{4}$$

where $\alpha$ and $\beta$ are the weights assigned to the background and the cracks, respectively. We set the label of a pixel belonging to the background as $y_i = 0$ and to a crack as $y_i = 1$. For a convenient comparison, we set the background weight $\alpha$ to 1 and set $\beta$ to different values in {1, 10, 50, 100}. Notice that, when $\beta = 1$, the weighted loss function is equivalent to the loss function defined by Eq. (1), which is called the standard weight. We also make a comparison with the weight setting used in [12], where $\alpha$ is the number of ground-truth crack pixels and $\beta$ is the number of background pixels. We call it the balance weight.

It can be seen from Fig. 9 that DeepCrack equipped with a small weight ($\beta < 1$) has decreased performance, which indicates that it is not good to allocate more weight to the non-crack background. When larger weights are set on the crack, lower ODS values can be observed on CRKWH100 and CrackLS315. Compared with the balance weight, the standard weight generally obtains a higher ODS. The reason is that, when a larger weight is given to the crack, false negative predictions receive a heavier penalty. As a result, more pixels are predicted as crack without much impact on the whole loss. Thus, when larger weights are placed on the crack, the overall performance is not improved but undermined. On Stone331, irregular variations of ODS can be observed, in contrast to CRKWH100 and CrackLS315. This may be because the DeepCrack model is trained on pavement images, and the pattern learned on pavement images is not strictly consistent with that on stone images. Such results indicate that assigning different weights to the crack and the non-crack background does not guarantee a stable improvement in DeepCrack's performance.

G. Detection of Bright Cracks

In the experiments we find that the DeepCrack model trained on CrackTree260 cannot detect bright cracks. We conjecture that the reason is that there are very few bright cracks in the training dataset. To verify this, we invert the brightness of the training images, so that the cracks in them have higher intensity than the background and appear as bright cracks. We retrain DeepCrack with the new training dataset. We select four original images from the CRKWH100 dataset that contain bright cracks and perform crack detection using the newly trained model. The results are shown in Fig. 10. It can be seen that DeepCrack can handle the bright cracks well.

H. Running Efficiency

The proposed DeepCrack, as well as HED, SRN, RCF and SegNet, does not have fully connected layers, which leads to a largely reduced number of weight parameters. At test time, these networks do not incur the heavy computational load of gradient calculation required in training. Thus, DeepCrack and the four others can efficiently predict crack maps. It can be seen from Table I (last column) that DeepCrack handles images of 512 × 512 at a speed of about 6 FPS, i.e., 0.153 second per image. SegNet is a little faster than DeepCrack, needing 0.141 second per image. With fewer network layers, HED, RCF and SRN achieve even faster speeds, at about 25 FPS, 20 FPS and 17 FPS. Among the traditional methods, SE and CrackForest can process about 5 images and 4 images per second, respectively, and CrackTree needs 2 seconds to process one image on average. Note that the running time for HED, RCF, SRN and DeepCrack is measured on a GeForce GTX TITAN-X GPU, and the running time for SE, CrackTree and CrackForest is measured on a 2.3GHz E5-2630 CPU.

V. CONCLUSION

In this work, a novel end-to-end trainable convolutional network, DeepCrack, was proposed for crack detection. In DeepCrack, convolutional features at each scale were pairwise fused, and the fused feature maps at all scales were further fused into a multi-scale feature-fusion map for crack detection. For performance evaluation, four crack datasets were constructed. Under the same evaluation protocol, one dataset was used for training, and the other three datasets were used for testing. Experimental results showed that the proposed DeepCrack achieved an ODS F-measure of over 0.87 on average on the test datasets and outperformed the competing methods that do not have a decoder network. This indicates that the convolutional features in the encoder and decoder networks are both useful for crack detection. Experimental results also showed that DeepCrack was not sensitive to noisy crack labeling and could handle bright cracks well.
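The class-weighted loss of Eq. (4) in Section IV-F can be illustrated with a small per-pixel sketch. It is written as a negative log-likelihood so that the value is non-negative and can be minimized; this sign convention, the clamping constant `eps`, and the function name are assumptions added for the example and are not taken from the paper.

```python
import math


def weighted_pixel_loss(p_crack, label, alpha=1.0, beta=1.0, eps=1e-12):
    """Per-pixel weighted cross-entropy in the spirit of Eq. (4).

    p_crack : predicted probability that the pixel is a crack.
    label   : 0 for background, 1 for crack.
    alpha   : weight on the background; beta: weight on the crack.
    Assumption: negative log-likelihood sign convention; eps avoids log(0).
    """
    w_background = 2.0 * alpha / (alpha + beta)
    w_crack = 2.0 * beta / (alpha + beta)
    if label == 0:
        return -w_background * math.log(max(1.0 - p_crack, eps))
    return -w_crack * math.log(max(p_crack, eps))


if __name__ == "__main__":
    for beta in (1, 10, 50, 100):
        missed = weighted_pixel_loss(0.1, 1, alpha=1.0, beta=beta)
        false_alarm = weighted_pixel_loss(0.9, 0, alpha=1.0, beta=beta)
        print(f"beta={beta:>3}: missed crack {missed:.3f}, false alarm {false_alarm:.3f}")
```

With alpha = beta the two weights both equal 1, which recovers the unweighted loss. As beta grows, the penalty for a missed crack pixel approaches twice the unweighted value while the penalty for a false alarm shrinks toward zero, which matches the text's explanation that more pixels end up predicted as crack.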
APPENDIX
DEEPCRACK'S PERFORMANCE ON OTHER TASKS

We also examine the capability of DeepCrack on two other line-detection tasks: edge detection and vessel detection.

Fig. 13. Edge-detection performance on BSDS500.

Fig. 14. Results on DRIVE dataset. Row 1: retinal vessel images. Row 2: ground truth labeled by human expert. Row 3: results produced by DeepCrack.

A. Edge Detection

On BSDS500, we augmented the 300 training images to train DeepCrack and used the other 200 images for testing. In Fig. 13, we can see that DeepCrack obtains an ODS of 0.778, which is slightly higher than DeepEdge and DeepContour, but lower than RCF (with NMS) and HED (with NMS). However, DeepCrack obtains better results than RCF and HED on some images, for example the ones shown in Fig. 12. From Fig. 12 we can see that DeepCrack produces clean edge maps while HED and RCF produce thick ones. DeepCrack is found to be adept at detecting thin edges but tends to omit fine structures, which leads to relatively lower recall in edge detection. However, this characteristic makes DeepCrack more suitable for crack detection against noisy and grain-like textured backgrounds.

B. Vessel Detection

Vessel detection/segmentation is an important task in medical image processing. We run the proposed DeepCrack on the DRIVE dataset [49] for retinal vessel detection. DRIVE contains 20 images for training and 20 images for testing. Since the number of training samples is too small, we randomly select 15 images from the test set and add them to the training set. The remaining 5 images are used for testing, as shown in the top row of Fig. 14. We apply data augmentation to the 35 training images, where each image is augmented into 54 images, and a total of 1,890 images are used to train DeepCrack. The results are displayed in Fig. 14. It is surprising that the DeepCrack model trained on such a small-scale dataset performs very well in detecting the main blood vessel structures. Some small vessel branches are found to be missed.
We think this could be addressed by providing DeepCrack with enough training data.

ACKNOWLEDGEMENT

The authors would like to thank Yuanhao Yue and Qin Sun from Wuhan University for their help in labeling the crack ground-truth and plotting some of the figures.

REFERENCES

[1] C. Koch, K. Georgieva, V. Kasireddy, B. Akinci, and P. Fieguth, "A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure," Adv. Eng. Inform., vol. 29, no. 2, pp. 196-210, 2015.
[2] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, "Road crack detection using deep convolutional neural network," in Proc. IEEE Int. Conf. Image Process., Sep. 2016, pp. 3708-3712.
[3] S. J. Schmugge, L. Rice, J. Lindberg, R. Grizzy, C. Joffey, and M. C. Shin, "Crack segmentation by leveraging multiple frames of varying illumination," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2017, pp. 1045-1053.
[4] Z. Qu, L. Bai, S.-Q. An, F.-R. Ju, and L. Liu, "Lining seam elimination algorithm and surface crack detection in concrete tunnel lining," J. Electron. Imag., vol. 25, no. 6, p. 063004, 2016.
[5] J. Geusebroek, A. W. M. Smeulders, and H. Geerts, "A minimum cost approach for segmenting networks of lines," Int. J. Comput. Vis., vol. 43, no. 2, pp. 99-111, 2001.
[6] A. Sironi, E. Türetken, V. Lepetit, and P. Fua, "Multiscale centerline detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 7, pp. 1327-1341, Jul. 2016.
[7] Z. Zhang, F. Xing, X. Shi, and L. Yang, "Semicontour: A semi-supervised learning approach for contour detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 251-259.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097-1105.
[9] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 1440-1448.
[10] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 3431-3440.
[11] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2015, pp. 234-241.
[12] S. Xie and Z. Tu, "Holistically-nested edge detection," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 1395-1403.
[13] Y. Liu, M.-M. Cheng, X. Hu, K. Wang, and X. Bai, "Richer convolutional features for edge detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 5872-5881.
[14] W. Shen, X. Wang, Y. Wang, X. Bai, and Z. Zhang, "Deepcontour: A deep convolutional feature learned by positive-sharing loss for contour detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2015, pp. 3982-3991.
[15] J. Yang, B. Price, S. Cohen, H. Lee, and M.-H. Yang, "Object contour detection with a fully convolutional encoder-decoder network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 193-202.
[16] K.-K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. Van Gool, "Convolutional oriented boundaries," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 580-596.
[17] A. Khoreva, R. Benenson, M. Omran, M. Hein, and B. Schiele, "Weakly supervised object boundaries," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 183-192.
[18] B. Yang, J. Yan, Z. Lei, and S. Z. Li, "Convolutional channel features," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 82-90.
[19] A. Khoreva, R. Benenson, F. Galasso, M. Hein, and B. Schiele, "Improved image boundaries for better video segmentation," in Proc. Eur. Conf. Comput. Vis. Workshops, Nov. 2016, pp. 773-788.
[20] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481-2495, Dec. 2017.
[21] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679-698, Nov. 1986.
[22] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, "Contour detection and hierarchical image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898-916, May 2011.
[23] J. J. Lim, C. L. Zitnick, and P. Dollár, "Sketch tokens: A learned mid-level representation for contour and object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 3158-3165.
[24] P. Dollár and C. L. Zitnick, "Fast edge detection using structured forests," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 8, pp. 1558-1570, Aug. 2015.
[25] S. Wang, T. Kubota, J. M. Siskind, and J. Wang, "Salient closed boundary extraction with ratio contour," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 4, pp. 546-561, Apr. 2005.
[26] P. Martin, P. Refregier, F. Goudail, and F. Guerault, "Influence of the noise model on level set active contour segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 6, pp. 799-803, Jun. 2004.
[27] C. Li, C. Xu, C. Gui, and M. D. Fox, "Level set evolution without re-initialization: A new variational formulation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 1, Jun. 2005, pp. 430-436.
[28] Q. Zhu, G. Song, and J. Shi, "Untangling cycles for contour grouping," in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2007, pp. 1-8.
[29] G. Bertasius, J. Shi, and L. Torresani, "DeepEdge: A multi-scale bifurcated deep network for top-down contour detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 4380-4389.
[30] Y. Ganin and V. S. Lempitsky, "N^4-fields: Neural network nearest neighbor fields for image transforms," in Proc. Asian Conf. Comput. Vis., 2014, pp. 536-551.
[31] V. Bismuth, R. Vaillant, H. Talbot, and L. Najman, "Curvilinear structure enhancement with the polygonal path image: Application to guide-wire segmentation in X-ray fluoroscopy," in Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent. (MICCAI), 2012, pp. 9-16.
[32] A. Sironi, V. Lepetit, and P. Fua, "Multiscale centerline detection by learning a scale-space distance transform," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2697-2704.
[33] L.-C. Chen, J. T. Barron, G. Papandreou, K. Murphy, and A. L. Yuille, "Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 4545-4554.
[34] W. Ke, J. Chen, J. Jiao, G. Zhao, and Q. Ye, "SRN: Side-output residual network for object symmetry detection in the wild," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 302-310.
[35] Q. Li, Q. Zou, D. Zhang, and Q. Mao, "FoSA: F* seed-growing approach for crack-line detection from pavement images," Image Vis. Comput., vol. 29, no. 12, pp. 861-872, 2011.
[36] M. Kamaliardakani, L. Sun, and M. K. Ardakani, "Sealed-crack detection algorithm using heuristic thresholding approach," J. Comput. Civil Eng., vol. 30, no. 1, p. 04014110, 2014.
[37] P. Subirats, J. Dumoulin, V. Legeay, and D. Barba, "Automation of pavement surface crack detection using the continuous wavelet transform," in Proc. Int. Conf. Image Process., Oct. 2006, pp. 3037-3040.
[38] G. Zhao, T. Wang, and J. Ye, "Anisotropic clustering on surfaces for crack extraction," Mach. Vis. Appl., vol. 26, no. 5, pp. 675-688, 2015.
[39] M. Salman, S. Mathavan, K. Kamal, and M. Rahman, "Pavement crack detection using the Gabor filter," in Proc. IEEE Conf. Intell. Transp. Syst., Oct. 2013, pp. 2039-2044.
[40] H. Oliveira and P. L. Correia, "Automatic road crack detection and characterization," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 1, pp. 155-168, Mar. 2013.
[41] R. Amhaz, S. Chambon, J. Idier, and V. Baltazart, "Automatic crack detection on two-dimensional pavement images: An algorithm based on minimal path selection," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 10, pp. 2718-2729, Oct. 2016.
[42] Q. Zou, Q. Li, F. Zhang, Z. Xiong, and Q. Wang, "Path voting based pavement crack detection from laser range images," in Proc. Int. Conf. Digit. Signal Process., Oct. 2016, pp. 432-436.
[43] V. Kaul, A. Yezzi, and Y. C. Tsai, "Detecting curves with unknown endpoints and arbitrary topology using minimal paths," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 10, pp. 1952-1965, Oct. 2012.
[44] W. Xu, Z. Tang, J. Zhou, and J. Ding, "Pavement crack detection based on saliency and statistical features," in Proc. IEEE Int. Conf. Image Process., Sep. 2013, pp. 4093-4097.
[45] Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang, "CrackTree: Automatic crack detection from pavement images," Pattern Recognit. Lett., vol. 33, no. 3, pp. 227-238, 2012.
[46] Y. Shi, L. Cui, Z. Qi, F. Meng, and Z. Chen, "Automatic road crack detection using random structured forests," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 12, pp. 3434-3445, Dec. 2016.
[47] K. Simonyan and A. Zisserman. (2014). "Very deep convolutional networks for large-scale image recognition." [Online]. Available: https://arxiv.org/abs/1409.1556
[48] Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proc. ACM Int. Conf. Multimedia, Nov. 2014, pp. 675-678.
[49] J. Staal, M. D. Abramoff, M. Niemeijer, M. A. Viergever, and B. Van Ginneken, "Ridge-based vessel segmentation in color images of the retina," IEEE Trans. Med. Imag., vol. 23, no. 4, pp. 501-509, Apr. 2004.

Qin Zou (M'13) received the B.E. degree in information engineering and the Ph.D. degree in photogrammetry and remote sensing (computer vision) from Wuhan University, China, in 2004 and 2012, respectively. From 2010 to 2011, he was a Visiting Ph.D. Student with the Computer Vision Lab, University of South Carolina, USA. He is currently an Associate Professor with the School of Computer Science, Wuhan University. His research activities involve computer vision, pattern recognition, and machine learning. He is a member of the ACM. He was a co-recipient of the National Technology Invention Award of China 2015.

Xianbiao Qi received the B.E. degree in information engineering and the Ph.D. degree in information and signal processing from the Beijing University of Posts and Telecommunications, in 2008 and 2015, respectively. He was an Intern with the Web Search and Mining Group, Microsoft Research Asia, from 2011 to 2012. He was also a Researcher with the University of Oulu, Finland, from 2014 to 2016. He held a post-doctoral position with the Department of Computing, The Hong Kong Polytechnic University, from 2016 to 2018. He is currently a Research Scientist with the Shenzhen Research Institute of Big Data. His current research interests include face analysis, object detection, and scene text detection.

Zheng Zhang received the B.S. degree in computer science from Wuhan University, China, in 2015, where he is currently pursuing the master's degree with the School of Computer Science. He received the first prize from the China Undergraduate Contest in Internet of Things in 2015. His research interest includes deep learning and its applications in image classification and retrieval.

Qingquan Li received the Ph.D. degree in geographic information science and photogrammetry from the Wuhan Technical University of Surveying and Mapping, China. From 1988 to 1996, he was an Assistant Professor with Wuhan University, where he became an Associate Professor. Since 1998, he has been a Professor with Wuhan University. He is currently the President and a Professor with Shenzhen University, China. He is a Professor with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University. He is also the Director of the Shenzhen Key Laboratory of Spatial Smart Sensing and Service, Shenzhen University. He is an Academician of the International Academy of Sciences for Europe and Asia. His research areas include precision engineering survey, pattern recognition, and intelligent transportation systems.

Qian Wang received the Ph.D. degree from the Illinois Institute of Technology, USA. He is currently a Professor with the School of Computer Science, Wuhan University. His research interests include search and computation outsourcing security, wireless systems security, big data security and privacy, and applied cryptography. He is an Expert of the national 1000 Young Talents Program of China. He received the National Science Fund for Excellent Young Scholars of China. He was a recipient of the 2016 IEEE Asia-Pacific Outstanding Young Researcher Award. He serves as an Associate Editor for the IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING and the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY.

Song Wang (M'02-SM'13) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign (UIUC) in 2002. From 1998 to 2002, he was a Research Assistant with the Image Formation and Processing Group, Beckman Institute, UIUC. In 2002, he joined the Department of Computer Science and Engineering, University of South Carolina, where he is currently a Professor. His research interests include computer vision, medical image processing, and machine learning. He is a Senior Member of the IEEE Computer Society. He is currently serving as the Publicity/Web portal Chair for the Technical Committee of Pattern Analysis and Machine Intelligence, IEEE Computer Society. He is currently serving as an Associate Editor for PATTERN RECOGNITION LETTERS.