Global-to-Local Shape Matching for Liver Segmentation in CT Imaging

Kinda Anna Saddi 1,2, Mikaël Rousson 1, Christophe Chefd'hotel 1, and Farida Cheriet 2

1 Department of Imaging and Visualization, Siemens Corporate Research, Princeton, NJ, USA.
2 Department of Computer Engineering, École Polytechnique of Montréal, Canada.

T. Heimann, M. Styner, B. van Ginneken (Eds.): 3D Segmentation in The Clinic: A Grand Challenge, pp. 207-214, 2007.

Abstract. We propose a two-stage algorithm to segment the liver in CT images. First, we estimate the pose and global shape properties using a statistical shape model defined in the low-dimensional space spanned by a training set of shapes. Then, we apply a template matching procedure to recover local deformations that were not present in the learning set. In both steps, we optimize the same image term: the likelihood of the intensity inside the region of interest and its background. The method requires a single seed point inside the liver for initialization. We show that this global-to-local strategy is able to recover livers with peculiar shapes in arbitrary poses.

1 Introduction

Medical computer-aided diagnosis based on computed tomography (CT) has gained significant attention. Such a procedure typically requires the extraction of anatomical structures. In particular, current methods for liver diagnosis involve extracting its boundary. This task is challenging because liver tissue and adjacent organs have very similar densities (Hounsfield units in CT images). Moreover, partial-volume effects make intensity discontinuities difficult to observe (weak edges). Shape models can compensate for this lack of information, but they face difficulties of their own because of the large inter-patient variability of liver shape.

While most liver segmentation tools available to clinicians still require time-consuming human interaction, several authors have proposed to automate the process in order to make it usable in clinical applications. In [1], Soler et al. presented a cascading segmentation framework that sequentially detects different hepatic structures for surgery planning; the liver is extracted by deforming a given template globally and locally, according to [2]. In [3], Park et al. used a probabilistic atlas to classify voxel intensities in abdominal CT images. More recently, a trend toward the use of 3D shape models has emerged. For example, Lamecker et al. [4] built a statistical shape model from 43 segmented livers and used a modified Active Shape Model (ASM) [5] for segmentation. A similar approach was considered by Heimann et al. in [6], where a shape-constrained deformable model [7] was chosen rather than the usual ASM fitting approach. This allowed them to capture shape variations that were not present in the training set. A detailed validation shows the improvement in accuracy with respect to ASM. This illustrates that relying too heavily on a prior shape model yields robust segmentation algorithms, but generally at the cost of accuracy. The cause is the large inter-patient shape variability of the liver, which cannot be modeled from a small number of training samples. In practice, each liver has its own particular geometry; only global shape characteristics remain consistent from one patient to another.

Following this observation, we propose to separate the extraction of the liver into two phases: (i) we estimate the pose and the global shape of the liver according to a prior shape model, following [8]; (ii) we locally deform this initial solution using a template matching algorithm, as in [9]. Both steps are formulated in a variational framework that minimizes the same cost function on the image.

In Section 2.1, we describe the region-based image term used to fit the shape model and to estimate local deformations. In Section 2.2, we present the shape model and its integration for segmentation. In Section 2.3, we detail the template matching method used to recover local deformations. Finally, in Section 3, we present several results and comparisons showing the robustness and the accuracy of our global-to-local strategy.

2 Method

2.1 Statistical Region-Based Model

Let $I : \Omega \to \mathbb{R}$ be the image to segment. We define $p_{in}(i)$ and $p_{out}(i)$ as the probability density functions of a random variable modeling the intensity values $i$ in the regions inside and outside the liver.
Given this model, the optimal closed boundary $\Gamma$ can be obtained using a maximum likelihood principle, by minimizing the following energy, as initially proposed by Zhu and Yuille in [10]:

$$E_{data}(\Gamma) = -\int_{\Omega} \Big( \chi(x,\Gamma)\,\log p_{in}(I(x)) + \big(1-\chi(x,\Gamma)\big)\,\log p_{out}(I(x)) \Big)\,dx, \tag{1}$$

where $\chi(x,\Gamma) = 1$ if $x$ is inside $\Gamma$ and 0 otherwise. The intensity distributions $p_{in}(i)$ and $p_{out}(i)$ are estimated dynamically during the optimization process, from the intensity histograms inside each region for fixed positions of the boundary.

There are different strategies to minimize this energy and obtain a region segmentation. A direct solution is to evolve a level set representation of the boundary as in [10] but, as we will see in the experiments, this has several drawbacks in our application. In this work, we propose an alternative minimization: we first constrain $\Gamma$ with a given shape model (Section 2.2) and then use a template registration formulation (Section 2.3) to refine the solution locally.
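As an illustration of Eq. 1, the following Python sketch evaluates the energy on a flattened list of voxels. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function name `region_energy`, the use of normalized histograms over pre-binned integer intensities, and the small constant `eps` guarding the logarithm are all choices made for the example.

```python
import math
from collections import Counter


def region_energy(intensities, inside, eps=1e-12):
    """Discrete version of Eq. 1 (negative log-likelihood of a partition).

    intensities: list of integer (pre-binned) intensity values I(x).
    inside:      list of 0/1 flags, the indicator chi(x, Gamma).
    eps:         hypothetical safeguard against log(0).
    """
    # Estimate p_in and p_out from the histograms of the current regions,
    # as the text describes for a fixed position of the boundary.
    hist_in = Counter(i for i, c in zip(intensities, inside) if c)
    hist_out = Counter(i for i, c in zip(intensities, inside) if not c)
    n_in = sum(hist_in.values())
    n_out = sum(hist_out.values())

    energy = 0.0
    for i, c in zip(intensities, inside):
        p = hist_in[i] / n_in if c else hist_out[i] / n_out
        energy -= math.log(p + eps)
    return energy


# Toy example: a bright "organ" (100) next to a dark background (20).
image = [100, 100, 100, 20, 20, 20]
aligned = region_energy(image, [1, 1, 1, 0, 0, 0])
shifted = region_energy(image, [1, 1, 0, 0, 0, 1])
print(aligned < shifted)  # the partition that separates the intensities wins
```

A partition whose two regions have well-separated histograms gets a lower energy than one that mixes intensities, which is the behavior both stages of the method rely on.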
2.2 Shape Model

In this section, we constrain the boundary $\Gamma$ by a shape model learned from a set of manually segmented images. Given a set of training shapes encoded by their signed distance functions $\{\phi_i\}_{i=1..N}$, Tsai et al. [11] proposed to reduce the segmentation problem to a finite-dimensional optimization by constraining it to the subspace spanned by the training shapes. We make use of this compact representation of the embedding function. Given the distance $d$ on the space of signed distance functions defined by

$$d^2(\phi_1,\phi_2) = \int_{\Omega} \big(\phi_1(x) - \phi_2(x)\big)^2\,dx,$$

we align the set of training shapes with respect to translation and rotation. Subsequently, we constrain the level set representation $\phi$ of the boundary $\Gamma$ to a parametric representation of the form³:

$$\phi_{\alpha}(x) = \phi_0(x) + \sum_{i=1}^{n} \alpha_i V_i(x), \tag{2}$$

where $\phi_0(x) = \frac{1}{N}\sum_{i=1}^{N}\phi_i(x)$ represents the mean shape, $\{V_i(x)\}_{i=1..n}$ are the eigenmodes, and $n < N$ is the dimension of the subspace spanned by the $N$ training shapes. We can now represent each training shape $\phi_i$ by its corresponding shape vector $\alpha_i$. In this notation, the goal of statistical shape learning is to infer a statistical distribution $P(\alpha)$ from these sample shapes. Following [8], we consider a nonparametric density approximation:

$$P(\alpha) = \frac{1}{N\sigma}\sum_{i=1}^{N} K\!\left(\frac{\alpha-\alpha_i}{\sigma}\right), \quad \text{where } K(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right). \tag{3}$$

Once the boundary $\Gamma$ is constrained to this subspace, it can be represented by a shape vector $\alpha$, i.e., $\Gamma$ is the zero crossing of the level set function $\phi_{\alpha}$. Introducing the parameters $h \in \mathbb{R}^3$ and $\theta \in [0, 2\pi]^3$ to model translation and rotation of the shape, we can express $\chi(\theta x + h, \Gamma)$ as $H(\phi_{\alpha}(\theta x + h))$, with $H$ the Heaviside function. Using the shorthand $H_{\phi} = H(\phi_{\alpha}(\theta x + h))$, we can rewrite Eq. 1 as:

$$E_{data}(\alpha, h, \theta) = -\int_{\Omega} \Big( H_{\phi}\,\log p_{in}(I(x)) + (1-H_{\phi})\,\log p_{out}(I(x)) \Big)\,dx. \tag{4}$$

We incorporate the prior on the distribution of the shape vector $\alpha$ into the energy.
This leads to our final criterion:

$$E(\alpha, h, \theta) = -\log P(\alpha) + E_{data}(\alpha, h, \theta). \tag{5}$$

This energy is minimized by alternating gradient descent with respect to each of the unknown parameters $\alpha$, $h$ and $\theta$. The detailed equations of these gradient descents and their implementation can be found in [8].

³ $\Gamma$ is defined as the zero crossing of the level set function $\phi$ defined on the image domain.
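The pieces of Eqs. 2, 3 and 5 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names (`shape_from_modes`, `kde_prior`, `total_energy`) are hypothetical, shapes are stored as flattened lists of level set values, $u^2$ is read as the squared Euclidean norm when $\alpha$ is a vector, and the data term is passed in as a precomputed number. The gradient descent itself is described in [8] and is not reproduced here.

```python
import math


def shape_from_modes(mean_shape, modes, alpha):
    """Eq. 2: phi_alpha = phi_0 + sum_i alpha_i * V_i (flattened grids)."""
    return [m + sum(a * v[j] for a, v in zip(alpha, modes))
            for j, m in enumerate(mean_shape)]


def kde_prior(alpha, training_alphas, sigma):
    """Eq. 3: kernel density estimate built from the training shape vectors.

    Assumption: u^2 is the squared Euclidean norm of (alpha - alpha_i) / sigma.
    """
    total = 0.0
    for a_i in training_alphas:
        u2 = sum((a - b) ** 2 for a, b in zip(alpha, a_i)) / sigma ** 2
        total += math.exp(-0.5 * u2) / math.sqrt(2.0 * math.pi)
    return total / (len(training_alphas) * sigma)


def total_energy(alpha, training_alphas, sigma, e_data):
    """Eq. 5: E = -log P(alpha) + E_data, with E_data computed elsewhere."""
    p = max(kde_prior(alpha, training_alphas, sigma), 1e-300)  # avoid log(0)
    return -math.log(p) + e_data


# Two training shape vectors in a 2-mode subspace (made-up numbers).
train = [[0.0, 0.0], [1.0, 0.0]]
near = total_energy([0.1, 0.0], train, sigma=0.5, e_data=0.0)
far = total_energy([4.0, 4.0], train, sigma=0.5, e_data=0.0)
print(near < far)  # shape vectors far from the training set are penalized
```

With the data term held fixed, the prior alone pulls the shape vector toward the training samples, which is what keeps the first stage robust before local deformations are allowed.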
2.3 Template Matching

To refine the segmentation, we propose a template matching algorithm that recovers local deformations of the shape obtained in the previous section. Adopting a registration framework, we formulate the problem as finding a transformation $\psi : \Omega \to \Omega$ that minimizes the cost functional $E_{data}(I_T \circ \psi)$ defined in Eq. 1, where $I_T(x) = \chi(x, \Gamma)$ is the shape represented by the binary template obtained in Section 2.2. Thus, we minimize the following energy:

$$E_{data}(I_T \circ \psi) = -\int_{\Omega} \Big( (I_T \circ \psi)\,\log p_{in}(I(x)) + (1 - I_T \circ \psi)\,\log p_{out}(I(x)) \Big)\,dx. \tag{6}$$

In this equation, $I_T \circ \psi$ is the warped binary template and $\circ$ is the composition operator. Since we seek an optimal transformation $\psi$, differentiating the energy with respect to $\psi$ gives the gradient used in the descent:

$$\frac{\partial E_{data}(I_T \circ \psi)}{\partial \psi} = -(\nabla I_T \circ \psi)\,\log\frac{p_{in}(I(x))}{p_{out}(I(x))}. \tag{7}$$

In non-rigid registration, differentiating this energy with respect to a high-dimensional transformation yields a vector field $v$. To guarantee a well-posed problem, this vector field has to be regularized, and different techniques have been proposed for this purpose. The approach of Christensen et al. [12] solves the registration problem using a partial differential equation and has the advantage of capturing large deformations. In this work, we use Gaussian filtering, which can be seen as a variant of the fluid approach [12]. To find the optimal high-dimensional transformation, we build a sequence of transformations $(\psi_k)_{k=0,\dots,+\infty}$ by composition of small displacements [13]:

$$\psi_{k+1} = \psi_k \circ (\psi_{id} + \alpha v_k), \quad \psi_0 = \psi_{id}, \tag{8}$$

where $\psi_{id}$ is the identity transformation, $\alpha$ denotes here a step size (not the shape vector of Section 2.2), and $v_k$ is a velocity vector field that follows the gradient of the cost functional to be minimized. Here, $v_k$ is obtained by computing the variational gradient of the cost functional given in Eq. 7. We regularize the gradient $v_k$ using a fast recursive filtering technique that approximates Gaussian smoothing [14] and has proven very efficient in practice. The previous iterative scheme (Eq. 8) is repeated until convergence, and can be seen as the discretization (via Taylor expansion) of the transport equation in the Eulerian frame:

$$\frac{\partial \psi_t}{\partial t} = D\psi_t\, v_t, \quad \psi_0 = \psi_{id}, \tag{9}$$

where $D\psi_t$ stands for the Jacobian matrix of $\psi_t$. Here, large deformations are possible because the regularization is applied to the velocity rather than to the deformation (Dupuis et al. [15] detail the regularity conditions on the velocity field that are suitable to generate a diffeomorphism). As mentioned in Section 2.1, the region statistics are computed dynamically as the algorithm iterates. This algorithm is embedded in a coarse-to-fine strategy.
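One iteration of the scheme of Eqs. 7 and 8 can be sketched in one dimension as follows. This is a toy sketch under several assumptions, not the method's implementation: the helper names are hypothetical, the transformation is stored as a list of sample positions, the truncated Gaussian convolution stands in for the recursive filter of [14], interpolation is linear with clamping at the domain border, and the log-ratio $\log(p_{in}/p_{out})$ is given as a precomputed list.

```python
import math


def gaussian_smooth(v, sigma):
    """Truncated Gaussian convolution with edge clamping (stand-in for [14])."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    norm = sum(weights)
    n = len(v)
    return [sum(w * v[min(max(x + k, 0), n - 1)]
                for k, w in zip(offsets, weights)) / norm
            for x in range(n)]


def sample(f, x):
    """Linear interpolation of the list f at position x, clamped to the grid."""
    x = min(max(float(x), 0.0), len(f) - 1.0)
    i = min(int(x), len(f) - 2)
    t = x - i
    return (1.0 - t) * f[i] + t * f[i + 1]


def velocity(template, psi, log_ratio):
    """Descent direction -dE/dpsi of Eq. 7: (grad I_T at psi) * log(p_in/p_out)."""
    return [0.5 * (sample(template, p + 1.0) - sample(template, p - 1.0)) * r
            for p, r in zip(psi, log_ratio)]


def compose_step(psi, v, step):
    """Eq. 8: psi_{k+1}(x) = psi_k(x + step * v_k(x))."""
    return [sample(psi, x + step * v[x]) for x in range(len(psi))]


# Template covers x < 3, while the image statistics favor "inside" for x < 5.
template = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
log_ratio = [1.0] * 5 + [-1.0] * 3
psi = [float(x) for x in range(len(template))]  # identity transformation

v = gaussian_smooth(velocity(template, psi, log_ratio), sigma=1.0)
psi = compose_step(psi, v, step=0.5)
warped = [sample(template, p) for p in psi]      # I_T composed with psi
print(sum(warped) > sum(template))               # the template has grown
```

Because the smoothing acts on the velocity and the update is a composition, each step stays small and regular while the accumulated transformation can become large, which is the point made about Eq. 9.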
Working coarse-to-fine reduces the computational cost, since less data is processed at lower resolutions. It also allows us to recover large displacements and helps avoid local minima. In this work, we used five resolution levels.

3 Experimental Results

To quantify the segmentation accuracy, we compare our results to a ground truth. In this work, we examine 30 CT images⁴ that have been manually segmented by radiological experts, working slice by slice in transversal view. A segmentation is defined as the entire liver tissue including all internal structures. Reference segmentations are available for twenty of these images, which serve for training; the other ten images are used for testing. All images are contrast-enhanced and most of them are pathological.

In our approach, the shape model is built from 50 segmented livers (including the 20 training images described above). The algorithm is initialized by giving one seed point inside the liver. This single point is sufficient for initialization and does not have to be the center of mass (our algorithm has proven robust to the choice of initialization point). We use 30 modes of variation for the shape model. For the template matching, the regularization parameter σ is 2.0, and the numbers of iterations at the five resolution levels are 0, 16, 32, 48 and 16 (from high to low resolution).

We compare our segmentation results to the ground truth using five metrics: the volumetric overlap, the relative absolute volume difference, the average symmetric absolute surface distance, the symmetric RMS surface distance and the maximum symmetric absolute surface distance. These metrics are evaluated by assigning a score to each test case⁵.

We also compare our approach (Method A) to two alternatives. The first applies the model fitting described in Section 2.2 followed by an unconstrained level set evolution [16] (Method B). The second uses the mean shape of the liver as initialization for the template matching described in Section 2.3 (Method C).
We applied these three methods to the training images. Table 1 shows the score for every image in the training set.

Method   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
A       63  81  75  83  21  68  79  63  74  61  76  79  65  73  82  54  81  63  63  79
B       58  79  73  81  16  68  75  63  73  62  74  78  65  74  81  54  78  65  58  75
C       58  79  72  81  19  68  75  64  73  63  74  76  67  74  81  55  78  66  57  76

Table 1. Scores on the training images for: A, shape model and template matching; B, shape model and level set evolution; C, mean shape and template matching.

⁴ Data provided by the MICCAI Workshop on 3D Segmentation in the Clinic.
⁵ For details see http://mbi.dkfz-heidelberg.de/grand-challenge2007/sites/eval.htm
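For readers who want to reproduce the first two of the five metrics on their own data, the following sketch computes them from two voxel sets. It assumes the commonly used definitions, overlap error as 100 (1 - |A ∩ B| / |A ∪ B|) and signed relative volume difference as 100 (|A| - |B|) / |B| with B the reference; the evaluation page cited in this section remains the authoritative definition, the function names are hypothetical, and the conversion of metrics to scores is not covered.

```python
def overlap_error(segmentation, reference):
    """Volumetric overlap error in percent (0 for a perfect match).

    Both arguments are sets of voxel coordinates, e.g. (z, y, x) tuples.
    Assumed definition: 100 * (1 - |A n B| / |A u B|).
    """
    union = len(segmentation | reference)
    if union == 0:
        return 0.0
    return 100.0 * (1.0 - len(segmentation & reference) / union)


def relative_volume_difference(segmentation, reference):
    """Signed relative volume difference in percent.

    Assumed definition: 100 * (|A| - |B|) / |B|, with B the reference.
    Negative values indicate under-segmentation.
    """
    return 100.0 * (len(segmentation) - len(reference)) / len(reference)


# Toy example with a handful of voxels.
seg = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
ref = {(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1)}
print(overlap_error(seg, ref), relative_volume_difference(seg, ref))
```

Note that the volume difference can be zero even when the overlap error is not, since over- and under-segmented regions may cancel in volume; this is why the two are reported side by side.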
Method   Overlap Error  Volume Diff.   Avg. Dist.     RMS Dist.      Max. Dist.      Total
            [%]  Score     [%]  Score    [mm]  Score    [mm]  Score    [mm]  Score   Score
A           7.6     70     3.0     84     1.3     65     2.9     57    24.4     68      69
B           9.7     62     3.3     82     1.6     56     3.1     54    25.8     66      64
C           9.6     62     4.7     75     1.8     52     3.9     43    29.7     61      59

Table 2. Average metrics and corresponding average scores on the 20 training cases for: A, shape model and template matching; B, shape model and level set evolution; C, mean shape and template matching.

Table 2 reports the average metrics and scores of these methods. It shows that our approach gives better overall quantitative results than the other two. Contrary to the level set approach (Method B), we apply the regularization to the deformation field rather than to the surface. This decorrelates the regularization from the intrinsic geometry of the template, which allows us to recover irregular shapes that would be out of reach for surface evolution techniques, while avoiding leaks.

We also validate our approach on 10 test cases (not included in the shape model). Table 3 shows the metrics and the scores compared to the ground truth. Figure 1 shows the results of three different segmentations: an easy, an average and a hard case, each in three views (sagittal, coronal and transversal). Our approach scores 76 points for case 1, which means that it performed roughly as well as a human. The lowest score is 54 and corresponds to an image with a tumor (case 3), which is considered a hard case. Since our algorithm relies on the intensity distribution inside the liver, it cannot include the whole tumor in the segmentation.

Dataset  Overlap Error  Volume Diff.   Avg. Dist.     RMS Dist.      Max. Dist.      Total
            [%]  Score     [%]  Score    [mm]  Score    [mm]  Score    [mm]  Score   Score
1           6.3     76     3.1     84     1.0     75     2.2     69    17.0     78      76
2           9.6     62    -3.2     83     1.6     60     3.9     46    37.2     51      60
3          11.4     56     2.6     86     2.3     42     4.8     33    34.2     55      54
4          10.5     59     0.0    100     2.0     51     4.3     40    32.7     57      61
5           7.6     70    -2.4     87     1.5     63     3.4     53    33.9     55      66
6           8.5     67    -3.2     83     1.6     60     4.1     43    43.9     42      59
7           8.6     67     7.2     62     1.3     68     3.4     53    26.0     66      63
8           6.4     75    -3.9     79     1.1     72     2.5     66    23.2     70      72
9           9.4     63     7.7     59     1.2     71     2.9     60    20.6     73      65
10         10.9     57     4.4     77     1.6     61     3.1     56    24.0     68      64
Average     8.9     65     3.8     80     1.5     62     3.4     52    29.3     62      64

Table 3. Results of the comparison metrics and corresponding scores for all ten test cases.
Fig. 1. From left to right, a sagittal, coronal and transversal slice from a relatively easy case (1, top), an average case (4, middle), and a relatively difficult case (3, bottom). The outline of the reference standard segmentation is in red, the outline of the segmentation of the method described in this paper is in blue. Slices are displayed with a window of 400 and a level of 70.

The computation time is approximately 324 seconds per image on average on a 2 GHz dual-core Intel processor. This can be reduced, if necessary for clinical application, provided that a lower accuracy is acceptable.

4 Conclusion

We demonstrated that our global-to-local strategy generally performs better than two competing approaches. The shape-based step allowed us to estimate the pose and the global shape properties of the liver with good robustness. The non-rigid shape matching was then able to recover local shape properties. In addition to a global regularization on the shape, our non-rigid template registration method has the advantage of preserving the topology of the liver even for large shape variations. Promising results were presented on a wide range of images, some with problematic attributes such as developed tumors. Further improvements are of course still possible, in particular by explicitly modeling the tumors and detecting them in parallel with the liver segmentation.
References

1. Soler, L., Delingette, H., Malandain, G., Montagnat, J., Ayache, N., Koehl, C., Dourthe, O., Malassagne, B., Smith, M., Mutter, D., Marescaux, J.: Fully automatic anatomical, pathological, and functional segmentation from CT scans for hepatic surgery. Computer Aided Surgery 6(3) (2001) 131-42
2. Montagnat, J., Delingette, H.: Volumetric medical images segmentation using shape constrained deformable models. In: CVRMed. (1997) 13-22
3. Park, H., Bland, P., Meyer, C.: Construction of an abdominal probabilistic atlas and its application in segmentation. IEEE Trans. on Medical Imaging 22(4) (2003) 483-492
4. Lamecker, H., Lange, T., Seebaß, M.: Segmentation of the liver using a 3D statistical shape model. ZIB Preprint 04-09 (2004)
5. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models: their training and application. Comput. Vis. Image Underst. 61(1) (1995) 38-59
6. Heimann, T., Wolf, I., Meinzer, H.P.: Active shape models for a fully automated 3D segmentation of the liver - an evaluation on clinical data. In: Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Volume 2. (2006) 41-48
7. Weese, J., Kaus, M., Lorenz, C., Lobregt, S., Truyen, R., Pekar, V.: Shape constrained deformable models for 3D medical image segmentation. In: Proceedings of the Conference on Information Processing in Medical Imaging. (2001) 380-387
8. Rousson, M., Cremers, D.: Efficient kernel density estimation of shape and intensity priors for level set segmentation. In: Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Volume 2. (2005) 757-764
9. Saddi, K.A., Chefd'hotel, C., Rousson, M., Cheriet, F.: Region-based segmentation via non-rigid template matching. In: Proceedings of the Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA'07), Rio de Janeiro, Brazil (2007) In press
10. Zhu, S.C., Yuille, A.L.: Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence 18(9) (1996) 884-900
11. Tsai, A., Yezzi, A.J., Wells, W.M., Tempany, C., Tucker, D., Fan, A., Grimson, W.E.L., Willsky, A.S.: A shape-based approach to the segmentation of medical imagery using level sets. IEEE Trans. on Medical Imaging 22(2) (2003) 137-154
12. Christensen, G.E., Rabbitt, R.D., Miller, M.I.: Deformable templates using large deformation kinematics. IEEE Trans. on Image Processing 5(10) (1996) 1435-1447
13. Chefd'hotel, C., Hermosillo, G., Faugeras, O.: Flows of diffeomorphisms for multimodal image registration. In: Proceedings of the IEEE International Symposium on Biomedical Imaging. (2002) 753-756
14. Deriche, R.: Recursively implementing the Gaussian and its derivatives. In: Proceedings of the International Conference on Image Processing, Singapore (1992) 263-267
15. Dupuis, P., Grenander, U., Miller, M.: Variational problems on flows of diffeomorphisms for image matching. Quarterly of Applied Mathematics LVI(3) (1998) 587-600
16. Vese, L., Chan, T.: A multiphase level set framework for image segmentation using the Mumford and Shah model. International Journal of Computer Vision 50 (2002) 271-293