Cephalometric Landmarks Localization Based on Histograms of Oriented Gradients


Ali A. Pouyan and M. Farshbaf, Members, IEEE
School of Computer Engineering, Shahrood University of Technology, Shahrood, Iran

Abstract. In this paper, we explore the use of block-wise histograms of local orientations, image features used in many current object recognition algorithms, for the task of locating cephalometric landmarks on X-ray images. After reviewing existing cephalometric landmark detection systems, we show experimentally that grids of Histograms of Oriented Gradients (HOG) descriptors perform well on this task. We study the influence of orientation bin size and detection window size on performance for three landmarks. The results show that fine orientation binning and detection windows large enough to contain all the features around a landmark are both important for a good estimate of the landmark position.

Key words: cephalometric landmark detection, pattern recognition, classification, histograms of oriented gradients.

1. INTRODUCTION

Cephalometry, the scientific measurement of the dimensions of the head on X-ray images, is widely used in orthodontics, orthopedics, and other areas of oral and maxillofacial surgery to assess craniofacial growth and plan treatment. It is generally based on a set of agreed feature points called cephalometric landmarks. Once the landmarks are located, various angular and linear parameters can easily be measured and analyzed. Locating craniofacial landmarks on X-rays is therefore of great importance for cephalometric analysis. However, manual landmarking is a tedious and time-consuming process that takes an experienced surgeon about 15 to 20 minutes per case. Computerized landmark localization is an attractive way to relieve these difficulties [1, 2, 3]. The number of defined landmarks is quite large, but many of them are rarely used.
Reference [4] defined a smaller set containing the landmarks most frequently used by orthodontists in what is called cephalometric evaluation [5]. The positions of some of these landmarks are shown in Fig. 1, and Table 1 gives the definitions of the landmarks used in this paper. To assess the effectiveness of HOG features, three landmarks were selected: one easy landmark (Gnathion), one of medium difficulty (Incisal Upper Incisor Tip), and one hard landmark (Sella). Detecting these landmarks is challenging owing to the variable appearance of the human skull and to other variations such as rotation, caused by the patient being improperly posed while the X-ray is taken or by improper placement of the X-ray film during digitization. Variation in brightness, which is controlled by the radiographer and by the f-number of the camera aperture, poses another challenge [6].

The first requirement for finding the landmarks is a robust feature set that allows landmark regions to be discriminated clearly, even under difficult illumination. We study locally normalized Histograms of Oriented Gradients (HOG) descriptors for providing a first estimate of the region in which the landmark can be found. The landmark position is then estimated by referring to a predefined point in the detection window. We make a detailed study of the effects of detection window size and orientation bin size on detector performance. Support Vector Machines (SVM) are used as the baseline classifier. The performance of the approach is evaluated on the cephalogram data set provided by [7].

We briefly discuss previous work on cephalometric landmark detection, together with a brief description of HOG, in section 2; describe our data set in section 3; give an overview of our method in section 4; and present a detailed description and experimental evaluation of each stage of the process in section 5.
The main conclusions are summarized in section 6.

2. PREVIOUS WORKS

There have been previous attempts to automate cephalometric analysis with the aim of reducing the time required to obtain an analysis, improving the accuracy of landmark identification, and reducing errors due to clinician subjectivity. These methods have been categorized into four classes [8].

The first class, called handcrafted algorithms, consists of knowledge-based methods that locate landmarks on relevant edges. Reference [9] located landmarks on edges after noise filtering and image enhancement. Building on that work, some methods in this class were improved by using a resolution pyramid to extract relevant lines in a given region [10, 11, 12]. References [13, 14, 15, 16, 17, 18] presented similar edge tracking methods. A pulse coupled neural network (PCNN) was used by [19] to highlight regions, after which line following was employed to find the positions of the landmarks. All of these edge tracking methods depend on the quality of the X-ray and give good results for landmarks that lie on or near edges in the image.

The second class of researchers used mathematical or statistical models to reduce the search area. Reference [20] used sub-image matching based on gray-scale mathematical morphology. Reference [7] used

an Active Shape Model (ASM). Spatial spectroscopy was used by [21] to establish the statistical gray model. Reference [22] improved the work of [20] by using a line detection module to search for the most significant lines and then applying a mathematical morphology approach similar to the one used by [20].

The third class of researchers used neural networks, genetic algorithms, and fuzzy systems to locate landmarks. References [6, 23] used a combination of neural networks and genetic algorithms. Reference [5] used a combination of neural networks and fuzzy systems. Reference [24] used SVM in conjunction with the Projected Principal-Edge Distribution (PPED) representation as feature vectors to detect 16 landmarks. Template matching and fuzzy decision making were used in [25], which applied the iterative self-organizing data (ISODATA) pattern recognition technique together with a fuzzy logic decision-making algorithm to increase the accuracy of the results.

More recently, a fourth class of researchers has used a combination of those three classes. References [26, 2] divided every training shape into 10 regions, and for every region Principal Component Analysis (PCA) was employed to characterize its shape and gray profile statistics. For an input image, some reference landmarks are recognized, the input shape is divided using these landmarks, and the landmarks are then located by an active shape model. Reference [27] introduced a method based on Partial Least Squares Regression (PLSR) to extract the relation between the coordinates of selected points on X-ray images and the expected locations of a set of landmarks. Cellular neural networks (CNN) were used by [28, 29] to highlight the region of interest; in these approaches each landmark's coordinates are identified by appropriate CNN templates, and landmark-specific algorithms are then applied to pinpoint the landmark location. Reference [1] estimated the initial location of each landmark based on similarities in human anatomical structure. A modified use of ASM was applied by [8, 30]: the possible coordinates of the landmarks were estimated using learning vector quantization (LVQ) for every new image, and an active shape model was used to pinpoint the exact locations. In [31], points were registered on the reference image based on features extracted from the face profile, and the algorithm found landmarks by mapping. Reference [32] modeled the size of the skull, rotation, and translation using a set of feature points; edge detection and a neural network were employed to estimate the possible coordinates of the landmarks, and a modified ASM was then applied to pinpoint their exact locations. Our method can be classified in the fourth class.

Figure 1. Location of landmarks used in [7].

Table 1. Definition of landmarks used in this paper.
Landmark                     Definition
Sella                        Midpoint of the hypophyseal fossa
Gnathion                     Most anterior and inferior point of the bony chin
Incisal Upper Incisor Tip    Tip of the crown of the most anterior maxillary central incisor

3. DATA SET AND METHODOLOGY

We evaluated our method on the data set provided by [7], which is reported to contain 63 pretreatment cephalograms of good quality and condition. The cephalograms were scanned at 100 dpi, giving images of 750 x 950 pixels with 256 gray levels. The positions of 16 different landmarks on the cephalograms are specified in a separate accompanying file. We use a leave-one-out scheme for choosing the training and test sets from the 63 available cephalograms. Leave-one-out testing is statistically acceptable, makes maximum use of the training set, and provides the maximum number of test images. To reduce search time and obtain a first estimate of the landmark location, input images are scaled to 1/8 of their original size. A scaled-down image is shown in Fig. 2. A detection window with a step size of two pixels scans the whole image to find a first estimate of the landmark window.
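The evaluation protocol and window scan described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `score_fn` callable (a stand-in for whatever classifier scores a window), and the default factor of 8 for mapping back to the original scale are all introduced here for illustration.

```python
def leave_one_out(items):
    """Yield (held_out, training_items) pairs, one pair per item."""
    for i, held_out in enumerate(items):
        yield held_out, items[:i] + items[i + 1:]


def window_positions(img_w, img_h, win_w, win_h, step=2):
    """Top-left corners of every window that fits fully inside the image."""
    return [(x, y)
            for y in range(0, img_h - win_h + 1, step)
            for x in range(0, img_w - win_w + 1, step)]


def best_window(positions, score_fn):
    """Return the position with the highest score.

    score_fn is a hypothetical callable that maps a position to a
    classifier score; it is not part of any real library API.
    """
    return max(positions, key=score_fn)


def to_original_scale(point, factor=8):
    """Map a point found on the scaled-down image back to original pixels."""
    return (point[0] * factor, point[1] * factor)
```

With 63 cephalograms, `leave_one_out` yields 63 splits, each with 62 training images and one test image, which matches the protocol described in this section.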
We define a specific reference point for each landmark in the detection window. During training, this point is placed on the landmark location and the detection window is then cropped. The position of the point within the window is landmark specific. A general candidate is the central point of the window. For landmarks that lie near the boundary of the cephalogram, such as Gnathion, this choice causes the window to extend past the image boundary. For such landmarks we instead choose a point such that, when it is placed on the landmark, the whole window lies within the image boundaries. For example, for Gnathion the window is positioned so that the landmark is horizontally centered and about 32 pixels above the bottom of the window at the original scale (see Fig. 2c).

In training, 2 patches with landmark-specific dimensions (given in Table 2) are sampled randomly from goal-free regions of each image, giving 2 x 62 patches in total. These patches form the initial negative set. Goal-free regions are taken to be the regions around the landmark window whose overlap with the landmark window is half or less in each dimension. A set of 62 patches of the same dimensions, each containing the landmark window, forms the positive set. At test time, the detection window is tiled with a dense (in fact, overlapping) grid of HOG descriptors, and the combined feature vector is fed to a conventional SVM-based window classifier; this gives our region detector chain at each level.
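The placement of the reference point and the sampling of negative patches can be illustrated as follows. This is a sketch under assumptions that are not in the paper: the overlap rule is read as "overlap at most half the window size in both the horizontal and the vertical dimension", window sizes are taken as width x height, and all function names and the rejection-sampling strategy are hypothetical.

```python
import random


def place_window(landmark, ref_point):
    """Top-left corner that puts ref_point (window coordinates) on the landmark."""
    return (landmark[0] - ref_point[0], landmark[1] - ref_point[1])


def overlap_1d(a, b, length):
    """Overlap of two intervals of equal length starting at a and b."""
    return max(0, length - abs(a - b))


def is_goal_free(cand, pos, win_w, win_h):
    """Assumed rule: overlap with the landmark window is <= half in each dimension."""
    return (overlap_1d(cand[0], pos[0], win_w) <= win_w / 2
            and overlap_1d(cand[1], pos[1], win_h) <= win_h / 2)


def sample_negatives(img_w, img_h, pos, win_w, win_h, n=2, rng=None,
                     max_tries=10000):
    """Rejection-sample up to n goal-free window corners inside the image."""
    rng = rng or random.Random()
    found = []
    for _ in range(max_tries):
        cand = (rng.randint(0, img_w - win_w), rng.randint(0, img_h - win_h))
        if is_goal_free(cand, pos, win_w, win_h):
            found.append(cand)
            if len(found) == n:
                break
    return found
```

For Gnathion, 32 pixels at the original scale correspond to 4 pixels at 1/8 scale, so with a 24 x 20 window (assuming width x height) the reference point would be roughly `(12, 16)`, and `place_window((50, 100), (12, 16))` gives the corner `(38, 84)`.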

4. OUTLINE OF THE PROPOSED METHOD

This section provides an overview of our recognition chain. The following steps are performed in a detection evaluation:

- Scale the images down to 1/8 of their original size.
- Leave one image out and build a training set from the other 62 images in the data set.
- Extract HOG features, as described in the next subsection, and train the SVM on the positive and negative patches extracted from the training set.
- Test the system on the left-out image to find a first estimate of the landmark position at this scale.
- Repeat these steps for all images in the data set.

Landmark window detection starts on images at 1/8 of the original scale. Scaling the images down reduces the problem of translation or shift in the image, since distances between image components become shorter and their variation has less effect. After the HOG features are extracted, as described in more detail in the next subsection, they are fed to an SVM classifier. The SVM is trained with the negative and positive sample patches described in section 3. After training, each input window is presented to the trained model: HOG features are extracted, the feature vector is constructed, and the model classifies the window as a negative or positive sample. We used SVMperf [33, 34], a version of SVMlight, as the classifier. The classifier assigns a score to each window as the result of classification, and the detection window with the most positive score is chosen as the final location of the landmark window. This detection process is repeated for each image. The following subsection explains the feature extraction procedure in more detail. To evaluate the effectiveness of these feature vectors we used an RBF SVM trained with SVMperf [33, 34].
SVMperf is claimed to train conventional linear classification SVMs, optimizing error rate, in time that is linear in the size of the training data, through an alternative but equivalent formulation of the training problem.

4.1 Feature Extraction

HOG is a feature set based on evaluating well-normalized local histograms of image gradient orientations in a dense grid. In practice, the image window is divided into small spatial regions ("cells"). For each cell, a local 1-D histogram of gradient directions is accumulated. All histogram entries of the cells in the window are then combined to form the final feature vector (Fig. 3). Reference [35] recommends contrast normalizing the local responses before using them, for better invariance to illumination, shadowing, and similar effects. This can be done by accumulating a measure of histogram "energy" over somewhat larger spatial regions ("blocks") and using the result to normalize all the cells in the block. To extract the HOG features, we used the basic derivative filters with no prior smoothing, as in [35].

Figure 2. a) A sample cephalogram from the data set, scaled down to 1/8 of its original size. b, c, d) Detection windows and the predefined landmark locations for Sella, Gnathion, and Incisal Upper Incisor Tip.

The basic derivative filters are specified by:

$$h_x = h_y^{T} = \begin{bmatrix} 1 & 0 & -1 \end{bmatrix} \qquad (1)$$

Table 2. Detection window size for each landmark. All sizes refer to the image scaled to 1/8 of its original size.
Landmark             Window size
Sella                20 x 28
Gnathion             24 x 20
Upper Incisor Tip    28 x 28

Next, the step called the fundamental nonlinearity of the descriptor [35] is applied. Each pixel calculates a weighted vote for an edge orientation histogram channel based on the orientation of the gradient element centered on it, and the votes are accumulated into orientation bins over local spatial regions called cells. Block normalization is then performed. Reference [35] evaluated a number of different normalization schemes.
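The gradient, voting, and normalization steps described above can be sketched in plain Python. This is a simplified illustration, not the descriptor of [35]: it uses hard assignment of each pixel to a single orientation bin (rather than interpolated voting), omits the Gaussian spatial window, and applies a plain L2 normalization. The cell size of 4 and the 9 unsigned bins follow the default detector described in section 5; the function names are hypothetical. The centered difference used here differs from the mask in Eq. (1) only in sign, which has no effect on unsigned orientations.

```python
import math


def gradients(img):
    """Centered differences on a 2-D list; border pixels keep zero gradient."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if 0 < x < w - 1:
                gx[y][x] = img[y][x + 1] - img[y][x - 1]
            if 0 < y < h - 1:
                gy[y][x] = img[y + 1][x] - img[y - 1][x]
    return gx, gy


def cell_histogram(gx, gy, x0, y0, cell=4, bins=9):
    """Magnitude-weighted histogram of unsigned orientations for one cell."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            magnitude = math.hypot(gx[y][x], gy[y][x])
            # Fold the angle into [0, 180) so the gradient sign is ignored.
            angle = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def l2_normalize(vec, eps=1e-6):
    """Divide a block vector by its L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]
```

A block descriptor would then be the concatenation of the histograms of its cells passed through `l2_normalize`, and the window descriptor the concatenation of all block descriptors.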
Most of these schemes are based on grouping cells into larger spatial blocks and contrast normalizing each block separately. The final descriptor is then the vector of all components of the normalized cell responses from all of the blocks in the detection window. In fact, the blocks typically overlap, so that each scalar cell response contributes several components to the final descriptor vector, each normalized with respect to a different block. We used the class of blocks named R-HOG, introduced in [35]. They are computed in dense grids at a single scale and used as part of a larger code vector that implicitly encodes spatial position relative to the detection window. Square R-HOG blocks, i.e. 2 x 2 grids of 4 x 4 pixel cells, each containing 9 orientation bins, are employed here. Reference [35] used such a descriptor as the default for its detector, but with a cell size of 8 x 8 pixels. As introduced in [35] and [36], it is useful to down-weight pixels near the edges of the block by applying a Gaussian spatial window to each pixel before accumulating orientation votes into cells. The purpose of this Gaussian window is to avoid sudden changes in the descriptor with small changes in the position of the block. Finally, the feature vector is modified by a normalization

method [35, 36] to reduce the effects of illumination change. Here the L2 norm is used, as in [35, 36].

Figure 3. HOG feature extraction (stages shown: orientation voting, overlapping blocks).

5. EXPERIMENTAL RESULTS

We now give details of our implementation of the proposed approach and systematically study the effect of window size and bin size on recognizer performance. Throughout this section we refer to our default detector, which has the following properties: gray scale with no image enhancement; [-1, 0, 1] gradient filter with no smoothing; linear gradient voting into 9 orientation bins over 0 to 180 degrees (unsigned gradients); 8 x 8 pixel blocks of four 4 x 4 pixel cells; Gaussian spatial window with sigma = 8 pixels; L2-Hys (Lowe-style clipped L2 norm) block normalization; block stride of 4 pixels (hence 4-fold coverage of each cell); SVMperf classifier with RBF kernel and C = 2; and a landmark-specific detection window size, given in Table 2.

All experiments are performed at one eighth of the original size. If the detected landmark location lies within a circle of radius 4 pixels around the true landmark location at that scale, the detection is counted as a true positive; detections outside this circle are counted as false negatives. Fig. 4 compares the effect of different window sizes on the performance of the proposed system. It shows that decreasing or increasing the window size reduces performance by up to about 10 percent at a 1% false alarm rate (i.e. it increases the miss probability by about 10 percent for each landmark at that false alarm rate). Fig. 5 and Fig. 6 show the effect of changing the bin size. According to Figs. 5 and 6, increasing the number of orientation bins improves performance significantly up to 9 bins, but makes little difference beyond this. This is for bins spaced over 0 to 180 degrees, i.e.
the sign of the gradient is ignored (Fig. 5). Including signed gradients (orientation range 0 to 360 degrees, as in the original SIFT descriptor) decreases performance, even when the number of bins is doubled to preserve the original orientation resolution (Fig. 6).

Table 3 reports results for the different landmarks with the default landmark-specific window sizes. The results indicate that, at the tested image scale (1/8 of the original resolution), the proposed method detects the landmark window within 4 pixels of its actual position in 85 percent of evaluations. This could be a reliable estimate of the landmark location for further post-processing.

Table 3. Landmark window detection rate (%) within different distances of the true position (scale 1/8).
Columns: Landmark; detection rate within 1, 2, 3, 4, and 5 pixels; average distance (pixels).
Rows: Sella; Gnathion; Incisal Upper Incisor Tip.
[numeric entries not available]

6. CONCLUSIONS AND FUTURE WORK

We have shown that locally normalized histograms of gradient orientations, similar to SIFT descriptors [36], computed in a dense overlapping grid give very good results for landmark window estimation. We studied the influence of several descriptor parameters and concluded that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good performance. The results show that the system estimates the landmark location well. In the cephalometry literature, however, a detected landmark is accepted only if it lies within 2 mm of the true location. We are working on extending the method with a post-processing step around the landmark position detected here, to pinpoint the exact position of the landmark at the original image resolution.
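The gap between the 4-pixel criterion used in the experiments and the 2 mm clinical criterion can be made concrete with a short calculation. The sketch below assumes that the 100 dpi scan resolution maps directly to physical distance (25.4 mm per inch) and ignores any radiographic magnification; the function names are hypothetical.

```python
import math

MM_PER_INCH = 25.4


def mm_per_pixel(dpi=100, scale=1.0):
    """Physical size of one pixel, given the scan dpi and the image scale factor."""
    return MM_PER_INCH / (dpi * scale)


def detection_rate(predicted, actual, radius):
    """Fraction of predictions within `radius` pixels of the true position."""
    hits = sum(1 for p, a in zip(predicted, actual)
               if math.hypot(p[0] - a[0], p[1] - a[1]) <= radius)
    return hits / len(predicted)
```

Under these assumptions one pixel is about 0.254 mm at the original resolution and about 2.03 mm at 1/8 scale, so a 4-pixel radius at 1/8 scale corresponds to roughly 8.1 mm, while the 2 mm criterion corresponds to slightly under 8 pixels at the original resolution. This is why a refinement step at full resolution is needed on top of the coarse window estimate.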
That post-processing step is also based on HOG features, and its results will be published elsewhere. Another possible improvement is a better strategy for selecting among SVM outputs in the classification stage. In this paper the highest output was chosen as the best answer, but the highest output sometimes leads to misclassification, because the similarity between the input window and the learned samples is very small. In these situations the highest-output strategy does not work well and sometimes results in wrong estimates of the landmark location. We are also working on other configurations of HOG and SVM to make the method more effective, reducing the misclassification rate while increasing the detection rate.

Figure 4. Effects of window size variation on performance of the system on 3 landmarks: a) Sella, b) Gnathion, c) Upper Incisor.

Figure 5. Effects of bin size variation on performance of the system on 3 landmarks (unsigned bins, 0-180): a) Sella, b) Gnathion, c) Upper Incisor.

Figure 6. Effects of bin size variation on performance of the system on 3 landmarks (signed bins, 0-360): a) Sella, b) Gnathion, c) Upper Incisor.

REFERENCES

[1] H. Mohseni and S. Kasaei, "Automatic Localization of Cephalometric Landmarks," 2007 IEEE International Symposium on Signal Processing and Information Technology, 2007.
[2] W. Yue, D. Yin, C. Li, G. Wang, and T. Xu, "Automated 2-D Cephalometric Analysis on X-ray Images by a Model-Based Approach," IEEE Transactions on Biomedical Engineering, vol. 53, no. 8, 2006.
[3] L.M. Finlay, "Craniometry and Cephalometry: A History Prior to the Advent of Radiography," Journal of Angle Orthodontics, vol. 50, no. 4, 1980.
[4] T. Rakosi, An Atlas and Manual of Cephalometric Radiology, London: Wolfe Medical, 1982.
[5] I. El-feghi, M.A. Sid-ahmed, and M. Ahmadi, "Automatic localization of craniofacial landmarks for assisted cephalometry," Pattern Recognition, vol. 37, no. 3, 2004.
[6] Y.T. Chen, K.S. Cheng, and J.K. Liu, "Improving cephalogram analysis through feature subimage extraction," IEEE Engineering in Medicine and Biology Magazine, vol. 18, no. 1, 1999.
[7] T.J. Hutton, S. Cunningham, and P. Hammond, "An evaluation of active shape models for the automatic identification of cephalometric landmarks," European Journal of Orthodontics, vol. 22, no. 5, 2000.
[8] R. Kafieh, A. Mehri, and S. Sadri, "Automatic landmark detection in cephalometry using a modified Active Shape Model with sub image matching," International Conference on Machine Vision, ICMV 2007, 2007.
[9] A.D. Levy-Mandel, A.N. Venetsanopoulos, and J.K. Tsotsos, "Knowledge-Based Landmarking of Cephalograms," Computers and Biomedical Research, vol. 19, no. 3, 1986.
[10] W. Tong, S.T. Nugent, G.M. Jensen, and D.F. Fay, "An algorithm for locating landmarks on dental X-rays," Engineering in Medicine and Biology Society, IEEE, 1989.
[11] S. Parthasaraty, S.T. Nugent, P.G. Gregson, and D.F. Fay, "Automatic landmarking of cephalograms," Computers and Biomedical Research, vol. 22, no. 3, 1989.
[12] W. Tong, S.T. Nugent, P.H. Gregson, G.M. Jensen, and D.F. Fay, "Landmarking of cephalograms using a microcomputer system," Computers and Biomedical Research, vol. 23, no. 4, 1990.
[13] D.N. Davis and C.J. Taylor, "An intelligent segmentation system for lateral skull x-ray images," Proceedings of the Fifth Alvey Vision Conference, 1989.
[14] D.B. Forsyth and D.N. Davis, "Assessment of an automated cephalometric analysis system," The European Journal of Orthodontics, vol. 18, no. 5, 1996.
[15] D.N. Davis and C.J. Taylor, "An Intelligent Visual Task System For Lateral Skull X-ray Images," Proceedings of the British Machine Vision Conference, 1990.
[16] J. Ren, D. Liu, D. Feng, J. Shao, R. Zhao, Y. Liao, and Z. Lin, "A Knowledge-based Automatic Cephalometric Analysis Method," Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society.
[17] D.N. Davis and C.J. Taylor, "Blackboard architecture for automating cephalometric analysis," Informatics for Health and Social Care, vol. 16, no. 2, 1991.
[18] B. Romaniuk, M. Desvignes, M. Revenu, and M.J. Deshayes, "Contour tracking by minimal cost path approach. Application to cephalometry," 2004 International Conference on Image Processing, 2004.
[19] A. Innes, V. Ciesielski, J. Mamutil, and S. John, "Landmark detection for cephalometric radiology images using pulse coupled neural networks," Int. Conf. on Artificial Intelligence, 2002.
[20] J. Cardillo and M.A. Sid-ahmed, "An image processing system for the automatic extraction of craniofacial landmarks," Nuclear Science Symposium and Medical Imaging Conference, IEEE, vol. 3, 1991.
[21] D.J. Rudolph, P.M. Sinclair, and J.M. Coggins, "Automatic computerized radiographic identification of cephalometric landmarks," American Journal of Orthodontics and Dentofacial Orthopedics, vol. 113, no. 2, 1998.
[22] V. Grau, M.C. Juan, C. Monserrat, and C. Knoll, "Automatic localization of cephalometric landmarks," Journal of Biomedical Informatics, vol. 34, no. 3, 2001.
[23] Y. Chen, K. Cheng, and J. Liu, "Feature subimage extraction for cephalogram landmarking," Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 20, 1998.
[24] S. Chakrabartty, M. Yagi, T. Shibata, and G. Cauwenberghs, "Robust cephalometric landmark identification using support vector machines," 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003, pp. II
[25] S. Sanei, P. Sanei, and M. Zahabsaniei, "Fuzzy detection of craniofacial landmarks," 6th International Conference on Image Processing and its Applications, 1997.
[26] W. Yue, D. Yin, C.H. Li, G. Wang, and T. Xu, "Locating Large-Scale Craniofacial Feature Points on X-ray Images for Automated Cephalometric Analysis," IEEE, Los Alamitos.
[27] I. El-Fegh, Y. Alginahi, M.A. Sid-Ahmed, and M. Ahmadi, "Craniofacial Landmarks Extraction by Partial Least Squares Regression," Proceedings of the 2004 International Symposium on Circuits and Systems, ISCAS'04.
[28] D. Giordano, R. Leonardi, F. Maiorana, G. Cristaldi, and M.L. Distefano, "Automatic Landmarking of Cephalograms by Cellular Neural Networks," Artificial Intelligence in Medicine, 2005.
[29] R. Leonardi, D. Giordano, and F. Maiorana, "An evaluation of cellular neural networks for the automatic identification of cephalometric landmarks on digital images," Journal of Biomedicine & Biotechnology, vol. 2009.
[30] R. Kafieh, S. Sadri, A. Mehri, and H. Raji, "Using a Combination of Model Based and Intelligent methods in Automatic Landmark Detection in Cephalometry," Innovations in Information Technologies (IIT), 2007.
[31] I. El-Fegh, M. Galhood, M. Sid-Ahmed, and M. Ahmadi, "Automated 2-D cephalometric analysis of X-ray by image registration approach based on least square approximation," Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2008, 2008.
[32] R. Kafieh, S. Sadri, A. Mehri, and H. Raji, "Discrimination of Bony Structures in Cephalograms for Automatic Landmark Detection," CSICC, 2008.
[33] T. Joachims, "A Support Vector Method for Multivariate Performance Measures," Proceedings of the 22nd International Conference on Machine Learning, 2005.
[34] T. Joachims, "Training Linear SVMs in Linear Time," Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[35] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, 2005.
[36] D.G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, 2004.
