Rendering-Based Video-CT Registration with Physical Constraints for Image-Guided Endoscopic Sinus Surgery


Y. Otake a,d, S. Leonard a, A. Reiter a, P. Rajan a, J. H. Siewerdsen b, M. Ishii c, R. H. Taylor a, G. D. Hager a

Departments of a Computer Science, b Biomedical Engineering, c Otolaryngology - Head and Neck Surgery, Johns Hopkins University, Baltimore, MD, USA
d Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan

ABSTRACT

We present a system for registering the coordinate frame of an endoscope to pre- or intraoperatively acquired CT data by optimizing a similarity metric between an endoscopic image and an image predicted via rendering of the CT. Our method is robust and semi-automatic because it takes account of physical constraints, specifically collisions between the endoscope and the anatomy, to initialize and constrain the search. The proposed optimization method is based on a stochastic algorithm that evaluates a large number of similarity-metric values in parallel on a graphics processing unit (GPU). Images from a cadaver and a patient were used for evaluation. The registration error was 0.83 mm for the cadaver images and 1.97 mm for the patient images, and the average registration time over 60 trials was 4.4 seconds. The patient study demonstrated the robustness of the proposed algorithm against moderate anatomical deformation.

Keywords: image-guided endoscopic surgery; rendering-based video-CT registration; endoscopic sinus surgery

1. INTRODUCTION

Functional endoscopic sinus surgery (FESS) has become the standard surgical treatment for sinonasal pathology [1, 2]. Navigation systems are used to maintain orientation during surgery because the anatomy is complex and highly variable due to stochastic embryologic development, and because the sinuses are adjacent to vital structures whose injury carries significant morbidity and mortality.
Commercial navigation systems, however, provide only a qualitative sense of location: their roughly 2 mm positioning error under near-ideal conditions [3] is insufficient for an environment featuring anatomical structures smaller than a millimeter, for example when removing structures separated by only 1 mm of bone from the brain, eye, major nerves, or blood vessels. A recently proposed system that tracks the position and orientation of the endoscope with a passive arm [4] demonstrated promising results (~1.0 mm error) in non-clinical experiments; the physical presence of the arm and its rigid connection to the endoscope, however, are cumbersome and adversely affect ergonomics and workflow. Video-CT registration has been explored as a means of transcending the 2 mm accuracy limit by exploiting the information contained in endoscope images [5]. These registration approaches are classified as reconstruction-based or rendering-based. The former [6] reconstructs a 3D surface from a series of images and then performs 3D-3D registration to a CT-derived surface. The latter [7] generates simulated views by rendering surface models derived from the patient's CT and searches for the camera pose whose simulated view best matches the real video image. Although both approaches can significantly improve the accuracy of navigation systems, several challenges must be addressed before they reach clinical acceptance. First, the methods must be robust to variations in the appearance of the tissue surface and to distortion of the anatomy. Second, the initial registration must be automated and fast enough that it does not significantly impair clinical workflow.
This paper introduces a novel rendering-based video-CT registration system that combines physical constraints, image-gradient correlation, and a stochastic optimization strategy to improve robustness and to provide an accurate, computationally efficient initial localization of the endoscope.

Fig. 1. Flowchart of the proposed registration framework.

2. METHODS

2.1. Overview of the proposed method

The proposed registration method is summarized in Fig. 1. The framework follows image-based 2D-3D registration [8], in which an optimization algorithm searches for the camera pose whose rendered image (virtual endoscope) has maximum similarity with the real endoscope image. As described below, the photometric rendering and the similarity-metric computation were implemented on the GPU to accelerate the system.

Figure 2. Iso-surface detection method: (a) concept of the level-of-detail (LOD) detection; renderings (b) without and (c) with the LOD detection.

2.2. Rendering of the virtual endoscope

A traditional rendering algorithm [9] with a Lambertian bidirectional reflectance distribution function (BRDF), shown in Eq. (1), was implemented on the GPU using CUDA (NVIDIA, Santa Clara, CA):

BRDF_Lamb(p) = σ L cos θ_s(p) / (π R(p)²)    (1)

where L is the intensity of the light source, θ_s is the angle between the ray and the surface normal, R is the distance from the light source to the surface, and σ is a proportionality factor (albedo, camera characteristics, etc.). The surface normal was computed at the point where the ray intersects an iso-surface of the CT image defined by a specified threshold; a threshold of -300 HU was used in our experiments. Since the pixel size of the endoscope image is small compared to the CT voxel, a recursive search for the intersection point was used to avoid aliasing effects (Fig. 2).

2.3. Initialization using physical constraints

One unique characteristic of FESS is that the workspace is long, narrow, and surrounded by the complex structures of the nasal cavity. Thus the set of endoscope poses that are feasible during surgery can be estimated a priori from the patient-specific anatomical structure in the CT data.
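The recursive intersection search and the shading of Eq. (1) can be sketched as follows. This is a minimal illustration, not the paper's CUDA implementation: `sample`, the bisection depth, and all names are our own illustrative choices, and the recursion is expressed as a loop.

```python
import numpy as np

def refine_crossing(sample, origin, direction, t_lo, t_hi, threshold, depth=8):
    """Bisection refinement of a ray / iso-surface crossing bracketed in
    [t_lo, t_hi]; sample(p) returns the interpolated CT intensity at point p."""
    for _ in range(depth):
        t_mid = 0.5 * (t_lo + t_hi)
        if sample(origin + t_mid * direction) < threshold:
            t_lo = t_mid   # midpoint still outside the iso-surface
        else:
            t_hi = t_mid   # midpoint inside: crossing lies in the lower half
    return origin + 0.5 * (t_lo + t_hi) * direction

def lambertian_brdf(L, sigma, cos_theta, R):
    """Eq. (1): sigma * L * cos(theta_s) / (pi * R^2)."""
    return sigma * L * cos_theta / (np.pi * R ** 2)
```

With the endoscope's light source essentially co-located with the camera, R is simply the distance along the ray from the camera center to the refined crossing point.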
This is accomplished by sampling possible poses and computing collisions between a 3D model of the anatomy and a 3D model of the endoscope (a cylinder of 4 mm diameter in this study). More specifically, an iso-surface of the sinus structure was extracted by thresholding the CT data, and the workspace of the endoscope tip was roughly outlined. Then n endoscope poses were randomly generated within the workspace (n = 50,000 in this study), with the orientation of each pose chosen randomly around the insertion direction. Collisions between the sinus surface and the endoscope model were detected with the RAPID package [10]. Due to the complex anatomical structure of the sinuses, the computed collision-free poses form multiple islands in the 6-dimensional pose space (3-DOF translation and 3-DOF rotation). These were grouped into m clusters using K-means clustering (m = 20 in this study), and the barycenter of the poses in each cluster was used as an initial seed point for the optimization. Note that this initialization is performed preoperatively, based solely on the CT data and a 3D model of the endoscope.

2.4. Similarity metric

We used gradient correlation [11] as the similarity metric in this study.
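The preoperative seeding step above can be sketched as follows. This is a toy stand-in under stated assumptions: `sample_pose` and `collides` abstract the workspace sampling and the RAPID collision test, and a plain Lloyd's K-means replaces whichever clustering implementation the authors used.

```python
import numpy as np

def initial_seeds(sample_pose, collides, n=50000, m=20, seed=0):
    """Sample n candidate endoscope poses, reject colliding ones, cluster the
    survivors into m groups, and return the cluster barycenters as seeds.
    sample_pose(rng) returns a 6-vector (tx, ty, tz, rx, ry, rz);
    collides(pose) wraps the endoscope-vs-anatomy collision test."""
    rng = np.random.default_rng(seed)
    free = np.array([p for p in (sample_pose(rng) for _ in range(n))
                     if not collides(p)])
    # Plain Lloyd's K-means over the 6-D pose vectors.
    centers = free[rng.choice(len(free), m, replace=False)]
    for _ in range(20):
        labels = np.argmin(((free[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([free[labels == k].mean(0) if (labels == k).any()
                            else centers[k] for k in range(m)])
    return centers
```

Treating rotations as plain Euclidean coordinates in K-means is a simplification; it is adequate here only because each island of collision-free poses spans a narrow range of orientations.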

GC(I_0, I_1) = (1/2) { NCC(dI_0/dx, dI_1/dx) + NCC(dI_0/dy, dI_1/dy) }    (2)

NCC(I_0, I_1) = Σ_(i,j) (I_0 − Ī_0)(I_1 − Ī_1) / √( Σ_(i,j) (I_0 − Ī_0)² · Σ_(i,j) (I_1 − Ī_1)² )    (3)

The metric averages the normalized cross correlation (NCC) between the X-gradients and between the Y-gradients of the two images. In the endoscope image, the metric captures the edges of occluding contours, which are the main features in the virtual endoscope image derived from the CT volume. The gradient computation was implemented as a highly parallelizable kernel convolution on the GPU.

2.5. Optimization

The optimization problem was formulated as follows:

T* = argmax_{T ∈ S_T} GC(I_1, I_0(T(t_x, t_y, t_z, θ_x, θ_y, θ_z)))    (4)

where S_T represents the space of feasible (i.e., collision-free) camera poses. The objective function has multiple local optima due to structures that create similar contours in the image. To improve robustness against these local optima, a derivative-free stochastic optimization strategy, the covariance matrix adaptation evolution strategy (CMA-ES) [12], was employed. After sample solutions are generated in each generation, any samples that cause collisions are rejected and replaced with new samples. Table 1 summarizes the parameters used in the optimization. CMA-ES is especially well suited to parallel hardware such as a GPU because it allows multiple samples to be evaluated in parallel, in contrast to intrinsically sequential algorithms such as simulated annealing or Powell's method. Computational efficiency is further improved by simultaneously executing multiple optimization trials from the m initial poses computed in Section 2.3. Computation time (which is proportional to the number of function evaluations) increases proportionally with m; robustness against local optima also improves with increasing m, since the density of the searched sample solutions increases.
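Eqs. (2)-(3) translate directly into a few lines of NumPy; here central differences stand in for whatever gradient kernel the GPU implementation convolves with:

```python
import numpy as np

def ncc(a, b):
    """Eq. (3): normalized cross correlation of two images."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

def gradient_correlation(i0, i1):
    """Eq. (2): average NCC of the x- and y-gradients of the two images."""
    gy0, gx0 = np.gradient(i0)   # central differences along rows, columns
    gy1, gx1 = np.gradient(i1)
    return 0.5 * (ncc(gx0, gx1) + ncc(gy0, gy1))
```

An image compared with itself gives GC = 1. Note that a constant image makes the denominator of Eq. (3) zero, so a practical implementation must guard against flat patches.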
This trade-off between the number of function evaluations and robustness generally holds in derivative-free optimization algorithms, so we paid careful attention to computational efficiency in order to improve robustness.

Table 1. Summary of optimization parameters
  Downsampled image size:                  160 × 128
  Number of multi-starts:                  20
  Population size:                         100
  Search range (tx, ty, tz, θx, θy, θz):   (5 mm, 5 mm, 5 mm, 15°, 15°, 15°)

2.6. Experiments

Cadaver study. A cadaver specimen was used for quantitative evaluation of the proposed algorithm (Fig. 3) [13]. CT data (0.46 × 0.46 × 0.5 mm/voxel) and endoscope video images (1024 × 768, 30 Hz) were acquired. 21-gauge syringe needles were placed in the sphenoid sinus wall prior to the CT scan to create landmarks for quantitative error analysis. Offline camera calibration was performed using the German Aerospace Center (DLR) CalLab [14].

Fig. 3. (a) Experimental setup and rendering of bone and the needles inserted in the sinus wall as landmark points for evaluation; (b) endoscope image with landmarks (needle tips).

Patient study. A retrospective study using CT data and endoscopic images recorded during surgery was performed to further investigate the robustness and accuracy of the proposed method in the presence of the texture of the mucosal layer and

soft tissue deformation. The study protocol was approved by our institutional review board. Quantitative evaluation of accuracy was challenging due to the absence of landmarks identifiable in both the CT and the endoscope image (such as the needle tips in the cadaver study). In this study we used the uncinate process as a landmark because, although exact point correspondences could not be identified, the edge of this sheet-like structure always appears as a line feature in the endoscope image. We manually identified 5 points along the line of the uncinate process in the 2D image and established a correspondence between each point and the closest point along the edge segmented in 3D to compute the error.

Error metric. The target registration error (TRE) was defined as the distance between a landmark point in 3D and a ray emanating from the camera center in the direction of the target in the image [13].

3. RESULTS

3.1. Cadaver study

The result of the initialization process is summarized in Fig. 4. The collision-free poses were grouped into 20 clusters, and the barycenter of each cluster was used as an initial seed point in the optimization. The physical constraints based on collision detection restrict the search space, which improved the robustness of the optimization. The registration result is shown in Fig. 5; the average TRE over 7 landmark points was 0.83 mm.

Figure 4. Initialization using collision detection. (a) Point-cloud model of the sinus (cyan) and the computed collision-free poses clustered with K-means; each line indicates the center of one endoscope pose, colored by cluster. (b) Virtual endoscope images at the barycenter pose of each cluster. Note that poses 8, 12, and 14 show nothing because the camera is very close to the wall without colliding with it.
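The point-to-ray TRE defined under "Error metric" above reduces to projecting the landmark offset onto the ray direction. A minimal sketch (function and argument names are ours):

```python
import numpy as np

def tre(landmark, camera_center, ray_dir):
    """Distance from a 3D landmark to the ray cast from the camera center
    toward the target's location in the image."""
    d = np.asarray(ray_dir, float)
    d = d / np.linalg.norm(d)
    v = np.asarray(landmark, float) - np.asarray(camera_center, float)
    # Remove the component of v along the ray; what remains is the
    # perpendicular offset, whose norm is the point-to-line distance.
    return float(np.linalg.norm(v - (v @ d) * d))
```

This treats the ray as an infinite line; an implementation could additionally clamp to the forward half-line if landmarks behind the camera must be penalized.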

Fig. 5. Result of the cadaver study: (a) real endoscope image, (b) virtual endoscope image at the predicted pose, (c) gradient image computed in the optimization.

3.2. Patient study

The proposed registration algorithm was tested on 60 consecutive endoscopic image frames. Fig. 6 shows the real endoscope images, the predicted virtual endoscope images, and the gradient images computed at the solution for three frames from the sequence. The TRE over all frames was 1.97 ± 1.05 mm (mean ± std). The gradient-based similarity metric successfully captured the occluding contours in both images, which led the registration to converge to a pose that provides approximately the same view as the real endoscope, even in the presence of clear deformation of the structure in the middle of the view (the middle turbinate) between the preoperative CT and the intraoperative endoscopy. Our stochastic optimization strategy was robust against the resulting local optima in this case; however, the objective-function landscape depends on various factors including lighting, degree of deformation, and the quality of the CT data. Our preliminary investigation of the sensitivity of registration accuracy to computation time (Table 2) demonstrated the trade-off between them. As the number of multi-starts increases, the search takes longer because more sample solutions are evaluated, but it converges to a better solution because the denser search finds a better local optimum in the multi-modal objective-function space. Since the algorithm is highly parallelizable, runtime is expected to decrease proportionally to the number of parallel processors. Further investigation with different patients and different anatomy is underway.

Fig. 6. Result of the proposed registration algorithm on an image sequence from the patient study.

4. CONCLUSIONS

We have presented a method that registers endoscopic video images to CT by maximizing a gradient-based similarity metric between an observed image and a predicted image. Our primary contribution is twofold: 1) incorporation of physical constraints into the image-based registration algorithm to initialize and constrain the search space, and 2) use of a robust stochastic optimization strategy that leverages a large number of GPU-accelerated function evaluations. We have shown that the proposed algorithm can produce a registration in 4.4 seconds. Although this is too slow for real-time tracking, it is fast enough to produce an initial registration from which real-time tracking can begin using methods that may be less robust for initialization. Since the system relies on a rigid prior model, anatomical deformations, such as those produced by decongestants, create mismatches between predicted and observed images and generate local optima in the objective function that can cause failure. Possible solutions include intraoperative CT scanning [15] and/or simultaneous estimation of deformation and camera pose.

Table 2. Comparison of runtime and accuracy (50 registration trials were performed on one image from the patient study)
  # of multi-starts (m):        1       10      20      40
  # of function evaluations:    1938    20303   36132   71155
  Runtime (sec):                0.88    2.78    4.60    8.37
  Median TRE (mm):              2.37    1.16    1.14    1.13

ACKNOWLEDGMENT

This work is partially supported by NIH grant 1R01EB015530-01.

REFERENCES

1. Prulière-Escabasse, V. and Coste, A.: Image-guided sinus surgery. European Annals of Otorhinolaryngology, Head and Neck Diseases 127(1), 33-39 (2010).
2. Kassam, A.B., et al.: Endoscopic endonasal skull base surgery: analysis of complications in the authors' initial 800 patients. J Neurosurg 114(6), 1544-1568 (2011).
3. Thoranaghatte, R., et al.: Landmark-based augmented reality system for paranasal and transnasal endoscopic surgeries. Int J Med Robot 5(4), 415-422 (2009).
4. Lapeer, R.J., et al.: Using a passive coordinate measurement arm for motion tracking of a rigid endoscope for augmented-reality image-guided surgery. Int J Med Robot (2013).
5. Burschka, D., et al.: Scale-invariant registration of monocular endoscopic images to CT-scans for sinus surgery. Medical Image Analysis 9(5), 413-426 (2005).
6. Mirota, D.J., et al.: Evaluation of a system for high-accuracy 3D image-based registration of endoscopic video to C-arm cone-beam CT for image-guided skull base surgery. IEEE Trans Med Imaging 32(7), 1215-1226 (2013).
7. Luo, X., et al.: Robust real-time image-guided endoscopy: a new discriminative structural similarity measure for video-to-volume registration. In: Information Processing in Computer-Assisted Interventions, pp. 91-100 (2013).
8. Otake, Y., et al.: Intraoperative image-based multiview 2D/3D registration for image-guided orthopaedic surgery: incorporation of fiducial-based C-arm tracking and GPU-acceleration. IEEE Trans Med Imaging 31(4), 948-962 (2012).
9. Bosma, M.K., Smit, J., and Lobregt, S.: Iso-surface volume rendering (1998).
10. Gottschalk, S., Lin, M.C., and Manocha, D.: OBBTree: a hierarchical structure for rapid interference detection. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 171-180, ACM (1996).
11. Penney, G.P., et al.: A comparison of similarity measures for use in 2-D-3-D medical image registration. IEEE Trans Med Imaging 17(4), 586-595 (1998).
12. Hansen, N.: The CMA evolution strategy: a comparing review. In: Towards a New Evolutionary Computation, pp. 75-102 (2006).
13. Mirota, D.J., et al.: A system for video-based navigation for endoscopic endonasal skull base surgery. IEEE Trans Med Imaging 31(4), 963-976 (2012).
14. Strobl, K.H., et al.: DLR CalDE and CalLab. Institute of Robotics and Mechatronics, German Aerospace Center (DLR). Available from: http://www.robotic.dlr.de/callab.
15. Chennupati, S.K., et al.: Intraoperative IGS/CT updates for complex endoscopic frontal sinus surgery. ORL J Otorhinolaryngol Relat Spec 70(4), 268-270 (2008).