OPTIMIZATION METHODS IN INTENSITY MODULATED RADIATION THERAPY TREATMENT PLANNING


OPTIMIZATION METHODS IN INTENSITY MODULATED RADIATION THERAPY TREATMENT PLANNING

By

DIONNE M. ALEMAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007

© 2007 Dionne M. Aleman

To my ever-patient wife Nancy, and to my father Roberto, who, if not for the shortcomings of current cancer treatments, might still be with us today

ACKNOWLEDGMENTS

Many thanks to Nancy Huang, Christopher Fox and Bart Lynch for so helpfully and happily explaining the physics of medical physics to me on a wide range of topics, even when those topics are not relevant to my own research. This work was supported in part by the NSF Alliances for Graduate Education and the Professoriate, the NSF Graduate Research Fellowship and NSF grant DMI

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    Intensity Modulated Radiation Therapy Treatment Planning
    Dissertation Summary
        Fluence Map Optimization
        Beam Orientation Optimization
        Fractionation
        Modeling the Dose Deposition of a Beam
    Contribution Summary
        Fluence Map Optimization
        Beam Orientation Optimization
        Fractionation
        Modeling the Dose Deposition of a Beam

2 FLUENCE MAP OPTIMIZATION
    Introduction
    Literature Review
    Model Formulation
    Spatial Considerations
    A Primal-Dual Interior Point Algorithm for FMO
        Primal-Dual Interior Point Algorithm
        Hessian Approximations
            Single Hessian Approximation
            BFGS Hessian Update
        Insignificant Beamlets
        Warm Start
    Results
        How Small of a Duality Gap is Necessary?
        Computational Results
        Clinical Results
        Spatial Coefficient Results
        Warm Start Results
    Conclusions

3 BEAM ORIENTATION OPTIMIZATION
    Introduction
    Literature Review
    Model Formulation
        Mixed-Integer Model Formulation
        Beam Data Generation
    A Response Surface Approach to BOO
        Overview of Response Surfaces
        Determining the Next Observation
            Maximizing the expected improvement
            Obtaining an upper bound on the uncertainty
            Branch-and-Bound Method of Obtaining the Next Observation
    Neighborhood Search
        Introduction
        Neighborhood Search Approaches
        A Deterministic Neighborhood Search Method for BOO
            Neighborhood Definition
            Neighbor Selection
            Implementation
        Simulated Annealing
            Neighborhood Definition
            Neighbor Selection
            Implementation
            Convergence
        A New Neighborhood Structure
    Results
        Evaluating Plan Quality
            Target coverage
            Critical structure sparing
        Response Surface Method Results
            Proof of concept
            Adding a non-coplanar beam to a coplanar solution
            Clinical results
        Neighborhood Search Method Results
            Add/Drop algorithm results
            Simulated Annealing results
            Clinical results
    Conclusions and Future Directions
        Response Surface Conclusions
        Neighborhood Search Conclusions

4 FRACTIONATION
    Introduction
    Model Formulation
    Results
        Computational Results
        Clinical Results
        Spatial Coefficient Results
    Conclusions and Future Directions

5 A MONTE CARLO METHOD FOR MODELING DOSE DEPOSITION
    Introduction
    Monte Carlo Engine
    Dose Distribution of a Beamlet
        Depth-Dose Curve
        Lateral Penumbra
    Methodology to Model a Beamlet
        Modeling the Depth-Dose Curve
        Modeling the Lateral Penumbra
    Results
    Conclusions and Future Directions

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Average run times for 5-beam treatment plans
FMO value obtained using ɛ =
Comparison of duality gaps
Performance measures of interior point method warm starts
Performance measures of projected gradient method warm starts
Sparing criteria varies for each critical structure
Sizes of test cases
Minimum FMO value obtained and time required to obtain it
Target coverage achieved by the treatment plans
Percentage of plans in which an organ is spared
Definitions of implementations
Case sizes and run times using identical algorithm and weighting parameters
Sparing criteria varies for each critical structure
Computation times in minutes of Monte Carlo simulations
Computation times for dose distribution fits
Variation of fits

LIST OF FIGURES

2-1 Progression of duality gap
Dose received by targets as a function of the duality gap
Dose received by saliva glands as a function of the duality gap
Quality of DVHs for various duality gaps
The spatial coefficients used for two cases
Comparison of spatial and non-spatial treatment plans
Comparison of spatial and non-spatial treatment plans
A linear accelerator and the available movements
FMO value as a function of two angles
Initial regions
Partitioning a region into subregions
Accounting for symmetry
The flip neighborhood
Selection probabilities in N_h(θ) and N_h^F(θ)
Proof of concept results
Comparison of response surface, Add/Drop and equi-spaced targets
Comparison of response surface, Add/Drop and equi-spaced targets
Add/Drop and simulated annealing comparison of FMO convergence
Comparison of Add/Drop and 7-beam equi-spaced plans
Comparison of simulated annealing and 7-beam equi-spaced plans
Target DVHs, saliva DVHs and axial slices in Fractions 1 and 2
Target DVHs, saliva DVHs and axial slices in Fractions 1 and 2
Target DVHs, saliva DVHs and axial slices in Fractions 1 and 2
Target DVHs, saliva DVHs and axial slices in Fractions 1 and 2
Target DVHs, saliva DVHs and axial slices in Fractions 1 and 2

4-6 Target DVHs, saliva DVHs and axial slices in Fractions 1 and 2
Target DVHs, saliva DVHs and axial slices in Fractions 1 and 2
DVHs and axial slices in Fractions 1 and 2 using spatial coefficients
DVHs and axial slices in Fractions 1 and 2 using spatial coefficients
DVHs and axial slices in Fractions 1 and 2 using spatial coefficients
DVHs and axial slices in Fractions 1 and 2 using spatial coefficients
DVHs and axial slices in Fractions 1 and 2 using spatial coefficients
DVHs and axial slices in Fractions 1 and 2 using spatial coefficients
DVHs and axial slices in Fractions 1 and 2 using spatial coefficients
Dose distribution of a single beamlet in various tissues
Colorwash of the lateral penumbra of a finite sized pencil beam
Plot of the lateral penumbra of a finite sized pencil beam
Observed depth-dose curve in water for several histories
Polynomial fits of several histories
Variation of polynomial fit as function of degree
An error function and an error function pair
Lateral penumbra for several numbers of Monte Carlo histories
Error function fits of several histories
Error function pairs summed to approximate a beamlet in water
Depth-dose curves in muscle tissue
Lateral penumbra curves in muscle tissue
Depth-dose curves in lung tissue
Lateral penumbra curves in lung tissue
Depth-dose curves in heterogeneous muscle and lung tissue
Variation of fits as a function of number of histories

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

OPTIMIZATION METHODS IN INTENSITY MODULATED RADIATION THERAPY TREATMENT PLANNING

By Dionne M. Aleman

December 2007

Chair: H. Edwin Romeijn
Major: Industrial and Systems Engineering

The design of a treatment plan for intensity modulated radiation therapy is a mathematical programming problem which is not yet satisfactorily solved. Current techniques divide the problem into several subproblems, which are then solved sequentially. My research addresses several of these subproblems, particularly beam orientation optimization (BOO), fluence map optimization (FMO) and fractionation. The integration of the BOO and FMO subproblems is considered, as well as improved techniques to model the dose deposition of a beamlet.

CHAPTER 1
INTRODUCTION

1.1 Intensity Modulated Radiation Therapy Treatment Planning

Every year, approximately 1.4 million people in the United States alone are newly diagnosed with cancer (American Cancer Society [1]). More than half of these patients will receive some form of radiation therapy (Murphy et al. [2], Perez and Brady [3]), and approximately half of these patients may significantly benefit from conformal radiation therapy (Steel [4]). During this therapy, beams of radiation pass through a patient, killing both cancerous and normal cells. Some patients die of their disease despite sophisticated treatment methods, and many others suffer unpleasant side effects from the radiation therapy that can severely detract from their quality of life. Thus, the radiation treatment must be carefully planned so that a clinically prescribed dose is delivered to the targets containing cancerous cells, eradicating the cancer, while a small enough dose is delivered to the nearby organs and tissues (called critical structures) that they may survive the treatment. This is achieved by irradiating the patient with several beams sent from different orientations spaced around the patient so that the intersection of these beams includes the targets, which thus receive the highest radiation dose, whereas the critical structures receive radiation from some, but not all, beams and may thus be spared. Currently, a technique called intensity modulated radiation therapy (IMRT) is considered to be the most effective radiation therapy for many forms of cancer.

The problem of designing an IMRT treatment plan for an individual patient is a large-scale mathematical programming problem that is not yet solved satisfactorily. Current treatment planning systems decompose the planning problem into several stages, and the corresponding subproblems are solved sequentially. These subproblems include determining the number and orientation of the beams of radiation, the radiation dose

distribution of each beam and the decomposition of a single treatment plan into several smaller fractions. This work addresses the integration of the beam orientation optimization (BOO) and fluence map optimization (FMO) subproblems based on a convex formulation of the latter and associated efficient algorithms for solving it, an approach which has not received much attention in previous studies. The fractionation problem, the problem of dividing a single treatment plan into the approximately 35 treatments (fractions) the patient will actually receive, is also addressed, as is the problem of modeling the dose deposition of a beam.

1.2 Dissertation Summary

In IMRT, each beam is modeled as a collection of hundreds of small beamlets, the fluences of which can be controlled individually. These fluence values are known as a fluence map, and optimization of these fluences given a fixed set of beams is known as fluence map optimization. The optimal solution value of the FMO problem quantifies the quality of the treatment plan, where quality means the ability of the plan to deliver the prescribed radiation dose to the specified target structures while sparing critical structures by ensuring that they receive an acceptably low amount of radiation. Thus, the quality of a set of beams can be measured by the optimal solution of the FMO problem performed with those beams, and the problem of selecting the best directions from which to deliver radiation to the patient (the BOO problem) can be based on the treatment plan quality indicated by the optimal solution value of the corresponding FMO problem.

Fluence Map Optimization

One of the most popular subproblems of the intensity modulated radiation therapy (IMRT) treatment planning problem is the fluence map optimization (FMO) problem. In IMRT, each beam of radiation can be discretized into hundreds of smaller beamlets, the radiation intensities (fluences) of which can be modulated independently of the other beamlets.
For a given set of beams, the beamlet fluences can greatly influence the quality of the treatment plan, that is, the ability of the treatment to deposit the prescribed amount

of dose to cancerous target structures while simultaneously delivering a small enough dose to critical structures so that they may continue to function after the treatment. These fluence values are known as a fluence map, and optimization of these fluences given a fixed set of beams is known as fluence map optimization. Because the fluences of the beamlets can drastically affect the quality of the treatment plan, it is critical to obtain good fluence maps for radiation delivery. As the FMO problem is one of the most popular subproblems in IMRT optimization, it has been extensively studied in the literature, and several problem structures and algorithms to solve various models have been proposed.

Beam Orientation Optimization

In a typical head-and-neck treatment plan, radiation beams are delivered from 5-9 nominally spaced coplanar orientations around the patient. These coplanar orientations are obtained by rotating the gantry only. Several components of a linear accelerator can rotate and translate to achieve more orientations than those obtained from rotating the gantry: the available orientations consist of those obtained from rotation of the gantry, collimator and couch, as well as the three translation directions of the couch. Beam orientation optimization (BOO) is the problem of selecting, from the available beam orientations, the best set to use in delivering a treatment plan.

Given a fixed set of beams, different fluence maps (radiation intensities of beamlets) yield treatment plans of different quality. Therefore, the quality of an optimized fluence map should be considered when selecting a set of beam orientations to use in a treatment plan. Optimal fluence maps may be difficult to obtain depending on the FMO model, so it is common in the literature for scoring approximations and other heuristics to be used to estimate the quality of a beam solution. Regardless of the objective function used in the BOO problem, the problem is fundamentally nonlinear, as the physics of dose deposition change with direction. Because nonlinear programming problems are difficult to solve, most approaches to the BOO

problem rely on global search algorithms to obtain a solution, which may or may not be optimal.

Fractionation

An important subproblem related to the FMO problem which has not yet received much attention is the fractionation problem. Rather than delivering an entire treatment plan in one session, a treatment plan is divided into several sessions, called fractions. This is done to take advantage of the fact that normal, healthy cells recover from radiation faster than cancerous cells. To obtain the treatment plans for the fractions, in practice a single FMO treatment plan is developed and then divided into the desired number of fractions, usually around 35. This division of a treatment plan is a non-trivial task, as the target voxels (geometric cubes of tissue) must receive approximately 2 Gy of radiation in each fraction. With a single IMRT treatment plan, it is practically impossible to devise a constant dose-per-fraction delivery technique because only a single FMO problem is solved to obtain the treatment plan, which is then simply divided into a number of daily fractions. If a single plan is optimized to deliver doses to multiple target-dose levels, then the dose per fraction delivered to each target must change in the ratio of a given dose level to the maximum dose level. For example, say PTV1 has a prescription dose of 70 Gy, PTV2 has a prescription dose of 50 Gy, and the number of fractions is 35. If a single treatment plan is divided among the 35 fractions, then PTV1 will receive 70/35 = 2.0 Gy in each fraction, but PTV2 will only receive 50/35 ≈ 1.4 Gy, and thus any cancerous cells in PTV2 may not be eradicated by the treatment. Similarly, if only 25 fractions are used in order to ensure that PTV2 receives 2.0 Gy per fraction, then PTV1 receives 70/25 = 2.8 Gy per fraction, well above the desired dose.

Modeling the Dose Deposition of a Beam

The FMO problem is arguably the most significant in determining the quality of the treatment plan.
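The dose-per-fraction arithmetic in the fractionation example above can be reproduced with a short script. This is a toy calculation using only the example's numbers, not part of the fractionation model itself:

```python
def dose_per_fraction(prescription_gy, n_fractions):
    """Dose a target receives in each session when a single plan is
    divided evenly into n_fractions daily fractions."""
    return prescription_gy / n_fractions

# Dividing one plan into 35 fractions:
ptv1 = dose_per_fraction(70.0, 35)     # 2.0 Gy per fraction
ptv2 = dose_per_fraction(50.0, 35)     # about 1.43 Gy -- below the desired 2 Gy
# Using 25 fractions instead, so that PTV2 gets its full 2.0 Gy per fraction:
ptv2_25 = dose_per_fraction(50.0, 25)  # 2.0 Gy per fraction
ptv1_25 = dose_per_fraction(70.0, 25)  # 2.8 Gy -- above the desired dose
```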
The FMO problem depends heavily on the calculation of dose

received in each voxel of a patient. This dose is typically approximated by assuming a linear relationship with the radiation intensities of the beamlets delivering the radiation. Although this approximation is accepted as satisfactory, it is not truly accurate. The dose in a voxel is determined by the paths the photons in the radiation beams follow through the patient. Some photons may collide with particles inside the patient and scatter in any direction, while others may collide with particles and be absorbed. Still other photons may pass entirely through the patient with no collisions. Due to the unpredictable nature of the radiation beam inside the patient, the dose received in a voxel can only be accurately obtained through Monte Carlo simulations. Nevertheless, a simple linear relationship between total dose and beamlet fluences is commonly accepted as a satisfactory dose approximation in IMRT optimization, although errors of as much as 30% have been reported for photon beams near tissue inhomogeneities (Ma et al. [5]). For IMRT optimization, particularly with the advent of image-guided IMRT (IGIMRT), or 4D IMRT, the FMO problem must be solved extremely quickly to create real-time treatment plans. Thus, the speed of the FMO problem is paramount. Lengthy Monte Carlo simulation can provide an accurate measure of the dose deposited in a voxel, but it is time intensive and impractical for clinical use, particularly for treatment planning optimization.

1.3 Contribution Summary

Fluence Map Optimization

Nonlinear functions to approximate biological behavior and desired dose distributions are common in previously proposed FMO models in the literature, as are mixed-integer programming models. These models can be difficult and computationally expensive to solve. To make the FMO problem more tractable, we employ a model with a convex objective function and linear constraints. This desirable structure allows our model to be solved quickly and to optimality with the primal-dual interior point algorithm we have developed specifically for this problem.

One of the greatest benefits of an interior point algorithm is that a globally optimal solution can be found for many problem structures, in particular convex problem structures. As our FMO model is convex, the interior point algorithm can locate the globally optimal solution to within a specified duality gap. While there are other algorithms that can theoretically return a globally optimal solution to a convex problem (and many algorithms that cannot), interior point methods have the advantage of providing a known duality gap and generally fast computation times. Because the duality gap is known in each iteration, the user can make knowledgeable trade-offs between computation time and solution optimality without having to guess how far from the optimum the final solution may be. This allows for a scientific comparison of different IMRT delivery techniques, as we can solve the different problems to a specific duality gap.

Several alterations to the standard primal-dual interior point method were made to improve its performance. Beamlets that are likely to have little or no contribution to the treatment plan are removed a priori, and different approximations to the objective function Hessian are tested to save the time spent calculating the true Hessian in each iteration. The use of warm starts to initialize the interior point method is also examined. The solutions obtained provide quality treatment plans in a clinically feasible amount of time.

The incorporation of spatial information into the FMO model is also considered. The probability of tumor metastasis increases with proximity to the gross tumor mass. By using the distances of voxels from target structures, the voxels can be weighted according to their importance in the treatment plan. For example, it should be less important to spare saliva gland voxels near a target structure than to spare saliva gland voxels far from a target. The use of spatial coefficients will help the model identify quality treatment plans that will prevent future metastasis.

Beam Orientation Optimization

For head-and-neck cancers, typical IMRT treatment plans use 5-9 equi-spaced coplanar beams. Coplanar beams are those beams obtained from the rotation of only

the gantry of the linear accelerator, the machine which delivers radiation beams to the patient. If all other components of the linear accelerator are fixed, the rotation of the gantry sweeps out a set of coplanar beams. The couch can rotate and translate in three dimensions, and the head of the gantry can rotate independently, creating an even larger set of beams. Beams obtained from the movement of more than one component of the linear accelerator are known as non-coplanar beams. Intuitively, one may expect that the number of beams required for a high-quality treatment plan can be reduced, or the quality of the treatment plan for a given number of beams improved, if the beam orientations are chosen optimally and/or from a larger set. In particular, we investigate the effect of considering more coplanar or non-coplanar beams. A treatment plan consisting of fewer beams is preferable because the number of beams used in a plan directly affects the length of the actual treatment. If fewer beams are used to treat a patient, then each treatment takes less time and more patients can be treated in a day, which is beneficial from both a clinical and an economic perspective. Longer treatment times also allow for more errors due to possible patient motion.

We view the BOO problem in IMRT treatment planning as a global optimization problem with expensive objective function evaluations, each of which involves solving an FMO problem. We propose a response surface method that, unlike other approaches, allows for the generation of problem data only for promising beam orientations on-the-fly as the algorithm progresses, enabling the consideration of far more candidate orientations than is currently feasible. Our response surface approach to BOO allows us to develop high quality plans using just four beams for head-and-neck cases, in contrast to the current practice of using 5-9 beams. The response surface method also provides for convergence to the globally optimal solution.

We have developed neighborhood search methods to solve our BOO model. One method is simulated annealing, a proper global optimization algorithm, and the other

is a local search heuristic designed specifically for the BOO problem. The local search heuristic, which we call the Add/Drop method, returns a locally optimal solution in a small amount of time. The simulated annealing algorithm has the ability to escape local minima, and is theoretically able to return a globally optimal solution given enough time. For each of these algorithms, we have devised a new neighborhood structure based on comparing known optimal BOO solutions to the simulated annealing and Add/Drop BOO solutions. This new neighborhood structure provides faster objective function value convergence in both algorithms.

Fractionation

In practice, a single FMO treatment plan is developed and then divided into the number of desired fractions. Dividing a single FMO plan into multiple treatments is a non-trivial task, owing to the need to maintain a constant dose-per-fraction to each of the target structures, which may have different prescription doses. Therefore, any division of a single FMO plan into multiple fractions can lead to suboptimal treatments. We propose a new method of formulating the fractionation problem which yields optimal fluence maps for each cancerous target structure. These fluence maps can then be easily divided into optimal fractions. The proposed fractionation model is solved using the same primal-dual interior point method presented for the FMO problem. The solutions provide high quality fluence maps for each target in a clinically acceptable amount of time.

Modeling the Dose Deposition of a Beam

We propose obtaining a limited number of Monte Carlo histories to obtain a noisy dose distribution which can then be transformed, in a reasonable amount of time, into a very accurate, smooth dose distribution suitable for optimization techniques. Because the particles in a beamlet scatter in three-dimensional space, multiple dose distributions must be considered to satisfactorily model the beamlet's effect on the patient's tissue. These distributions arise from the amount of radiation the beamlet

deposits as a function of depth (the depth-dose curve), and from the amount of radiation radiating outward from the center of the beamlet (the lateral penumbra). The depth-dose curve is modeled using a high-degree polynomial, and the lateral penumbra is modeled as a sum of error functions. The parameters of the error functions are determined using a Levenberg-Marquardt quasi-Newton minimization method. Using these techniques, dose distributions with satisfactory accuracy can be obtained using at least a factor of 10 fewer Monte Carlo histories than would otherwise be required. This can greatly decrease the amount of time required to obtain dose data for beamlets in the FMO problem of IMRT treatment planning without sacrificing accuracy.
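The two model forms just described can be sketched as follows. This is a minimal illustration with invented parameter values, evaluating the models directly rather than fitting them; in practice the parameters would be obtained by the Levenberg-Marquardt fit to noisy Monte Carlo data:

```python
import math

def depth_dose(depth, coeffs):
    """Polynomial depth-dose model, evaluated with Horner's rule.
    coeffs runs from the highest-degree coefficient down to the constant."""
    value = 0.0
    for c in coeffs:
        value = value * depth + c
    return value

def lateral_penumbra(x, amplitude=1.0, half_width=0.5, sigma=0.15):
    """One error-function pair: a smooth, flat-topped profile for the dose
    falloff perpendicular to the beamlet axis. A fitted beamlet profile is
    a sum of several such pairs; the parameters here are illustrative."""
    return 0.5 * amplitude * (math.erf((x + half_width) / sigma)
                              - math.erf((x - half_width) / sigma))
```

The error-function pair is near its full amplitude at the beamlet center, symmetric about it, and decays to zero laterally, which is the qualitative shape of the penumbra curves described above.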

CHAPTER 2
FLUENCE MAP OPTIMIZATION

2.1 Introduction

IMRT is differentiated from conformal radiation therapy by the dose distributions that can be delivered by each beam. Rather than delivering only a uniform field of radiation, the dose distribution of a beam can be any desired distribution. This ability allows for greater flexibility and accuracy in irradiating the target structures while avoiding the critical structures. The dose distribution of a beam is achieved as follows. In IMRT, each beam can be thought of as consisting of several hundred smaller beamlets, each of which can have its own radiation intensity (fluence) independent of its neighbors. By modulating the intensities of these beamlets, any dose distribution can be achieved. Given a fixed set of beams, the optimization of these intensities is called fluence map optimization.

2.2 Literature Review

Because the FMO problem is one of the most studied problems of IMRT, many different approaches have been taken to formulate it, based on both physical (Bortfeld [6]) and biological (Alber and Nusslin [7], Jones and Hoban [8], Kallman et al. [9], Mavroidis et al. [10], Niemierko et al. [11], Niemierko [12], Wu et al. [13, 14]) objective functions and constraints. Linear programming (LP)-based multi-criteria optimization (Hamacher and Küfer [15]) and mixed-integer linear programming (MILP) (Bednarz et al. [16], Ferris et al. [17], Langer et al. [18, 19], Lee et al. [20, 21], Shepard et al. [22]) models have been proposed for FMO.

Constraints to enforce various measures of treatment quality are also taken into account in different FMO models. Hamacher and Küfer [15] include the homogeneity of the dose received by the targets as a constraint in their FMO model. Full-volume constraints, which require that the dose in every voxel of a structure be within pre-determined upper and lower bounds, are common for controlling the dose in each structure. Models

employing full-volume constraints are found in Bednarz et al. [16], Hamacher and Küfer [15], Lee et al. [20, 21], Romeijn et al. [23] and many others. Models containing partial-volume constraints, which require that the dose in only a subset of voxels be within pre-determined upper and/or lower bounds, are also common. Formulations with partial-volume constraints are found in Lee et al. [20, 21], Romeijn et al. [23, 24] and Shepard et al. [22].

In addition to varying constraints, there are many competing methods of formulating the FMO objective function to reflect the quality of the treatment plan. Shepard et al. [22] describe several different objective formulations. These include minimizing the sum of doses received at all voxels; minimizing a weighted combination of doses received at each voxel, where the weights depend on the structure in which the voxel resides; and minimizing the deviation of the dose in each voxel from the recommended prescription. Romeijn et al. [25] showed that most of the treatment plan evaluation criteria proposed in the medical physics literature are equivalent to convex penalty function criteria when viewed as a multicriteria optimization problem. For each set of treatment plan evaluation criteria from a very large class, there exists a class of convex penalty functions that produces an identical Pareto efficient frontier. Therefore, a convex penalty function-based approach to evaluating treatment plans is used to investigate the BOO problem. Although this approach could be used in a multicriteria setting, Romeijn et al. [23, 26] suggest that it is possible to quantify a trade-off between the different evaluation criteria that produces high-quality treatment plans for a population of patients, eliminating the need to solve the FMO problem as a multicriteria optimization problem for each individual patient.

2.3 Model Formulation

A convex penalty function-based approach to the FMO model as described in Romeijn et al. [23] is employed to quantify the quality of the treatment plan by appropriately

making the trade-off between delivering the prescribed radiation dose to the target structures and sparing the critical structures. Using this approach, the FMO problem can be formulated as a quadratic programming problem with linear constraints as follows. Denote the set of all potential beam orientations as B. The structures (both targets and critical structures) are irradiated using a predetermined set of beam angles, denoted θ, where each beam θ_h ∈ B, h = 1,…,k, and k is the number of beams in θ. Each beam is decomposed into a rectangular grid of beamlets with m rows and n columns, typically yielding 100-400 beamlets per beam. The position and intensity of all beamlets in a beam can be represented by a vector of values representing the beamlet intensities, called bixels. The set of all bixels in beam θ_h is denoted by B_{θ_h}. The core task in IMRT treatment planning is finding radiation intensities for all beamlets.

Denote the total number of structures by S and the number of targets by T. Each structure s is discretized into a finite number v_s of volume cubes, known as voxels. Typically, around 350,000 voxels are required to accurately represent the targets and surrounding structures of a head-and-neck cancer site. Because a beamlet must pass through a certain amount of tissue to reach a voxel, the dose received in a voxel from a beamlet may not be the full delivered intensity. Denote by D_ijs the dose received by voxel j in structure s from beamlet i at unit intensity. The D_ijs values are known as dose deposition coefficients. Let x_i denote the intensity of bixel i. This brings us to the following expression for the dose z_js received by voxel j in structure s:

    z_js = Σ_{h=1}^{k} Σ_{i ∈ B_{θ_h}} D_ijs x_i,    j = 1,…,v_s, s = 1,…,S.

Although the goal of IMRT treatment planning is to control the dose received by each structure, hard constraints cannot simply be imposed on the amount of dose received by each structure, because a solution satisfying all such constraints may not exist.
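The linear dose expression above can be sketched directly; the dose deposition matrix below contains made-up coefficients purely for illustration (real D_ijs values come from a dose calculation engine):

```python
def voxel_doses(D, x):
    """Compute z_j = sum_i D[j][i] * x[i]: the dose in each voxel is a
    linear combination of the beamlet (bixel) intensities x, weighted by
    the dose deposition coefficients D."""
    return [sum(d_ji * x_i for d_ji, x_i in zip(row, x)) for row in D]

# Two voxels, three beamlets, with invented dose deposition coefficients:
D = [[0.8, 0.1, 0.0],
     [0.2, 0.5, 0.3]]
x = [10.0, 4.0, 2.0]   # beamlet intensities (fluences)
z = voxel_doses(D, x)  # approximately [8.4, 4.6]
```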
In some cases, it may be necessary to sacrifice organs in order to treat targets, and if that possibility is not allowed in the model, then a feasible or satisfactory solution may not exist. Thus, in our model, a penalty is

assigned to each voxel based on the dose it receives for a given set of beamlet intensities. Let F_js denote a convex penalty function for voxel j in structure s of the following form:

    F_js(z_js) = (1/v_s) ( w_s [(T_s − z_js)^+]^{p_s} + w̄_s [(z_js − T_s)^+]^{p̄_s} ),

where T_s is the dose threshold value for structure s, w_s and p_s are weighting factors for underdosing, and w̄_s and p̄_s are weighting factors for overdosing. The expression (·)^+ denotes max{·, 0}. The function is normalized over the number of voxels in the structure using the coefficient 1/v_s. Setting w_s, w̄_s ≥ 0 and p_s, p̄_s ≥ 1 ensures convexity. A basic formulation of the FMO problem is then:

    minimize    Σ_{s=1}^{S} Σ_{j=1}^{v_s} F_js(z_js)
    subject to  z_js = Σ_{h=1}^{k} Σ_{i ∈ B_{θ_h}} D_ijs x_i,    j = 1,…,v_s, s = 1,…,S,
                x_i ≥ 0,    i ∈ B_{θ_h}, h = 1,…,k.

The FMO problem serves as the black-box function F(θ) in the BOO model to quantify the quality of beam vector θ. In contrast with the methods presented in all of the previously cited FMO studies except Das and Marks [27], Haas et al. [28] and Schreibmann [29], this measure of beam vector quality is an exact measure of the FMO problem, rather than a heuristic or scoring approach which cannot accurately optimize the beam orientations.

2.4 Spatial Considerations

With IMRT optimization, it is possible to generate treatment plans with similar FMO objective function values but very different levels of clinical treatment quality. Chao et al. [30] illustrate this possibility with two treatment plans that have nearly identical target coverage when plotted on a dose-volume histogram, but while one plan delivers an acceptable homogeneous dose, the other results in significant underdosing of the target structure.

Chao et al. [30] show that the probability of microscopic tumor extension decreases linearly with distance from the gross tumor volume, implying that cold spots located near the gross tumor volume are far more likely to allow for tumor metastasis after treatment. Likewise, cold spots located far from the gross tumor volume are unlikely to result in tumor metastasis. To reduce the likelihood of obtaining an unsatisfactory plan with a good dose-volume histogram, spatial coefficients are introduced into the FMO model. For each voxel, we consider its position relative to the primary target as a measure of how acceptable or unacceptable overdosing or underdosing may be. Voxels further from the gross tumor volume are penalized more heavily than voxels closer to the gross tumor because it is less acceptable for a voxel far away from the actual tumor to receive an overdose, as the cancerous cells are unlikely to spread very far from the tumor location (Chao et al. [30]). This additional penalization is called the spatial coefficient, and is denoted c_js for voxel j in structure s. For voxels inside the target structures, the probability of cancer spread is 1, as cancer already exists in those voxels. Let S̃ denote the set of gross tumor structures. Let d_ljs be the minimum distance from voxel j in structure s to structure l. The spatial coefficient c_js for voxel j in structure s is

c_js = 1,  j = 1, ..., v_s, s ∉ S̃
c_js = min{ 1, max{ 0.1, Σ_{l∈S̃} [exp(−λ_l d_ljs) + µ_l d_ljs + β_l] } },  j = 1, ..., v_s, s ∈ S̃,

where λ_l, µ_l and β_l are weighting coefficients. The objective function for the FMO problem becomes

F_spatial(x) = Σ_{s=1}^{S} Σ_{j=1}^{v_s} c_js F_js(z_js)

2.5 A Primal-Dual Interior Point Algorithm for FMO

To solve the FMO and fractionated FMO models, a primal-dual interior point method is employed. For a convex problem such as the FMO model presented in the preceding section, this method yields an optimal solution in a short amount of time.
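Before turning to the algorithm, the spatial coefficient defined in Section 2.4 can be sketched per voxel as follows; the parameter values used in the example (λ = 1.7, µ = -0.32, β = 0.77) are those reported in the spatial coefficient results later in this chapter, the 0.1 floor comes from the definition, and the helper function itself is a hypothetical illustration:

```python
import numpy as np

def spatial_coefficient(d, lam, mu, beta, floor=0.1):
    """c = min{1, max{floor, sum_l [exp(-lam_l d_l) + mu_l d_l + beta_l]}},
    where d_l is the voxel's minimum distance to gross tumor structure l."""
    val = sum(np.exp(-l * dl) + m * dl + b
              for l, m, b, dl in zip(lam, mu, beta, d))
    return float(min(1.0, max(floor, val)))

# One gross tumor structure: the coefficient decays with distance from it
c_near = spatial_coefficient([0.0], [1.7], [-0.32], [0.77])  # capped at 1
c_far = spatial_coefficient([3.0], [1.7], [-0.32], [0.77])   # hits the floor
```

The min/max clamping is what enforces the stated bounds: the coefficient can never exceed 1 and never vanishes entirely.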

The primal-dual interior point algorithm moves through the interior of the solution space along a central path (a path through the interior of the solution space) toward the optimal solution. The central path is defined by perturbing the KKT conditions described below. These conditions ensure primal feasibility, dual feasibility and complementary slackness. If these conditions are satisfied for a convex programming problem with linearly independent constraints, they yield the optimal solution. Thus, we only need to solve this system to obtain an optimal solution to our FMO model (which has a convex objective function and linear, linearly independent constraints). The KKT system can be difficult to solve directly, so the conditions are perturbed in order to obtain a solution. The general idea of the primal-dual interior point algorithm is to start from an initial feasible solution, use the perturbed KKT conditions to obtain a step direction close to the central path, and then move the current solution some step length along that direction. The amount of perturbation in the KKT conditions is gradually decreased so that in each step, the solution becomes closer to the optimum. The interior point method allows the duality gap, the gap between the objective functions of the primal and dual problems, to be calculated, thus providing a measure of how close the current solution is to the optimum. For a problem with continuous variables, when the objective functions of the primal and dual problems are equal (duality gap of zero), the solution is optimal. A mathematical description of the primal-dual interior point method can be found in Nocedal and Wright [31]. Further explanation is provided only as needed to define variables in the algorithm. In the FMO problem, the constraint function is G(x) = Ix ≥ 0 (the nonnegativity constraints), so the KKT conditions for the FMO formulation are

Σ_{s∈S} (1/v_s) Σ_{j∈V_s} D_ij F′_j( Σ_{l∈N} D_lj x_l ) − s_i = 0,  i ∈ N  (2-1)
s_i x_i = 0,  i ∈ N  (2-2)
s_i ≥ 0,  i ∈ N  (2-3)
x_i ≥ 0,  i ∈ N,  (2-4)

where Equation (2-4) ensures that the solution is feasible, as the only constraints in the FMO problem are nonnegativity constraints. The complementary slackness condition (2-2) forces any solution of the above conditions to lie on the boundary of the solution space. Since a point in the interior of the solution space is desired, the complementary slackness condition must be relaxed. Condition (2-2) is relaxed by changing each s_i x_i = 0 to s_i x_i = µ, where µ > 0. This, along with requiring that x > 0 and s > 0 for feasibility, ensures that a solution to the perturbed KKT conditions is an interior point. Let n be the size of the decision variable vector x. A solution is close enough to the central path if the duality measure µ_k in iteration k satisfies

µ_k = (x^k)ᵀ s^k / n  (2-5)

and ‖X^k S^k e − µ_k e‖ ≤ θ µ_k, where e is the vector of all ones, X^k is a matrix with the x^k_i values as diagonals and zeros elsewhere, and S^k is a matrix with the s^k_i values as diagonals and zeros elsewhere. As the algorithm progresses, µ is reduced toward zero until the solution is sufficiently close to optimality. To reduce µ, in each iteration we set µ = σµ, where σ ∈ [0, 1] is called the centering parameter. If the duality gap is very large, σ can be reduced so that µ decreases faster. In each iteration, the current solution (x, s) is moved in a direction (Δx, Δs) with some step length α:

(x^{k+1}, s^{k+1}) = (x^k, s^k) + α (Δx^k, Δs^k)

Let X^k = diag(x^k), S^k = diag(s^k) and H(x^k) = ∇²φ(x^k). The directions Δx^k and Δs^k can be determined by solving the following equations:

[(X^k)⁻¹ S^k + H(x^k)] Δx^k = −r_DF − (X^k)⁻¹ r_xs  (2-6)
Δs^k = −(X^k)⁻¹ (r_xs + S^k Δx^k)  (2-7)

where r_DF = ∇φ(x^k) − s^k is the dual residual from (2-1) and r_xs = X^k S^k e − µe is the perturbed complementarity residual from (2-2).
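Equations (2-6) and (2-7) amount to one linear solve per iteration with a symmetric positive definite coefficient matrix, which a Cholesky factorization handles efficiently. A minimal sketch (all inputs below are illustrative toy values, not FMO data):

```python
import numpy as np

def pdip_step(x, s, grad, H, sigma):
    """Solve [(X^-1)S + H] dx = -r_DF - X^-1 r_xs and
    ds = -X^-1 (r_xs + S dx) via a Cholesky factorization."""
    n = len(x)
    mu = sigma * x.dot(s) / n          # perturbation parameter
    r_df = grad - s                    # dual (stationarity) residual
    r_xs = x * s - mu                  # perturbed complementarity residual
    M = H + np.diag(s / x)             # symmetric positive definite
    L = np.linalg.cholesky(M)
    rhs = -r_df - r_xs / x
    dx = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # two triangular solves
    ds = -(r_xs + s * dx) / x
    return dx, ds

dx, ds = pdip_step(x=np.array([1.0, 1.0]), s=np.array([1.0, 1.0]),
                   grad=np.array([2.0, 3.0]), H=np.eye(2), sigma=0.1)
```

Because diag(s/x) has strictly positive entries whenever x, s > 0, the coefficient matrix stays positive definite even when H itself is only positive semidefinite.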

In order to solve this system, we must obtain Δx^k from Equation (2-6) by inverting [(X^k)⁻¹ S^k + H]. Because computing the inverse of such a large dense matrix is very time-consuming, a Cholesky factorization is used to solve this system quickly. The primal-dual interior point method requires a feasible (x, s) solution in each step. Thus, a maximum step length α_max must be imposed on each step direction to ensure that x ≥ 0 and s ≥ 0:

α_max = min{ min_{i=1,...,n: Δx_i<0} {−x_i/Δx_i}, min_{i=1,...,n: Δs_i<0} {−s_i/Δs_i} }

Because the inverse of each x_i is required to determine the step directions, it is undesirable to have any x_i = 0, which would result from using the full step length α_max. Instead, only a fraction η < 1 of α_max is used:

α = min{1, η α_max}  (2-8)

The benefit of this primal-dual method is that in each step we can calculate the objective of the dual problem (simply sᵀx), thus providing a bound on how far the current solution is from optimality.

Primal-Dual Interior Point Algorithm

The primal-dual interior point algorithm is as follows:

Initialization
1. Select initial values for ε, σ and η (we use ε = 5, σ = 0.1 and η = 0.95).
2. Set x⁰ = 0.5 (componentwise, very close to 0) and calculate φ(x⁰) and H(x⁰) = ∇²φ(x⁰).
3. Set µ⁰ = ( Σ_{i=1}^n ∇φ(x⁰)_i ) / 10.
4. Set s⁰ = µ⁰ (X⁰)⁻¹ e.
5. Set k = 0.

Algorithm
1. If the duality gap is very large ((x^{k+1})ᵀ s^{k+1} > 10⁷ ε), set σ = 0.1σ.
2. Set µ_k = σ µ_k.

3. Solve for the step direction (Δx^k, Δs^k) as described in Equations (2-6) and (2-7). Note that this involves calculating the Hessian H(x^k).
4. Solve for the step length α as described in Equation (2-8).
5. Set x^{k+1} = x^k + αΔx^k and s^{k+1} = s^k + αΔs^k.
6. If the duality gap (x^{k+1})ᵀ s^{k+1} < ε, stop. Otherwise, set µ_{k+1} = (x^{k+1})ᵀ s^{k+1}/n, set k ← k + 1 and repeat.

Hessian Approximations

The most time-consuming step in the primal-dual interior point algorithm is calculating the Hessian of the objective function in each iteration. For clarity, let Σ̃ denote Σ_{s∈S} (1/v_s) Σ_{j∈V_s}, and let F_j(x) denote F_j( Σ_{l∈N} D_lj x_l ). The Hessian of the FMO problem is then given by

H(x) = [ Σ̃ F″_j(x) D_ij D_lj ]_{i,l=1,...,n},

that is, element (i, l) of H(x) is Σ̃ F″_j(x) D_ij D_lj. Note that only the pairwise D_ij D_lj products differ across the elements of the Hessian. By precomputing these cross-products, only the curvature terms F″_j( Σ_{l∈N} D_lj x_l ) have to be recomputed in each iteration. The matrix of D_ij D_lj products yields the sparsity (or density) pattern of the Hessian, which stays constant throughout the algorithm. Because the Hessian is symmetric, its values only need to be computed for half of the matrix, further improving efficiency. Despite these observations, computing the Hessian is still so expensive that it renders the algorithm impractical, so methods of approximating the Hessian are implemented to speed up the algorithm.

Single Hessian Approximation

One way of speeding up the algorithm is to compute the Hessian just once during initialization to obtain H(x⁰), and then, rather than re-compute the Hessian in each iteration, use H(x⁰) as an approximation to H(x^k). We call this the Single Hessian

approximation. Although the convergence of such an approximation has not yet been mathematically proven, tests run on several head-and-neck cases for 5-beam and 7-beam plans show that the Single Hessian approximation does in fact converge to the known optimal solution.

BFGS Hessian Update

Another Hessian approximation is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) Hessian update. The approximation to the Hessian in iteration k is B_k, with B₀ = H(x⁰). The update to the approximated Hessian in each iteration is

B_{k+1} = B_k + (q_k q_kᵀ)/(q_kᵀ p_k) − (B_k p_k p_kᵀ B_k)/(p_kᵀ B_k p_k),

where

p_k = x^{k+1} − x^k
q_k = ∇φ(x^{k+1}) − ∇φ(x^k)

Note that this update ensures that B_k is always symmetric and positive definite, so the Cholesky factorization can still be applied to obtain the step direction. This approximation also empirically converges to the known optimal solution for 5- and 7-beam head-and-neck cases.

Insignificant Beamlets

Insignificant beamlets are those that bear little contribution to the quality of the FMO plan. Letting d denote the vector of diagonal elements of the initial Hessian H(x⁰), the set of insignificant beamlets B_I is defined as

B_I = { i : d_i / max{d} < 0.1 }

These beamlets are removed by deleting the ith row and the ith column of H(x⁰) for every i ∈ B_I, and then updating the number of bixels to the number of remaining bixels. The insignificant beamlets must be re-inserted into the solution x^k in order to calculate

the voxel doses, objective function, gradient and Hessian, but the inversion of the Hessian is performed on the Hessian with the insignificant beamlets removed, providing significant time savings.

Warm Start

For the sake of theoretical accuracy, a truly optimal solution cannot have the insignificant beamlets described above removed. Without removing these beamlets a priori, the interior point method must be run for an impractical amount of time to obtain a near-optimal solution, say, one with ε = 0.1. The interior point method is typically started with a decision variable vector x⁰ with all components almost zero. If the algorithm were instead started at a point closer to the final solution, denoted x^warm, time savings could be gained, allowing all beamlets to be considered in the interior point algorithm in a reasonable amount of time. Such an approach is called a warm start. One difficulty in using a warm start with the interior point method is that a warm start solution may have some x^warm_i = 0, which is not allowed because the inverse of each x_i must be taken. To correct this problem, any x^warm_i = 0 is simply replaced with some very small value γ. Because these zero-valued variables are less important to the problem than the nonzero variables, γ should be no larger than the minimum nonzero value of x^warm. Let γ̄ = min_{i=1,...,n} { x^warm_i : x^warm_i > 0 }. Then γ = min{0.1, γ̄}, and

x⁰_i = x^warm_i,  i ∉ B_I
x⁰_i = γ,  i ∈ B_I

An additional problem with warm starts in the interior point method is that the KKT variable vector s is unknown at the warm start point. Depending on the algorithm used to obtain the warm start, some information about s^warm and µ^warm (the values of s and µ at the warm start point, respectively) may not be available. If no information is available about s from the warm start, then s⁰ = 0. If an interior point algorithm is used to obtain the warm start, then s^warm is available.
If the warm start did not include the insignificant beamlets, some corrections must be made to account for the insignificant beamlets, which will be

optimized in the final solution. Let s⁰ be the initial s used in the interior point method after the warm start has been obtained. Then

s⁰_i = s^warm_i,  i ∉ B_I
s⁰_i = µ^warm/γ,  i ∈ B_I,

where the value chosen for the s_i corresponding to insignificant beamlets arises from the general initialization s⁰ = µ⁰ (X⁰)⁻¹ e.

2.6 Results

The true Hessian, Single Hessian approximation and BFGS update implementations of the primal-dual interior point algorithm are tested on six head-and-neck cases to obtain coplanar, equi-spaced 5-beam plans. The tests are run on a 2.33 GHz Intel Core 2 Duo processor with 2 GB of RAM. Each implementation is tested both leaving in and removing the insignificant beamlets. The optimality of the interior point method solutions is verified by comparison to the known optimal solutions obtained by Java with CPLEX (ILOG). An acceptable duality gap must be determined in order to implement the interior point method. While we consider a duality gap of ε = 0.1 to be acceptably close to optimal, it may be unnecessary to achieve such a small duality gap to obtain a quality solution. Moreover, depending on the weighting parameters used in the FMO objective function, the value of the objective function may vary widely. Because of this potential range of values, a stopping criterion based on a relative duality gap rather than an absolute duality gap is preferable. If the objective function value in an iteration is f, define the relative duality gap in that iteration to be ε′ = ε/f. The relative duality gap necessary to ensure clinical quality is examined in Section 2.6.1, computational results are presented in Section 2.6.2 and clinical comparisons are provided in Section 2.6.3.
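The warm start adjustments just described (replacing zero and insignificant components of x by γ and filling in the corresponding s values) can be sketched as follows; the boolean mask `insig` marks B_I, and all numbers are illustrative:

```python
import numpy as np

def warm_start_point(x_warm, s_warm, mu_warm, insig):
    """Return strictly positive (x0, s0) built from a warm start solution."""
    pos = x_warm[x_warm > 0]
    gamma = min(0.1, pos.min()) if pos.size else 0.1
    x0 = np.where(insig, gamma, x_warm)    # insignificant beamlets get gamma
    x0 = np.where(x0 == 0.0, gamma, x0)    # lift remaining zero components
    s0 = np.where(insig, mu_warm / gamma, s_warm)
    return x0, s0

x0, s0 = warm_start_point(x_warm=np.array([0.5, 0.0, 2.0]),
                          s_warm=np.array([1.0, 1.0, 1.0]),
                          mu_warm=0.2,
                          insig=np.array([False, False, True]))
```

Every component of the returned point is strictly positive, so the interior point iterations can take inverses of x without modification.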

2.6.1 How Small of a Duality Gap is Necessary?

Because the run time of the algorithm depends on the required duality gap, it is desirable to require only as small a duality gap as necessary to ensure a clinically good solution. The duality gap decreases quickly in the first few iterations, and then subsequently decreases by only a small amount per iteration, as shown in Figure 2-1A. If these iterations with only marginal improvements are found to be unnecessary in terms of clinical quality, significant time can be saved by stopping the algorithm once the duality gap is reasonably small, as opposed to waiting until the duality gap is very small. To check the importance of the duality gap, the FMO value and the dose delivered to the targets and the saliva glands were plotted against the duality gap in each iteration using the true Hessian and without removing insignificant beamlets. For a representative case, the FMO values per duality gap are shown in Figure 2-1B. It is clear that the duality gap decreases rapidly in the first few iterations, but subsequent iterations yield increasingly smaller drops in the duality gap. Similarly, the amount of dose received by the targets and critical structures does not change significantly toward the end of the algorithm. Figure 2-2 plots the dose received by the two targets, PTV1 and PTV2, starting from a relative duality gap of 0.15%. The prescription doses are 70 Gy for PTV1 and 50 Gy for PTV2, common dose values used in the cancer clinic at Shands Hospital at the University of Florida. Neither the dose received by 95% of the targets nor the size of the hotspots and coldspots changes significantly in this duality gap range (Figure 2-2A). The hotspots are measured by the percent of the target receiving 110% and 120% of the prescription dose, while the coldspots are measured by the percent of the target receiving at least 93% of the prescription dose (Figure 2-2B).
Figure 2-3 shows, for two representative cases, the amount of dose received by the saliva glands starting from a relative duality gap of 0.15%. Both cases show that the

Figure 2-1. Objective function (FMO value) and relative duality gap v. iteration. A) The duality gap drops sharply in early iterations, but very slowly thereafter. B) The relative duality gap monotonically decreases after several iterations.

change in dose received by the saliva glands as the duality gap decreases is not clinically relevant. From these figures, it appears that a duality gap as large as 0.1% could provide clinically acceptable plans. Since the algorithm may terminate with a duality gap smaller than the one specified as the stopping criterion, duality gaps larger than 0.1% will also be tested for acceptability.

2.6.2 Computational Results

Table 2-1 shows the average run times for each of the implementations of the algorithm. An absolute duality gap of ε = 0.1 and relative duality gaps of 0.15%, 0.1%, 0.05% and 0.01% are compared. The value of θ used to define the central path is 0.5. As expected, using the Single Approximation Hessian alternative with the insignificant beamlets removed is the fastest method, while using the true Hessian is the slowest method, regardless of whether the insignificant beamlets are removed. Interestingly, for large duality gaps, it is slightly faster to leave the insignificant beamlets in the model when using the true Hessian. Otherwise, it is faster to remove the insignificant beamlets. The final FMO values are displayed for each of the tested methods using a duality gap of ε = 0.1, which is sufficiently small to ensure optimal solutions given typical objective function values (Table 2-2). For each case, the final FMO value is nearly identical,

Figure 2-2. Dose received by targets as a function of the duality gap. A) The amount of dose received by at least 95% of each target is used to assess proper target coverage. B) The percent of each target receiving 110% and 120% of the prescription dose indicates hotspots, while the percent receiving 93% of the prescription dose indicates coldspots.

Figure 2-3. The amount of dose received by at least 50% of each saliva gland (left and right parotid and submandibular glands) remains relatively constant even for large duality gaps. Two representative cases are shown.

Table 2-1. Average run times (s) for 5-beam treatment plans, for each Hessian type (True, BFGS, Single Approx.), with and without insignificant beamlets removed, at ε = 0.1 and relative duality gaps of 0.15%, 0.1%, 0.05% and 0.01%.

Table 2-2. FMO value from using ε = 0.1, for each Hessian type (True Hessian, BFGS update, Single Approx.), with and without insignificant beamlets removed, for Cases 1-6.

indicating that the Hessian alternatives and the removal of the insignificant beamlets still provide for convergence to the optimal solution. The percentage increases in the FMO values using an absolute duality gap of 0.1 and relative duality gaps of 0.15%, 0.1%, 0.05% and 0.01% are shown in Table 2-3.

2.6.3 Clinical Results

For each of the duality gaps tested, the DVHs of the solutions obtained using the Single Approximation Hessian with the insignificant beamlets removed are compared. Since each of the interior point implementations obtains nearly identical solutions, it does not matter which implementation is used to produce the DVHs. As previously stated, the prescription doses used are 70 Gy for PTV1 and 50 Gy for PTV2, marked by a vertical line in Figure 2-4A. As saliva glands are the most difficult organs to spare in head-and-neck cases, the only critical structures shown are the saliva glands (Figure 2-4B). All other glands are spared in every implementation. The sparing criterion used for saliva glands is that no more than 50% of the saliva gland can

Table 2-3. Percent increase in objective function value from relative duality gaps of 0.15%, 0.1%, 0.05% and 0.01% as opposed to an absolute duality gap of ε = 0.1, for each Hessian type, with and without insignificant beamlets removed.

Figure 2-4. Quality of DVHs for relative duality gaps ε′ = 0.1%, 0.05%, 0.01% and 0.15%. A) The target coverage is nearly identical. B) The saliva gland sparing for the different duality gaps is similar, but the solution for ε′ = 0.15% sacrifices one saliva gland. The sparing criterion is marked by a star.

receive more than 30 Gy in order to be spared. This point is marked by a star in Figure 2-4B. Each of the duality gaps achieves good target coverage. While they each provide similar saliva gland dosage, the plan obtained using ε′ = 0.15% slightly surpasses the sparing criterion used for saliva glands.

2.6.4 Spatial Coefficient Results

To assess the possible treatment plan improvement afforded by spatial coefficients, spatial parameters were tuned and then compared to treatment plans obtained without using spatial information. To demonstrate the spatial coefficients, Figure 2-5 displays the spatial coefficients used for two cases.

Figure 2-5. The spatial coefficients used for two cases.

In addition to tuning λ, µ and β to values of 1.7, -0.32 and 0.77, respectively, a minimum spatial coefficient of 0.25 was also set for target voxels. By definition, the maximum value of a spatial coefficient is 1. These spatial parameters generally produce treatment plans of nearly identical quality to the best plans obtained without using spatial information, though with the added benefit of preventing misleading dose-volume histograms. In some cases, the spatial coefficients were able to outperform the non-spatial plans. Figures 2-6 and 2-7 illustrate two such cases. In Figure 2-6, the spatial coefficients yield improved target coverage and spare all saliva glands, as opposed to the non-spatial plan, which only spares three of the four saliva glands. There is also less dose outside the desired target in the plan using spatial coefficients. In Figure 2-7, the spatial coefficients reduce the amount of overdose in the primary targets. In this patient, both the spatial and non-spatial plans spare all saliva glands.

2.6.5 Warm Start Results

Warm start solutions were obtained using the interior point method and the projected gradient algorithm (Nocedal and Wright [31]). The interior point method warm starts were tested with each Hessian possibility and a large duality gap of 2, both with and without insignificant beamlets removed. The projected gradient algorithm was tested using

Figure 2-6. Comparison of spatial and non-spatial treatment plans (target and saliva gland DVHs for PTV1, PTV2 and the parotid and submandibular glands). A) Non-spatial parameters result in slightly low target dosage and fail to spare one saliva gland. B) Spatial parameters allow for improved target coverage and spare all saliva glands.

Figure 2-7. Comparison of spatial and non-spatial treatment plans for a second case (target and saliva gland DVHs). A) Non-spatial parameters. B) Spatial parameters.

several stopping criteria and without insignificant beamlets removed. It was observed that the projected gradient algorithm is fast enough that the time required to remove and re-insert the insignificant beamlets as necessary caused the algorithm to slow down. To be theoretically close to optimal, the interior point method used after the warm start has a duality gap of 0.1 and no beamlets removed. To determine how close the warm start solution is to the final solution, the percent improvement in objective function value that the final solution obtains over the warm start is measured. To assess how close to optimality the final solutions using a warm start are, the percentage by which their objective function values exceed the objective function value of a near-optimal solution is measured. Lastly, the decreases in run time relative to obtaining a near-optimal solution are provided. These results for the interior point and projected gradient warm starts are displayed in Tables 2-4 and 2-5, respectively. From Table 2-4, it is clear that using an interior point warm start can provide significant time savings over the near-optimal solution times. However, there is also a significant increase in the FMO objective function value. From the amount of increase in the objective function value, the interior point warm start does not appear to converge to the optimal solution, and is unlikely to provide acceptable solutions. It is interesting to note that the improvement from the warm start solution to the final solution is very small. This indicates that the KKT information obtained from the warm start and used in the final algorithm was unhelpful in improving the solution. For the projected gradient algorithm, once there is less than a δ percent decrease in objective function value from one iteration to the next, the algorithm terminates. Several δ values are tested. As with the interior point warm starts, the projected gradient warm starts also provide significant time savings, as shown in Table 2-5.
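A sketch of a projected gradient warm start with the δ-percent-decrease stopping rule described above; the objective, gradient and step size below are illustrative stand-ins, not the FMO model:

```python
import numpy as np

def projected_gradient(f, grad, x0, step=0.1, delta=5.0, max_iter=1000):
    """Minimize f over x >= 0; stop once the percent decrease in f
    between consecutive iterations falls below delta."""
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        x_new = np.maximum(x - step * grad(x), 0.0)  # gradient step, then project
        fx_new = f(x_new)
        if fx > 0 and 100.0 * (fx - fx_new) / fx < delta:
            return x_new
        x, fx = x_new, fx_new
    return x

# Toy convex objective: the nonnegativity projection stays active on x[1]
t = np.array([3.0, -1.0])
f = lambda x: float(np.sum((x - t) ** 2))
g = lambda x: 2.0 * (x - t)
x_ws = projected_gradient(f, g, np.zeros(2), step=0.1, delta=1.0)
```

The returned point serves as x^warm; since this method carries no dual information, the s vector must then be approximated as described in the warm start discussion above.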
The final solutions from the projected gradient warm start methods are nearly identical to the near-optimal solutions. The final interior point method also significantly improves the objective value of the warm start solution. This implies that despite not having KKT information about the warm start, the interior point

algorithm is still able to converge to the optimal, or at least a near-optimal, solution using the KKT value approximations and adjustments to the warm start vector described above.

Table 2-4. Performance measures of interior point method warm starts. For each combination of warm start settings (Hessian type; insignificant beamlets removed or not) and final interior point algorithm settings (Hessian type; beamlets not removed), the table reports the percent improvement of the final solution over the warm start objective function, the percent increase in the final objective function over a near-optimal solution, and the average time savings (s).

Table 2-5. Performance measures of projected gradient method warm starts. For each combination of projected gradient stopping parameter δ and final interior point algorithm settings (Hessian type; beamlets not removed), the table reports the percent improvement of the final solution over the warm start objective function, the percent increase in the final objective function over a near-optimal solution, and the average time savings (s).

2.7 Conclusions

The primal-dual interior point method is an effective algorithm for obtaining fluence maps that deliver quality treatment plans. The proposed Hessian alternatives appear to converge to the optimal solution, even when insignificant beamlets are removed. The removal of the insignificant beamlets provides significant time savings in all instances. The interior point method may also be run with a duality gap as large as 2 and still achieve quality treatment plans, thus decreasing the amount of time required to run the algorithm. Of the implementations tested, the fastest method that still provides quality solutions without using a warm start is to use the Single Approximation Hessian alternative, remove insignificant beamlets and employ a relative duality gap of 0.1%. When the interior point method is started with one of the warm starts discussed, time savings were again significant. Although the interior point warm starts generally provided more improvement in computation time than the projected gradient warm starts, the final solutions using the projected gradient warm starts were much closer to optimality. The fastest and most effective warm start method is to use the projected gradient algorithm with δ = 5, followed by the interior point method with ε′ = 0.1% and the Single Approximation Hessian. This combination results in a near-optimal solution with an average total computation time of 8.32 seconds.

CHAPTER 3
BEAM ORIENTATION OPTIMIZATION

3.1 Introduction

In a typical head-and-neck treatment plan, radiation beams are delivered from 5-9 nominally-spaced coplanar orientations around the patient. These coplanar orientations are obtained by rotating the gantry only. As shown in Figure 3-1, several components of a linear accelerator can rotate and translate to achieve more orientations than those obtained from rotating the gantry. The available orientations consist of the orientations obtained from rotation of the gantry, collimator and couch, as well as the three translation directions of the couch.

Figure 3-1. A linear accelerator and the available movements; the gantry rotation is highlighted.

BOO is the problem of selecting, from the available beam orientations, the best set to use in delivering a treatment plan. Given a fixed set of beams, different fluence maps (radiation intensities of the beamlets) yield treatment plans of different quality. Thus, the quality of an optimized fluence map should be considered when selecting a set of beam orientations to use in a treatment plan.
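As a concrete toy illustration of treating beam-set quality as a black box, the sketch below exhaustively scores every k-beam subset of a small candidate list and keeps the best; `toy_fmo_value` is an invented stand-in for the optimal FMO value, not the actual FMO model, and exhaustive enumeration is only viable for tiny candidate sets:

```python
import itertools
import numpy as np

def best_beam_set(candidates, k, fmo_value):
    """Score every k-beam subset with the black-box quality measure F(theta)
    (lower is better) and return the best subset found."""
    return min(itertools.combinations(candidates, k), key=fmo_value)

def toy_fmo_value(theta):
    """Hypothetical scoring stand-in: prefers evenly spaced gantry angles."""
    theta = sorted(theta)
    gaps = np.diff(theta + [theta[0] + 360])  # circular gaps between beams
    return float(np.var(gaps))                # zero variance = equi-spaced

angles = [0, 40, 80, 120, 160, 200, 240, 280, 320]  # coplanar candidates (deg)
sol = best_beam_set(angles, 3, toy_fmo_value)
```

In the actual BOO model each evaluation of F(θ) requires solving the FMO problem to optimality, which is why the following sections develop response surface and neighborhood search methods instead of enumeration.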

3.2 Literature Review

Many approaches have been taken to solve the BOO problem. Evolutionary algorithms (Schreibmann [29]) and variants of evolutionary algorithms, particularly genetic algorithms (Ezzell [32], Haas et al. [28], Li et al. [33]), have been employed. Li et al. [34] use a particle swarm optimization method, which is conceptually based on evolutionary algorithms. Bortfeld and Schlegel [35], Djajaputra et al. [36], Lu et al. [37], Pugachev and Xing [38], Rowbottom et al. [39] and Stein et al. [40] have all employed variations of simulated annealing to determine a beam solution. Söderstrom and Brahme [41] selected coplanar beam orientations using two measures, entropy and the integral of the low-frequency part of the Fourier transform of the optimal beam profiles, both of which are based on the size and shape of the target structure. Söderstrom and Brahme [42] also use an iterative technique to determine the optimal number of coplanar beams required using BOO. Das and Marks [27] use a quasi-Newton method. Rowbottom et al. [43] use artificial neural network algorithms to select beam orientations. Gokhale et al. [44] use a measure of each beam's path of least resistance from the patient surface to the target location to determine the best beam directions. Meedt et al. [45] use a fast exhaustive search to obtain a non-coplanar solution. The concept of beam's-eye view (BEV) has also been commonly used to approach the BOO problem (Chen et al. [46], Cho et al. [47], Goitein et al. [48], Lu et al. [37], Pugachev and Xing [38, 49, 50]). Despite the varying techniques to quantify the quality of a beam solution, it is widely accepted that the optimal solution to the FMO problem presents the most relevant measure (Bortfeld and Schlegel [35], Djajaputra et al. [36], Holder and Salter [51], Lee et al. [20, 21], Li et al. [33, 34], Meedt et al. [45], Morrill et al. [52], Oldham et al. [53], Rowbottom et al. [39, 43, 54], Schreibmann et al.
[29], Söderstrom and Brahme [41], Stein et al. [4], Wang et al. [55, 56], Woudstra and Heijman [57]). Given this accepted measure of treatment quality, the shortcoming of the previous works is twofold. First, they predominantly consider only coplanar angles, and not necessarily even the entire

coplanar solution space, while those that do consider non-coplanar beams only consider a hand-selected subset of the available orientations. Second, the majority of the previous studies do not select beam solutions using the FMO problem as a model for determining quality; instead, the beam solutions are chosen based on scoring methods (e.g., BEV, path of least resistance) or approximations to the FMO. By not optimizing the beam solution with respect to the exact FMO problem, these BOO methods cannot guarantee convergence to an optimal solution. Of the previously cited works, only Das and Marks [27], Gokhale et al. [44], Meedt et al. [45], Lu et al. [37], Rowbottom et al. [39] and Wang et al. [56] consider non-coplanar orientations. This is likely due to the computational difficulties associated with the inclusion of non-coplanar orientations, as well as the widespread belief that non-coplanar orientations do not improve the quality of a treatment plan. Also, of those works that addressed non-coplanar beams, Das and Marks [27] require that the beam distances be maximized, essentially requiring that beam solutions be equi-distant and thus restricting the size of the solution space; Meedt et al. [45] only consider 3,5 beams (a minute subset of the orientations available by rotation of the couch and the gantry); and Wang et al. [56] use only nine pre-selected non-coplanar beams. With the exception of Das and Marks [27], Haas et al. [28] and Schreibmann [29], the previous studies have based their BOO approaches not on a beam solution's optimal solution to the FMO problem, but on locally optimal FMO solutions or on various scoring techniques. Without basing BOO on optimal FMO solutions, the resulting beam solutions have no guarantee of optimality, or even of local optimality.
3.3 Model Formulation

The goal of radiation therapy treatment planning is to design a treatment plan that delivers a prescribed level of radiation dose to the targets while simultaneously sparing critical structures by ensuring that the level of radiation dose received by these structures stays below a structure-specific limit. These two goals are contradictory if the

targets are located near critical structures. This is especially problematic for certain cancers, such as tumors in the head-and-neck area, which are often located very close to, for instance, the spinal cord, brain stem and salivary glands. In order to model the BOO problem, a quantitative measure that appropriately trades off these contradictory goals must be developed.

Let F(θ) be a black-box function that quantifies the quality of the treatment plan if radiation is delivered from the beam vector θ = (θ_1, ..., θ_k)^T, where k is the user-specified number of orientations that may be used. F is formulated in such a way that the optimal plan yields the minimum function value. The decision vector θ is used as input to the black-box function F(θ) to determine the ability of the beam vector to deliver the prescribed treatment without unduly damaging normal tissue and critical structures. The BOO problem is then formulated as

min F(θ)
subject to θ_h ∈ B, h = 1, ..., k,

where B is the set of candidate beams. The candidate set of beams can be selected according to any user-specified criteria; for example, the beams can be coplanar or non-coplanar, continuous or discrete, or represent only a subset of the available beams. It is also possible to fix some beams and optimize only a subset of the total number of beams to be used. Theoretically, the linear accelerator can deliver a continuous set of orientations, but due to machine tolerances, the actual beams delivered may not be exactly the desired beams. Therefore, it is common to consider only a discretized set of beam orientations. In our BOO model, the black-box function F(θ) is the optimal value of the convex FMO problem described in Section 2.3, thus ensuring an exact measure of the quality of each beam vector.
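For a very small, discrete candidate set, the BOO model above can in principle be solved by exhaustive enumeration. The sketch below illustrates the formulation; `fmo_value` is a hypothetical placeholder surrogate for the true black-box FMO solver, not the actual model of Section 2.3:

```python
from itertools import combinations

def fmo_value(theta):
    """Stand-in for the black-box FMO solver F(theta).

    In the actual system this would generate beam data for the
    orientations in theta and solve the convex FMO problem of
    Section 2.3; here a placeholder surrogate is used that simply
    rewards widely separated beams (an illustrative assumption).
    """
    spread = sum(min(abs(a - b), 360 - abs(a - b))
                 for a, b in combinations(theta, 2))
    return 1000.0 - spread

def boo_brute_force(candidates, k):
    """Exhaustive BOO over a small discrete candidate set B:
    min F(theta) subject to theta_h in B, h = 1..k."""
    best, best_val = None, float("inf")
    for theta in combinations(candidates, k):
        val = fmo_value(theta)
        if val < best_val:
            best, best_val = theta, val
    return best, best_val

B = range(0, 360, 10)          # coplanar candidates on a 10-degree grid
theta, val = boo_brute_force(B, 3)
```

Enumeration is only viable for tiny candidate sets; the number of k-subsets grows combinatorially, which is exactly why the response surface approach of Section 3.6 is needed.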
Even though the FMO problem defining F(θ) is convex, this formulation of the BOO problem is fundamentally

nonlinear, because the physics of dose deposition change with each beam orientation; that is, the effect of a beam on the patient can be drastically different from the effect of a neighboring beam. To illustrate the nonlinearity of the problem, Figure 3-2 shows the FMO value as a function of just two coplanar beam angles. From this illustration, it is evident that the FMO function, particularly in higher, more realistic dimensions, is likely to be multi-modal as well.

Although the FMO problem itself can be solved quickly using the convex model presented in Section 2.3, in order to perform the FMO, lengthy calculations must first be made to determine each candidate beam's effect on the patient. These calculations, described in Section 3.5, require 13 minutes per beam, and thus make each evaluation of the FMO problem expensive. Despite the time required for each function evaluation, the limiting factor in beam orientation optimization is the hard drive space required to store the beam data for each candidate beam. If the candidate set of beams is small, this data can be pre-computed and stored, allowing the FMO problem to be solved quickly within the BOO problem. But if the candidate set of beams is large (for example, consisting of non-coplanar orientations), then the data cannot be pre-computed due to storage requirements. Because of these difficulties, previous studies have been largely unable to consider the entire solution space of available beams. By using the response surface method, which is specifically designed to model expensive nonlinear black-box functions, we can iteratively identify promising beam vector solutions and generate beam data for these solutions on-the-fly, thus circumventing the issue of storage space and allowing for the consideration of all deliverable beam orientations.
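The on-the-fly idea can be sketched with a memo-cache: expensive per-beam data is generated at most once and reused across FMO evaluations that share an orientation. This is a minimal illustration; `beam_data` is a hypothetical stand-in for the real dose-deposition computation:

```python
from functools import lru_cache

CALLS = 0   # counts how many times beam data is actually generated

@lru_cache(maxsize=None)
def beam_data(orientation):
    """Stand-in for the expensive per-beam computation of dose
    deposition coefficients (reported at roughly 13 minutes per
    beam in the text). The cache ensures data for a given
    orientation is generated at most once."""
    global CALLS
    CALLS += 1
    return ("D_ijs for beam", orientation)   # placeholder payload

def fmo_value(theta):
    """Placeholder FMO evaluation: fetch (or reuse) beam data."""
    data = [beam_data(h) for h in theta]
    return float(len(data))                  # placeholder objective

fmo_value((0, 40, 80))
fmo_value((0, 40, 120))      # reuses the cached data for beams 0 and 40
```

After the two calls above, beam data has been generated for only four orientations (0, 40, 80, 120), not six, mirroring how promising beam vectors identified by the response surface method can share previously generated data.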
3.4 Mixed-Integer Model Formulation

As an alternative to the BOO model given in Section 3.3, if the set of beam orientations B is finite, the BOO and FMO problems can be formulated together and solved simultaneously as a mixed-integer linear or nonlinear program (D'Souza et al. [58],

Figure 3-2. FMO value as a function of two angles.

Ehrgott and Johnston [59], Ferris et al. [17], Lee et al. [2, 21], Lim et al. [6], Shepard et al. [22], Wang et al. [61]). The FMO formulation can be combined with BOO in the following model. Let y_θ be a binary variable indicating whether or not beam θ ∈ B is used. If beam θ is used in the treatment plan, then all the beamlets in θ, denoted B_θ, are turned on; that is, they can have positive fluences up to some pre-determined maximum intensity M. The simultaneous BOO+FMO MIP model is then

minimize F(z)
subject to
z_js = Σ_{θ ∈ B} Σ_{i ∈ B_θ} D_ijs x_i,   j = 1, ..., v_s, s = 1, ..., S
x_i ≤ M y_θ,   i ∈ B_θ, θ ∈ B
Σ_{θ ∈ B} y_θ ≤ k
x_i ≥ 0,   i ∈ B_θ, θ ∈ B
y_θ ∈ {0, 1},   θ ∈ B

In order to solve such a problem, all beam data must be pre-computed for every beam orientation. As described in Section 3.5, beam data requires a tremendous amount of time and space to compute and store. Because of this requirement, only a small subset of all possible beam orientations can be considered in a BOO+FMO MIP formulation.

3.5 Beam Data Generation

For each beam orientation that is considered, lengthy calculations must be made to determine the beam's effect on the patient's tissue and organs. This includes determining in which structure each voxel lies, which voxels are hit by which beamlets, and how much of each beamlet's intensity is deposited in each voxel through which it passes. Beamlet dose computation models used in IMRT rely heavily on ray-tracing algorithms for voxel classification and determination of the radiological path (Fox et al. [62]). Voxel classification (Siddon [63]) establishes whether voxels are inside or outside the path of a radiation beam and classifies voxel centers as inside or outside of segmented targets and critical structures. The radiological path is the effective distance traveled by a beamlet when the effect of traveling through tissues of different densities is considered. The exact radiological path of a beamlet through the patient is required to correct for tissue heterogeneities in determining the dose deposition coefficients (Siddon [64]).

Siddon's ray-tracing algorithms (Siddon [63, 64]) have been the standard methods used for ray-tracing in radiotherapy since the 1980s. In Siddon's polygon and voxel ray-tracing algorithms for voxel classification (point-in-polygon testing), structures are represented as 3D polygonal objects, known as Siddon Prisms, and the signs of cross-products of rays passing through the polygons are used to determine whether a voxel lies inside or outside a structure.
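The radiological-path idea can be illustrated with a Siddon-style traversal in two dimensions: parametrize the ray, find where it crosses the grid planes, and weight each intersected segment by the local density. This is a simplified sketch under assumed conventions (uniform grid, origin at zero), not the Fox et al. [62] implementation:

```python
import numpy as np

def radiological_path(p0, p1, density, voxel=1.0):
    """Siddon-style 2D ray trace: density-weighted (effective) path
    length of the segment p0 -> p1 through a uniform voxel grid.
    `density` is a 2D array of relative densities, grid origin (0, 0)."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    d = p1 - p0
    ny, nx = density.shape
    alphas = [0.0, 1.0]                      # ray parameter endpoints
    for axis, n in ((0, nx), (1, ny)):
        if d[axis] != 0.0:
            planes = np.arange(n + 1) * voxel
            a = (planes - p0[axis]) / d[axis]
            alphas.extend(a[(a > 0.0) & (a < 1.0)])
    alphas = np.unique(alphas)               # sorted crossing parameters
    seg_len = np.linalg.norm(d)
    total = 0.0
    for a0, a1 in zip(alphas[:-1], alphas[1:]):
        mid = p0 + 0.5 * (a0 + a1) * d       # midpoint identifies the voxel
        ix, iy = int(mid[0] // voxel), int(mid[1] // voxel)
        if 0 <= ix < nx and 0 <= iy < ny:
            total += (a1 - a0) * seg_len * density[iy, ix]
    return total
```

For a uniform unit-density grid the radiological path reduces to the geometric path length; heterogeneous densities scale each traversed segment, which is precisely the correction needed for the dose deposition coefficients.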
Despite its overwhelming use, Siddon's algorithm for polygon ray-tracing becomes very costly due to the number of voxels in a patient. Fox et al. [62] developed a novel approach to polygon ray-tracing that circumvents the need for cross-products by translating the polygon structure onto a coordinate system, replacing

the need for a cross-product with the sign of the second coordinate of each voxel in the coordinate system.

In Siddon's algorithm for determining radiological paths (Siddon [64]), the radiological path must be determined for each voxel for every beamlet. This involves computations for millions of beamlet-voxel combinations. As reported by Jacobs et al. [65], a significant amount of computational time is required for these repeated calculations. Fox et al. [62] combine the incremental voxel ray-tracing algorithm presented by Jacobs et al. [65] with a method of virtual stereographic projection to significantly reduce the computational cost of obtaining radiological path lengths. Using their polygon translation and incremental ray-tracing algorithms, Fox et al. [62] achieve a 1-3 fold improvement in computation time over Siddon's point-in-polygon algorithm. Because of this significant reduction in computation time, their methods are used here to generate beam data.

Because these beam data calculations must be performed for each of millions of beamlet-voxel combinations, beam data generation is a lengthy process, requiring 13 minutes per beam using the algorithms described by Fox et al. [62]. In a typical FMO formulation, the beam vector is pre-determined and its beam data is calculated once and stored a priori. For a 5-beam case, this requires 15 MB of storage. As with a typical FMO problem, in a simultaneous FMO+BOO mixed-integer programming (MIP) formulation, beam data for each of the candidate beams in B must be generated a priori. If candidate beams are considered only for coplanar angles on a 10° grid, that is, only every 10th angle, beam data would have to be computed for 36 beams, which requires 5 hours to compute and 8 MB of space to store.
If we also wanted to consider the possibility of rotating the couch on a 10° grid in addition to the gantry, beam data would then have to be computed for 36² beams, which would require 17 hours and 6 GB of space for just one plan.
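The combinatorial growth behind these figures is easy to make concrete. The counts below assume 10° grids and an illustrative 4-beam plan; they show how the number of candidate orientations, and especially the number of distinct beam sets selectable from them, explodes once couch rotation is added:

```python
from math import comb

gantry = 360 // 10                 # coplanar candidates on a 10-degree grid
noncoplanar = gantry * gantry      # gantry x couch grid: 36^2 candidates

# Number of unordered 4-beam plans selectable from each candidate set
# (4 beams is an illustrative choice, not a value from the text):
coplanar_plans = comb(gantry, 4)
noncoplanar_plans = comb(noncoplanar, 4)
```

Even though only the per-beam data (not every beam set) must be stored, the second count shows why exhaustively scoring all beam sets, or pre-computing data for all candidates, is out of reach once non-coplanar orientations are allowed.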

Clearly, the storage space required for each beam restricts the number of beams that can be considered in a simultaneous FMO+BOO MIP formulation. This issue is typically addressed by simply restricting the number of candidate beams in B. Lee et al. [2] restrict the set B to contain only 18 pre-selected beam orientations, which can be coplanar or non-coplanar. If only gantry and couch rotations are allowed on a 10° grid, a beam set of 18 beams comprises only a small percentage of the available beam orientations. As more ranges of motion are allowed, this percentage falls even further. The inclusion of all possible beam orientations significantly increases the size of the solution space and could allow for improved treatment plans, but the beam data for all orientations cannot be pre-computed. In order to consider these orientations, we use a method that generates the beam data on-the-fly, only as necessary.

3.6 A Response Surface Approach to BOO

The shortcoming of the previous works on BOO is twofold. First, they predominantly consider only coplanar angles, and not necessarily even the entire coplanar solution space, while those that do consider non-coplanar beams only consider a hand-selected subset of the available orientations. Second, the majority of the previous studies do not select beam solutions using the FMO problem as a model for determining quality; instead, the beam solutions are chosen based on scoring methods (e.g., BEV, path of least resistance) or approximations to the FMO. By not optimizing the beam solution with respect to the exact FMO problem, these BOO methods cannot guarantee convergence to an optimal solution. Of the previously cited works, only Das and Marks [27], Gokhale et al. [44], Meedt et al. [45], Lu et al. [37], Rowbottom et al. [39] and Wang et al. [56] consider non-coplanar orientations.
Of these works, Das and Marks [27] require that the beam distances be maximized, essentially requiring that beam solutions be equi-distant and thus restricting the size of the solution space; Meedt et al. [45] only consider 3,5 beams (a

minute subset of the orientations available by rotation of the couch and the gantry); and Wang et al. [56] use only nine pre-selected non-coplanar beams. With the exception of Das and Marks [27], Haas et al. [28] and Schreibmann [29], the previous studies have based their BOO approaches not on a beam solution's optimal solution to the FMO problem, but on locally optimal FMO solutions or on various scoring techniques. Without basing BOO on optimal FMO solutions, the resulting beam solutions have no guarantee of optimality, or even of local optimality.

Because beam data generation is costly, a method that iteratively identifies only promising beam orientations is required. The response surface (RS) method is such an algorithm. In contrast to the previous studies, our approach to the BOO problem allows for the inclusion of all possible beam orientations, which are measured according to the exact FMO problem, thus ensuring convergence to optimality due to the properties of the response surface method. The RS method is designed to efficiently model expensive black-box functions. In this application, the FMO solver is our black box and the set of beams to be used is the input. As in Aleman et al. [66, 67], we employ the response surface method as detailed in Jones [68] and Jones et al. [69].

3.6.1 Overview of Response Surfaces

The response surface method identifies promising solutions based on the performance of previous solutions. The function value at a given point, and its expected improvement over the current best solution, are estimated from the function behavior learned from previously sampled points and their calculated objective function values. The function values of points are related by correlation functions that depend on each point's distance from the previously sampled points. From the correlation functions, the algorithm predicts the probability that the best solution will improve at unexplored points in the solution space.
Using this probability, a promising solution is identified. For the BOO problem,

beam data only needs to be generated for these promising solutions, thus saving both computation time and storage space.

The response surface method models the objective function as a stochastic process of the form

F(θ) = µ + ε(θ),   (3-1)

where µ is a constant representing an average of the function F and ε(θ) is a random error term associated with the point θ. In the general case, the error terms at two points, say θ^(1) and θ^(2), are correlated by

Corr[ε(θ^(1)), ε(θ^(2))] = exp[-d(θ^(1), θ^(2))],   (3-2)

where d(θ^(1), θ^(2)) is a weighted distance measure between θ^(1) and θ^(2). Intuitively, if two points are very close together, the correlation between them will be close to one; similarly, if two points are very far apart, the correlation between them will approach zero. Jones et al. [69] propose the following weighted distance measure in general:

d(θ^(1), θ^(2)) = Σ_{h=1}^{k} c_h |θ_h^(1) - θ_h^(2)|^{p_h},

where the parameters c_h and p_h are weighting factors corresponding to the importance of each variable h and the smoothness of the function F in the direction of variable h, respectively. If small changes in variable h cause large changes in the function F, then c_h should be large to reflect that two points with relatively small differences in the value of variable h should be far apart due to the large difference in their function values, and thus have a low correlation. The parameter c_h can take on any value, whereas 1 ≤ p_h ≤ 2, with p_h = 2 corresponding to objective function smoothness and p_h = 1 corresponding to less objective function smoothness.

In the application to BOO, θ = (θ_1, ..., θ_k) is the vector of k angles from which radiation will be delivered. Because no beam is more important than another beam, each beam orientation h contributes equally to the FMO function, so c_h = c and p_h = p for

all h = 1, ..., k. To maintain tractability of the subproblems described in the following sections, the angles are treated as points on a line rather than points on a circle, and a Euclidean distance metric is used to determine the distance between two points. The weighted distance measure for BOO is then

d(θ^(1), θ^(2)) = c ‖θ^(1) - θ^(2)‖_p^p,   (3-3)

where ‖·‖_p denotes the l_p-norm. To ensure tractability of the subproblems described in Section 3.6.2, the value p = 2 is used.

The idea of the RS method is to iteratively evaluate the true function F at certain beam vectors θ, and then construct the conditional stochastic process given these function values. This conditional stochastic process is then used to decide where to evaluate the function F next. Due to the time and space required to generate the beam data necessary to evaluate the function F, it is desirable to evaluate only points that will either improve the best solution with significant probability or significantly increase our knowledge of the function. The optimization models used to determine the next observation are described in Section 3.6.2.

Let θ^(1), ..., θ^(n) be the n previously sampled points, R_n the matrix of correlations between the previously sampled points, y_n the vector of function values F(θ^(i)) at the previously sampled points, and µ̂_n and σ̂_n estimators of the average and standard deviation of the function F, respectively. The response surface algorithm is given by:

Initialization:
1. Choose values for the parameters c and p.
2. Choose an initial sample size n and a set of angles θ^(i), i = 1, ..., n. Evaluate the function F at each of these points, yielding the values y_i, i = 1, ..., n.

Iteration:
1. Compute or update the values of R_n, R_n^{-1}, µ̂_n, σ̂_n, and F*_n, the minimum observed objective function value.

2. Determine the next point to observe using one of the methods described in Section 3.6.2, and call this point θ^(n+1).
3. Find the value y_{n+1} = F(θ^(n+1)), set n ← n + 1, and repeat.

3.6.2 Determining the Next Observation

Because the function F is expensive to evaluate, we want to sample as few points as possible. Thus, in each iteration, an optimization problem is solved that determines the best next point at which to observe the true function F. Some of the optimization problems that have been proposed in the literature depend on the uncertainty of the predictor as a function of θ, as well as on the expected improvement over the current best solution (Jones [68], Jones et al. [69]).

Let r_n(θ) be the vector of correlations between θ and the n previously sampled points. The uncertainty is then given by

s_n²(θ) = σ̂_n² [ 1 - r_n(θ)' R_n^{-1} r_n(θ) + (1 - 1' R_n^{-1} r_n(θ))² / (1' R_n^{-1} 1) ],

where

σ̂_n² = (1/n) (y_n - 1µ̂_n)' R_n^{-1} (y_n - 1µ̂_n)

is the estimator of the variance σ_n² based on the n observations. The expected improvement, denoted I_n(θ), is given by

I_n(θ) = s_n(θ) [z Φ(z) + φ(z)],   (3-4)

where

z = (F*_n - F̂_n(θ)) / s_n(θ),   (3-5)

F*_n = min{y_1, ..., y_n} is the current best solution value, and F̂_n(θ) is the estimated function value at θ given the n previously sampled points. Φ and φ are the c.d.f. and p.d.f. of a standard normal random variable, respectively.
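The quantities s_n²(θ) and I_n(θ) can be computed directly from equations (3-2) through (3-5). In the sketch below, the choice c = 1e-3 is an arbitrary illustration, µ̂_n is taken as the generalized-least-squares mean estimate and F̂_n(θ) as the standard kriging predictor (assumed forms; the text defines only σ̂_n² explicitly):

```python
import math
import numpy as np

def corr(t1, t2, c=1e-3):
    """Correlation of Eq. (3-2) with the BOO distance of Eq. (3-3), p = 2."""
    diff = np.asarray(t1, float) - np.asarray(t2, float)
    return math.exp(-c * float(diff @ diff))

def uncertainty_and_ei(theta, points, y, c=1e-3):
    """Return (s_n^2(theta), I_n(theta)) from n sampled points."""
    n = len(points)
    y = np.asarray(y, float)
    R = np.array([[corr(a, b, c) for b in points] for a in points])
    Rinv = np.linalg.inv(R)
    ones = np.ones(n)
    mu = (ones @ Rinv @ y) / (ones @ Rinv @ ones)        # assumed GLS mean
    sigma2 = (y - mu * ones) @ Rinv @ (y - mu * ones) / n
    r = np.array([corr(theta, p, c) for p in points])
    s2 = sigma2 * (1.0 - r @ Rinv @ r
                   + (1.0 - ones @ Rinv @ r) ** 2 / (ones @ Rinv @ ones))
    fhat = mu + r @ Rinv @ (y - mu * ones)               # kriging predictor
    s = math.sqrt(max(s2, 0.0))
    if s == 0.0:
        return 0.0, 0.0                                  # no uncertainty
    z = (y.min() - fhat) / s                             # Eq. (3-5)
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return s2, s * (z * Phi + phi)                       # Eq. (3-4)
```

At an already-sampled point the uncertainty and expected improvement both vanish (the model interpolates the data), while far from all samples both are strictly positive, which is what drives the exploration-exploitation trade-off.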

The selection of the next point is based on choosing the point that maximizes either the uncertainty or the expected improvement, or a combination of both. Denote the beam vector to be chosen as the vector θ*.

3.6.2.1 Maximizing the expected improvement

Jones [68] and Jones et al. [69] recommend selecting the next point to sample as the point θ for which the expected improvement over the current best solution value, I_n(θ), is largest. This corresponds to solving the following optimization problem:

max I_n(θ)
subject to θ_h ∈ B, h = 1, ..., k.

Although this is a difficult optimization problem, it can be solved using a branch-and-bound technique; in order to do so, an upper bound on I_n(θ) must be obtained. This can be done by solving for the expected improvement in equation (3-4) while substituting an upper bound on the uncertainty and a lower bound on F̂_n(θ), used in equation (3-5) to determine the value z. The method of bounding F̂_n(θ) is taken directly from Jones [68] and Jones et al. [69] and is not discussed further here. The method of bounding s_n²(θ) is improved from the original formulation in Jones et al. [69] to overcome numerical instabilities, and is presented in Section 3.6.2.2. The branch-and-bound algorithm used to maximize I_n(θ) is described in Section 3.6.3.

3.6.2.2 Obtaining an upper bound on the uncertainty

Due to the complexity of the s_n²(θ) function, maximizing the uncertainty is a difficult problem to solve. It can be relaxed into a linearly constrained quadratic programming problem as follows (Jones et al. [69]). The solution to the relaxed uncertainty maximization problem is an upper bound on the uncertainty that can be used in determining an upper bound on I_n(θ) as described in Section 3.6.2.1.

Let r = (r_1, ..., r_n), where r is a vector of decision variables independent of θ. By treating both r and θ as decision variables, a quadratic objective function is obtained.

Because r is now a decision variable independent of θ, an equality constraint must be added to the problem to ensure that r assumes the correct correlation values according to the correlation definition in equation (3-2). This constraint is nonlinear, but it can be relaxed by expressing the single equality as two inequalities (≤ and ≥) and then replacing the nonlinear terms ln(r_i) and c‖θ - θ^(i)‖_2² with linear underestimators a_i + b_i r_i and p_{i,h} + q_{i,h} θ_h, respectively. The different types of linear estimators require different values for a_i, b_i, p_{i,h} and q_{i,h}, and are differentiated by a superscript c for the chord underestimators and a superscript t for the tangent-line underestimators in the model formulation, denoted Problem s²-UB.

Unfortunately, the relaxation provided by Jones et al. [69] can become numerically unstable if two sampled points are very close together. If such a situation arises, the bounds on the corresponding correlation value can become so close that, due to round-off error, the lower bound r_i^L can become slightly larger than the upper bound r_i^U, resulting in infeasibility. To avoid this instability, instead of bounding r_i using hard constraints, the amount by which r_i falls outside its feasible range is penalized by adding the penalization terms w_i^L = min{0, r_i - r_i^L} and w_i^U = min{0, r_i^U - r_i}. This final formulation is given in Problem s²-UB. The formulation has only two more variables and two more constraints for each sampled point, so the increased problem size does not significantly increase the amount of time required to solve the problem.
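The two ingredients of this relaxation, linear chord/tangent estimators of the nonlinear terms and soft penalties in place of the hard correlation bounds, can be sketched generically (an illustration of the machinery, not the exact coefficients of Problem s²-UB):

```python
import math

def chord(f, a, b):
    """Chord of f over [a, b]: a linear underestimator of a concave
    function such as ln, an overestimator of a convex one."""
    return lambda x: f(a) + (f(b) - f(a)) * (x - a) / (b - a)

def tangent(f, df, x0):
    """Tangent line of f at x0: an overestimator of a concave
    function, an underestimator of a convex one."""
    return lambda x: f(x0) + df(x0) * (x - x0)

def bound_penalty(r, r_lo, r_hi):
    """Soft version of r_lo <= r <= r_hi as in Problem s^2-UB:
    w_L = min(0, r - r_lo), w_U = min(0, r_hi - r). The penalty
    w_L^2 + w_U^2 vanishes whenever the bounds hold, and stays small
    and finite even if round-off makes r_lo marginally exceed r_hi."""
    w_lo = min(0.0, r - r_lo)
    w_hi = min(0.0, r_hi - r)
    return w_lo * w_lo + w_hi * w_hi
```

Using soft penalties means a slightly inconsistent pair of bounds produces a tiny objective penalty instead of an infeasible subproblem, which is exactly the numerical-stability fix described above.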

PROBLEM s²-UB: Choose r and θ to

max  σ̂_n² [ 1 - r' R_n^{-1} r + (1 - 1' R_n^{-1} r)² / (1' R_n^{-1} 1) ] - Σ_{i=1}^{n} (w_i^L)² - Σ_{i=1}^{n} (w_i^U)²

subject to
(a_i^c + b_i^c r_i) + c Σ_{h=1}^{k} (p_{i,h}^t + q_{i,h}^t θ_h) ≤ 0,   i = 1, ..., n
(a_i^t + b_i^t r_i) + c Σ_{h=1}^{k} (p_{i,h}^c + q_{i,h}^c θ_h) ≥ 0,   i = 1, ..., n
w_i^L ≤ 0,  w_i^L ≤ r_i - r_i^L,   i = 1, ..., n
w_i^U ≤ 0,  w_i^U ≤ r_i^U - r_i,   i = 1, ..., n
l_h ≤ θ_h ≤ u_h,   h = 1, ..., k

Using the upper bound on the uncertainty provided by Problem s²-UB, the point yielding the maximum uncertainty can be obtained with the same branch-and-bound method described in Section 3.6.3, except that s_n²(θ) is maximized rather than I_n(θ). That is, an alternative to maximizing the expected improvement is to choose the next point by maximizing the uncertainty instead.

3.6.3 Branch-and-Bound Method of Obtaining the Next Observation

A branch-and-bound method is used to determine the maximum expected improvement in each iteration. At some point in the algorithm, n points, θ^(1), ..., θ^(n), have already been observed. The solution space is divided into regions based on these previously sampled points, and each region is considered as a separate subproblem. Each of these subproblems is solved using branch and bound. First, the upper bound on the uncertainty is determined as described in Section 3.6.2.2 using the subregion's

lower and upper bounds on θ. Next, the lower bound F̂^L on F̂_n(θ) is determined using the method in Jones [68] and Jones et al. [69]. The upper bound on s_n²(θ) and the lower bound on F̂_n(θ) are then used to determine an upper bound on I_n(θ) over the current subregion by solving for I_n(θ) with F̂_n(θ) = F̂^L and s_n(θ) = s^U substituted, as described in Jones [68] and Jones et al. [69]. In addition, the θ that yielded the maximum uncertainty can be used to evaluate the function I_n(θ), yielding a lower bound on I_n(θ) over the interval l_h ≤ θ_h ≤ u_h, h = 1, ..., k. This value is used to update the current best lower bound found (i.e., if the current best lower bound is less than the new lower bound, it is replaced by the new one; otherwise, it is unchanged).

If the upper bound is less than the current best lower bound, the subregion is discarded as not interesting. If the lower and upper bounds are very close, we say that we have found the optimum over the current subregion. Otherwise, the upper bound is significantly larger than the current lower bound, so the subregion is further divided into subregions as described below and the procedure is repeated for each of the new regions. This is the branching step. At some point, there are no more subregions to consider, as each has either been discarded as not interesting or had its optimal solution found. The algorithm then terminates, and the current best lower bound is the optimal value of I_n(θ) over the current region. This branch-and-bound procedure is applied to each of the regions; the overall largest value found is the maximum of I_n(θ), and the corresponding θ is the next point at which to evaluate the FMO function.

Selecting the subregions. An important component of the branch-and-bound algorithm is the method of selecting the subregions.
The definition of these subregions, as well as the order in which they are explored, can have a significant impact on both the amount of time and the memory required by the algorithm. As our implementation

of the branch-and-bound method requires that the entire solution space be divided into subregions before the branch-and-bound algorithm begins, the selection of these initial regions may also affect the speed of the algorithm.

Initial regions. Before beginning the branch-and-bound process, the solution space of the decision variables, θ_h ∈ [0°, 360°] for all h = 1, ..., k, is divided into a set of initial regions. If θ represents non-coplanar orientations, we consider two ways of selecting the regions defined by the non-coplanar orientations. First, we consider the entire solution space as the only region; that is, instead of dividing the solution space into several subregions, we consider one subregion that encompasses the entire solution space (see Figure 3-3A). Second, denote a subset of variable indices H ⊆ {1, ..., k}. For each index h ∈ H, order the n previously sampled points increasingly by component h. For each previously sampled point i = 1, ..., n-1, consider the region defined by l_h = 0° and u_h = 360° for h ∉ H, and l_h = θ_h^(i) and u_h = θ_h^(i+1) for h ∈ H. Also consider the region defined by l_h = 0° and u_h = 360° for h ∉ H, and l_h = 0° and u_h = θ_h^(1) for h ∈ H; similarly, consider the region defined by l_h = 0° and u_h = 360° for h ∉ H, and l_h = θ_h^(n) and u_h = 360° for h ∈ H. Figures 3-3A-3-3D illustrate the initial regions for different sets H where k = 2. Denote the initial region scheme where H = ∅ as B0 (Figure 3-3A), H = {1} as B1 (Figure 3-3B), H = {2} as B2 (Figure 3-3C) and H = {1, 2} as B3 (Figure 3-3D). Note that in the coplanar case, it is only necessary to test the initial region scheme for one angle because the angles are interchangeable.

Bounds for discrete and continuous variables. If θ is discrete, the points on the boundary between two subregions will be contained in both subregions, thus creating an inefficiency. This can be seen in Figure ??, where θ_b^(1) is the point at which we branch and the blue line represents the division of the region into two subregions.
The boundary line is contained in both the top subregion and the bottom subregion. This overlap can be avoided when θ is integral by adjusting the bounds between subregions in such a

Figure 3-3. Initial regions in the branch-and-bound algorithm. A) Initial regions with H = ∅ (B0). B) Initial regions with H = {1} (B1). C) Initial regions with H = {2} (B2). D) Initial regions with H = {1, 2} (B3).

way as to prevent overlapping between any subregions. If the lower bound l_h on θ_h in a subregion is fractional, then we discard the non-integral solutions by setting l_h = ⌈l_h⌉. Similarly, if the upper bound u_h on θ_h in a subregion is fractional, then u_h = ⌊u_h⌋. If the bounds are integral and the lower bound l_h of one subregion coincides with the upper bound u_h of the adjacent subregion, overlapping is avoided by setting l_h = l_h + 1 (see Figure ??). If θ is continuous, the bounds cannot be adjusted.

Branching scheme. The basic principle of the branch-and-bound method is to decompose regions into smaller subregions in such a way that as many subregions as possible can be discarded as uninteresting, leaving a reduced number of subregions that must actually be searched. Branch and bound is well studied, and as such there are numerous methods of selecting the subregions. Regions may be divided into two equal subregions (bisection) or, more generally, into multiple subregions which may or may not be equal in size (multisection) (Csallner et al. [7], Lagouanelle and Soubry [71]). Other common methods include selecting only a subset of variables on which to branch (Epperly et al. [72]), using Lagrangian duality to obtain lower bounds (Barrientos and Correa [73], Thoai [74], Tuy [75]) and applying decomposition algorithms (Phong et al. [76], Bomze [77], Cambini and Sodini [78]).

In our branching step, we form the subregions based on some point in the region. The region is divided at this point along one of the indices. In Figure 3-4A, θ_b^(1) is the point at which we branch. We branch by dividing the region horizontally into two subregions at θ_b^(1), taking into account the adjustments to the bounds described above so as to avoid overlapping regions. For k = 2, in each branching step, we alternately divide the region horizontally (along index 2) and vertically (along index 1), as shown in Figures 3-4B-3-4D.
After branching horizontally once at θ_b^(1) as shown in Figure 3-4B, we examine the top region and select θ_b^(2) as the point at which we branch. We then branch by dividing this subregion vertically at θ_b^(2). We proceed in the same manner for θ_b^(3), where we branch horizontally, and so on until the convergence criterion is met.
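As a concrete illustration, the non-overlapping split described above can be sketched as follows. This is our own minimal sketch, not the dissertation's implementation; a region is represented as a list of integer (lower, upper) bound pairs, and the name `branch` is hypothetical.

```python
# Minimal sketch of the branching step: splitting a region along index h at
# the branch point theta_b yields the disjoint subregions
# [l_h, theta_b - 1] and [theta_b, u_h]; all other bounds are unchanged.

def branch(region, h, theta_b):
    """Split `region` (a list of (lower, upper) bounds) along index h."""
    l_h, u_h = region[h]
    assert l_h <= theta_b <= u_h, "branch point must lie inside the region"
    lower = list(region)
    upper = list(region)
    lower[h] = (l_h, theta_b - 1)  # lower subregion, upper bound decremented
    upper[h] = (theta_b, u_h)      # upper subregion
    return lower, upper
```

Cycling h through the indices between calls reproduces the alternating horizontal/vertical divisions of Figure 3-4.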

In the general case, we divide the region into two subregions along the branching index while cycling through each of the indices h = 1, ..., k sequentially. For the branching index h, the bounds for one new subregion are l_h = l_h and u_h = θ_{b,h} − 1, and the bounds for the other new subregion are l_h = θ_{b,h} and u_h = u_h. The lower and upper bounds on the region for the remaining indices are unchanged for both new subregions, i.e., l_{h′} = l_{h′} and u_{h′} = u_{h′} for h′ ≠ h.

In the non-coplanar case, a beam in θ may be represented by more than one index. For example, if a single non-coplanar beam consisting of couch and gantry rotation is optimized, the vector θ consists of θ_1 representing the gantry angle and θ_2 representing the couch angle. The branching index h ∈ {1, 2} represents branching on either the gantry angle or on the couch angle. If two such non-coplanar beams are optimized, then θ consists of θ_1 and θ_2 representing the gantry and couch angles of the first beam, respectively, and θ_3 and θ_4 representing the gantry and couch angles of the second beam, respectively. The branching index h ∈ {1, 2, 3, 4} then represents branching on a single component of a single beam.

Accounting for symmetry. In the case where θ represents a set of coplanar beam angles, the ordering of the variables in θ is irrelevant to the FMO value obtained at θ. For example, if θ^(1) = (1, 2, 3, 4) and θ^(2) = (2, 3, 4, 1), then F(θ^(1)) = F(θ^(2)). Thus, it is redundant to consider both θ^(1) and θ^(2), and eliminating these redundant regions can greatly decrease the size of the solution space. For example, if we consider the two-dimensional case (k = 2), the solution space is a square region with 0 ≤ θ_1 ≤ 360 and 0 ≤ θ_2 ≤ 360. The points above the line θ_1 = θ_2 are equivalent to the points below the line, so we only need to consider one of these regions. Say we branch by splitting the region into four equal quadrants, as shown in Figure 3-5A.
If we arbitrarily choose to only examine the points above the line θ_1 = θ_2, then quadrant 4 can be eliminated.
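To make the order symmetry concrete, the following small sketch (ours, with hypothetical names) enumerates only one representative per unordered set of coplanar beam angles by keeping the nondecreasing vectors θ_1 ≤ ... ≤ θ_k:

```python
from itertools import product

# Since F(theta) is invariant under reordering of coplanar beams, one sorted
# representative per unordered set of angles suffices.

def unique_beam_sets(candidates, k):
    """Yield one representative beam vector per unordered set of k angles."""
    for theta in product(sorted(candidates), repeat=k):
        if all(theta[i] <= theta[i + 1] for i in range(k - 1)):
            yield theta
```

For three candidate angles and k = 2, this reduces the 9 ordered pairs to the 6 unordered ones, mirroring the elimination of quadrant 4 above.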

Figure 3-4. Partitioning a region into subregions. A) Partitioning a region into subregions without accounting for overlap. B) Preventing overlapping regions. C) Regions after two branches. D) Regions after three branches.

Figure 3-5. Accounting for symmetry. A) Accounting for symmetry in 2D. B) Accounting for symmetry in 3D.

In three dimensions, the solution space is a cube. If we branch by splitting the cube into eight equal cubes, the region to be examined is shown in Figure 3-5B, where the origin is the back bottom left corner of the cube. From this figure, we can see that a sizable portion of the solution space can be discarded. In regions that contain both viable and redundant solutions (for example, quadrants 2 and 3 in Figure 3-5A), adding the constraints θ_1 ≤ ... ≤ θ_k to the problem of maximizing the expected improvement ensures that only the unique portion of the region is considered.

If more than one non-coplanar orientation is optimized, a symmetry similar to the multiple coplanar orientation symmetry exists. Consider an implementation where two non-coplanar beam orientations are optimized, and these orientations are obtained by rotating both the gantry and the couch. Each beam is represented by two variables in the solution vector: one variable indicating the degree of gantry rotation, and one indicating the degree of couch rotation. Let θ_1 and θ_2 be the gantry rotation and couch rotation of the first beam, respectively, and θ_3 and θ_4 be the gantry rotation and couch rotation of the second beam, respectively. Then, the solution vector {θ_1, θ_2, θ_3, θ_4}

is identical to the solution vector {θ_3, θ_4, θ_1, θ_2}. Because the couch angle selected is dependent on the gantry angle (and vice versa), this symmetry can be exploited by removing redundant solutions from only one of the beam variables, that is, by requiring that θ_1 ≤ θ_3 (removing redundancy from the gantry angles) or θ_2 ≤ θ_4 (removing redundancy from the couch angles). In general, if d degrees of motion are used to obtain m beam orientations, and the linear accelerator motion variables are in the same order for each beam, then θ_k ≤ θ_{k+d} ≤ θ_{k+2d} ≤ ... ≤ θ_{k+(m−1)d} for some k ∈ {1, ..., d}.

Method of Obtaining the Next Observation

The RS algorithm allows for two methods of selecting the next point to observe: maximizing the expected improvement or maximizing the uncertainty. In these tests, the point to observe is obtained by first selecting the point that maximizes the expected improvement until the maximum expected improvement falls below a certain threshold, and then switching to the point that maximizes the uncertainty. Once the maximum uncertainty also falls below a certain threshold, the algorithm terminates. By first selecting according to the expected improvement, the method quickly obtains a good solution. By then selecting according to uncertainty, theoretical convergence to the global minimum is ensured.

3.7 Neighborhood Search

3.7.1 Introduction

Following Aleman et al. [79], we test the simulated annealing algorithm on the BOO problem, as well as existing and new variants of a greedy neighborhood search heuristic called the Add/Drop algorithm (see Kumar [80]), to obtain a good solution to the BOO problem. In each step of the Add/Drop algorithm, a beam in the current beam set is replaced by a neighboring beam that yields an improving solution. As with the simulated annealing implementation, we also apply our new neighborhood to the Add/Drop algorithm and compare its performance to a commonly used neighborhood structure.

3.7.2 Neighborhood Search Approaches

Neighborhood search approaches are common methods of obtaining solutions to global optimization problems. For a vector of decision variables, a neighbor is obtained by perturbing one or more of the decision variables. A neighborhood for a particular vector of decision variables is the set of all its neighbors under a given method of perturbing the decision variable vector. A solution is considered locally optimal if there are no improving solutions in its neighborhood. Both deterministic and stochastic neighborhood search algorithms have been applied to a wide variety of optimization problems. A deterministic neighborhood search algorithm is one in which the entire neighborhood, or a pre-defined subset of the neighborhood, is enumerated in each iteration to find an improving solution. Stochastic versions of neighborhood search approaches, for example simulated annealing, randomly select neighboring solutions in an attempt to find an improving solution in each iteration. For the BOO problem, we consider two neighborhood search methods. The first is a deterministic neighborhood search algorithm that finds a locally optimal solution, and the second is the simulated annealing algorithm, which, although based on neighborhood searches, provably converges to the globally optimal solution for certain neighborhood structures.

3.7.3 A Deterministic Neighborhood Search Method for BOO

Deterministic neighborhood search methods are optimization algorithms that start from a given solution and then iteratively select the best point in the current neighborhood as the next iterate. The best point in the neighborhood can be found by complete enumeration if the neighborhood is small, or by optimization if the neighborhood is large or if objective function evaluations are expensive. Due to the complexity of the BOO problem, even when only a subset of available orientations is considered, we will focus on smaller neighborhoods and use enumeration.
The neighborhood could alternatively be searched heuristically, for example by searching the neighborhood until

the first improving solution is found, rather than the best improving solution. If no improving solution can be found, the current solution is a local optimum.

In our implementation of the Add/Drop algorithm, a small neighborhood is desired for enumeration purposes. In each iteration, a neighborhood for just a single beam is considered. Say a beam set consisting of k beams is desired. Letting the neighborhood of a single beam θ_h in θ be denoted N_h(θ), the Add/Drop algorithm is as follows:

Initialization:
1. Choose an initial starting solution θ^(0).
2. Set θ* = θ^(0) and i = 0.

Iteration:
1. Select h ∈ {1, ..., k}, then generate θ̄ ∈ N_h(θ^(i)).
2. If F(θ̄) < F(θ*), set θ* = θ^(i+1) = θ̄ and set i ← i + 1.
3. If all points in ∪_{h=1}^k N_h(θ^(i)) have been sampled without improvement, stop with θ* as a local minimum. Otherwise, repeat Step 1.

Neighborhood Definition

In each step of the Add/Drop algorithm, a beam in the current solution is replaced with an improving beam in its neighborhood. Rather than defining a neighbor relative to an entire beam vector, the neighborhoods of individual beams are considered. The neighborhood of a single beam θ_h in θ is defined as

N_h(θ) = {(θ_1, ..., θ_{h−1}, θ̄ mod 360, θ_{h+1}, ..., θ_k) ∈ B^k : θ_h − δ ≤ θ̄ ≤ θ_h + δ}.

In other words, the neighborhood of a beam is all beams within ±δ degrees, taking into account the cyclic nature of the angles. The cyclicality of the angles refers to the fact that all angles can be represented by degrees in [0, 360). For example, 400 = 40 and −100 = 260. The expression θ̄ mod 360 captures this cyclicality.
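A runnable sketch of the Add/Drop iteration and the cyclic neighborhood above (our own illustration; `fmo_value` is a placeholder for the expensive FMO solve F(θ), and a 1-degree candidate discretization is assumed):

```python
def cyclic_neighborhood(theta, h, delta):
    """All vectors replacing theta[h] by an angle within +/-delta degrees,
    taking the cyclic nature of angles (mod 360) into account."""
    neighbors = []
    for d in range(-delta, delta + 1):
        if d == 0:
            continue  # the current beam is not its own neighbor
        new = list(theta)
        new[h] = (theta[h] + d) % 360
        neighbors.append(tuple(new))
    return neighbors

def add_drop(theta0, fmo_value, delta=10):
    """Greedy Add/Drop: cycle through the beam indices, replacing each beam
    with its best improving neighbor, until no beam can be improved."""
    theta = tuple(theta0)
    best = fmo_value(theta)
    k = len(theta)
    h = 0        # 0-based here; the text's h = i mod k + 1 is 1-based
    stalled = 0  # consecutive indices examined without improvement
    while stalled < k:
        candidate = min(cyclic_neighborhood(theta, h, delta), key=fmo_value)
        if fmo_value(candidate) < best:
            theta, best = candidate, fmo_value(candidate)
            stalled = 0
        else:
            stalled += 1
        h = (h + 1) % k
    return theta, best
```

With a toy separable objective the algorithm stops at the unique minimizer; in the chapter, each evaluation of F is a full fluence map optimization.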

Neighbor Selection

The process of selecting a neighboring point in each iteration consists of two steps: selecting the index h to change and then selecting an improving angle in N_h(θ) to replace θ_h. If h is selected as i mod k + 1, the algorithm cycles through each index sequentially, similar to a Gibbs Sampler (see, for example, Geman and Geman [81] and Gelfand and Smith [82]). The Gibbs Sampler also uses a similar two-step approach to generating a new point by sequentially generating a new value for each variable in turn. If h is selected randomly in each iteration, the resulting algorithm is similar to a Hit-and-Run method (see, for example, Smith [83] and Bélisle [84]), in which a variable to be changed is selected randomly, and then a new value for that variable is also selected randomly within a neighborhood. Once h is selected, the new value for θ_h can be generated by enumeration or by a heuristic method. The Add/Drop algorithm compares the quality of the new solution to the current solution, and only accepts improving solutions. This greedy approach results in a locally optimal solution.

Implementation

The index of the beam angle to be changed in each iteration, h in Step 1 of the algorithm in Section 3.7.3, is chosen as h = i mod k + 1 to cycle through each index in a sequential manner. In the Add/Drop implementation, once h is determined, θ̄ in iteration i is chosen as θ̄ = arg min_{θ ∈ N_h(θ^(i))} F(θ). By replacing each beam by the most improving neighbor, the Add/Drop algorithm is a greedy heuristic which terminates when there is no improving neighbor for any beam. A multi-start aspect is added by repeating the algorithm with multiple initial starting points. For example, one strategy to select starting points would be to select a random starting point according to a particular distribution.
Another strategy would be to select an equi-spaced solution and rotate it a fixed number of times to obtain new starting points until the initial equi-spaced solution is repeated. Equi-spaced beam solutions are common

in clinical practice for an odd number of beams. The reason that such a method is not generally used in practice for an even number of beams is that the resulting beam set would contain parallel-opposed beams (beams that lie on the same line), which are avoided by convention, as it is believed that the effect of a parallel-opposed beam is very similar to simply doubling the radiation delivered from a single beam. If an equi-spaced solution is not possible given a beam set of k beams and the discretization level of the candidate beam set B, then the solution can be rounded so that θ_h^(0) ∈ B, h = 1, ..., k.

3.7.4 Simulated Annealing

The simulated annealing algorithm used is similar to the classical simulated annealing approach proposed in Kirkpatrick et al. [85]. The simulated annealing algorithm is based on the Metropolis algorithm, wherein a neighboring solution to the current iterate is generated; if it is an improving point, it becomes the current iterate. Otherwise, it becomes the current iterate with probability exp{−ΔF/T}, where ΔF is the increase in FMO value from the current iterate to the newly generated point and T is the temperature, a measure of the randomness of the algorithm. If T = 0, then only improving points are selected. If T is very large, then any move is accepted, which is essentially a random search. The simulated annealing algorithm starts with an initial temperature T_0 and performs a number of iterations of the Metropolis algorithm using T = T_0. Then, the temperature is decreased according to some cooling schedule such that {T_i} → 0.

Obvious parallels can be drawn between the simulated annealing algorithm and the Add/Drop neighborhood search method described in Section 3.7.3. While the Add/Drop algorithm deterministically searches the neighborhood for improving solutions, the simulated annealing algorithm randomly selects neighboring solutions.
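The Metropolis acceptance rule and cooling loop described above can be sketched as follows. This is our own minimal illustration, not the dissertation's code: `fmo_value` and `neighbor` are placeholder callables, and the geometric schedule T_{i+1} = αT_i is one standard cooling choice.

```python
import math
import random

def simulated_annealing(theta0, fmo_value, neighbor, t0=100.0, alpha=0.95,
                        iters=1000, seed=0):
    """Metropolis iterations with geometric cooling T_{i+1} = alpha * T_i."""
    rng = random.Random(seed)
    theta = theta0
    f_cur = fmo_value(theta)
    best_theta, best_f = theta, f_cur  # incumbent solution and value
    t = t0
    for _ in range(iters):
        cand = neighbor(theta, rng)
        f_cand = fmo_value(cand)
        # Always accept improving moves; accept a worsening move with the
        # Boltzmann probability exp{(f_cur - f_cand) / t}.
        if f_cand < f_cur or rng.random() < math.exp((f_cur - f_cand) / t):
            theta, f_cur = cand, f_cand
        if f_cur < best_f:
            best_theta, best_f = theta, f_cur
        t *= alpha  # cooling schedule
    return best_theta, best_f
```

At high temperature the loop behaves like a random search; as t shrinks it becomes nearly greedy, matching the T = 0 and large-T limits described above.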
Rather than being limited by the ability to only move to improving solutions, the simulated annealing algorithm may still move to a non-improving solution with a certain probability, thus

allowing for the escape from local minima. The Add/Drop algorithm, on the other hand, is a greedy algorithm that is specifically designed to find local minima. The simulated annealing algorithm is essentially a randomization of the Add/Drop algorithm. In addition to the added randomness, the possibility of changing more than one beam in each iteration is allowed by selecting a set of indices H ⊆ {1, ..., k} to change, rather than just a single index h. The simulated annealing algorithm is as follows:

Initialization:
1. Choose an initial beam set θ^(0) and calculate its FMO objective function value F_0.
2. Set θ̂ = θ^(0), F̂ = F_0, i = 0.

Iteration:
1. Select H ⊆ {1, ..., k}, generate θ̄ by drawing a new value from N_h(θ^(i)) for each h ∈ H, and calculate its FMO objective function value F̄.
2. If F̄ < F̂, set F̂ = F̄, F_{i+1} = F̄, θ^(i+1) = θ̄ and θ̂ = θ̄. Otherwise, set F_{i+1} = F̄ and θ^(i+1) = θ̄ with probability exp{(F_i − F̄)/T_i}.
3. Set i ← i + 1 and repeat Step 1.

The simulated annealing algorithm has been previously applied to the BOO problem. Bortfeld and Schlegel [35] use the fast simulated annealing algorithm described by Szu and Hartley [86], which employs a Cauchy distribution in generating neighboring points. Stein et al. [40], Rowbottom et al. [39] and Djajaputra et al. [36] also use a Cauchy distribution in generating neighboring solutions. Lu et al. [37] randomly select new points satisfying BEV and conventional wisdom criteria, and Pugachev and Xing [38] randomly generate new points and then vary them according to an exponential distribution. All accept improving solutions, and with the exception of Rowbottom et al. [39], who only accept improving solutions (essentially T_i = 0 for all i), all accept non-improving solutions with a

Boltzmann probability. None of the previous BOO studies employing simulated annealing use the exact FMO value as a measure of the quality of a beam set.

Neighborhood Definition

Two neighborhood structures are explored. The first is similar to that described in Section 3.7.3 in that a neighborhood N_h(θ) is considered for only a single beam index h ∈ {1, ..., k}, just as in the Add/Drop method. As an extension to changing a single angle in each iteration, we also consider a neighborhood that involves changing all beams in each iteration, corresponding to H = {1, ..., k} in Step 1 of the simulated annealing algorithm in Section 3.7.4. This neighborhood is defined as N(θ) = ∏_{h=1}^k N_h(θ), i.e., every beam may move within its own neighborhood simultaneously. Again, the neighborhoods for the individual beams are defined as in the first method, with bounds of ±δ degrees.

Neighbor Selection

The method of selecting a neighbor depends on the neighborhood structure as described above. In the first method, where only one beam is changed at a time, a neighbor is selected using the randomized approach described in Section 3.7.3. Once h is selected, the probability of selecting a particular solution in N_h(θ) whose new angle θ̄ is d degrees from θ_h is P{D = d}, where D is a random variable with some probability distribution defined on {−δ, −δ + 1, ..., δ}. For the neighborhood N(θ), where all beams are changed in an iteration, the new value for each beam h ∈ {1, ..., k} is generated from N_h(θ) in the same manner described above.

Implementation

In addition to basing our algorithms on the exact FMO solution rather than on heuristics or scoring measures, our simulated annealing approach differs from the previous studies in the distribution used to generate neighbors, the definition of the neighborhood, the cooling schedule and the number of iterations/restarts used. Not only do we use a new neighborhood structure, but also a geometric probability distribution rather than a

uniform or Cauchy distribution on the neighborhood. The geometric distribution is similar in shape to the Cauchy distribution in that both can have fat tails, depending on the choice of probability parameters. The fat tails of these distributions allow points far away from the current solution to be selected as successive iterates, which potentially increases the likelihood of finding a globally optimal solution. The geometric distribution has the added attractiveness of producing discrete solutions, which is desirable for the BOO problem, in which discrete solutions are preferred. By using the cooling schedule T_{i+1} = αT_i with α < 1, the sequence of temperatures {T_i} converges to zero as the number of iterations increases.

In our approach, the neighborhood of a beam for both the N_h(θ) and N(θ) neighborhoods is defined using δ = 180, that is, N_h(θ) = B. By defining the neighborhood of each beam to be the entire single-beam solution space, the simulated annealing algorithm converges to the global optimum when using the neighborhood N(θ) defined above. Though N_h(θ) is large, each beam in N_h(θ) is assigned a probability so that only the beams closest to θ_h have a significant probability of being selected. Figure 3-7A shows the probability of replacing θ_h with beams at varying distances using probability p = 0.25 for the geometric distribution. Note that the current beam cannot be selected as a replacement. As with the Add/Drop method, a multi-start aspect is added to the simulated annealing algorithm by repeating the algorithm using several different starting points.

Convergence

Unlike many previously proposed simulated annealing algorithms, our algorithm converges to the globally optimal solution of the BOO problem under mild conditions. The following theorem summarizes these conditions.

Theorem. Suppose that H = {1, ..., k}, lim_{i→∞} T_i = 0, δ = 180, and

there is a positive probability of generating any solution in the neighborhood. Then our simulated annealing algorithm converges to the globally optimal solution in the sense that lim_{i→∞} F_i = F* in probability, where F* is the global optimum value of the BOO problem.

Proof. This follows from Theorem 1 in Bélisle et al. [87].

3.7.5 A New Neighborhood Structure

For the BOO problem, the neighborhood structure typically used for a vector of beam orientations is simply the collection of beam vectors obtained by changing one or more of the beams to a neighboring beam, where each beam has its own neighborhood N_h(θ). In addition to N_h(θ), we consider a new neighborhood, which we call a flip neighborhood. The flip neighborhood of a beam index h consists of N_h(θ) plus a neighborhood around the parallel-opposed beam of θ_h. The parallel-opposed beam is the beam 180 degrees away, that is, (θ_h + 180) mod 360. The flip neighborhood can be defined as

N_h^F(θ) = {(θ_1, ..., θ_{h−1}, θ̄ mod 360, θ_{h+1}, ..., θ_k) ∈ B^k : θ̄ ∈ [θ_h − δ, θ_h + δ] ∪ [θ_h + 180 − δ^F, θ_h + 180 + δ^F]}.

Note that the values δ and δ^F may be different. Figure 3-6 depicts a flip neighborhood for a beam located at 0 degrees, with the center of the top shaded wedge representing N_h(θ), where θ_h = 0. The motivation for the flip neighborhood arises from the observation that many of the 3-beam simulated annealing plans generated using the regular neighborhood contained two beams very close to two beams in the optimal solution (obtained by explicit

enumeration), while the third beam was very close to the parallel-opposed beam of the third beam in the optimal solution. Given this observation, it is intuitive that including the neighborhood around the parallel-opposed beam should provide improved solutions. The neighborhoods N_h(θ) and N_h^F(θ) with varying δ^F values are applied to both the Add/Drop and the simulated annealing frameworks. For the geometric probability distribution used in the simulated annealing method, Figure 3-7B shows the probability of selecting beams at different distances using a flip neighborhood with probability p = 0.25. Note that the current beam cannot be selected as its own neighbor.

Figure 3-6. N_h(θ) (top shaded area) and N_h^F(θ) (top and bottom shaded areas) for θ_h = 0.

Figure 3-7. Selection probabilities under the geometric distribution with p = 0.25. A) Standard neighborhood N_h(θ). B) Flip neighborhood N_h^F(θ).
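The flip-neighborhood membership test and geometric neighbor selection can be sketched as follows. This is our own illustration: `flip_prob`, the probability of centering the move on the parallel-opposed beam, is a hypothetical parameter not specified in the text.

```python
import random

def cyclic_distance(a, b):
    """Shortest angular distance between two angles on the 360-degree circle."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def in_flip_neighborhood(angle, theta_h, delta, delta_f):
    """True if `angle` lies within delta of theta_h, or within delta_f of the
    parallel-opposed beam (theta_h + 180) mod 360."""
    opposed = (theta_h + 180) % 360
    return (cyclic_distance(angle, theta_h) <= delta
            or cyclic_distance(angle, opposed) <= delta_f)

def sample_flip_neighbor(theta_h, p=0.25, flip_prob=0.5, rng=None):
    """Draw a geometric distance d >= 1 (success probability p) and a random
    direction; with probability flip_prob, center the move on the
    parallel-opposed beam instead of theta_h."""
    rng = rng or random.Random()
    d = 1
    while rng.random() > p and d < 179:  # cap so theta_h is never re-selected
        d += 1
    center = theta_h if rng.random() >= flip_prob else (theta_h + 180) % 360
    return (center + rng.choice([-1, 1]) * d) % 360
```

With p = 0.25, most draws land within a few degrees of one of the two centers, reproducing the fat-tailed but mostly local selection probabilities of Figure 3-7B.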


More information

Electron Dose Kernels (EDK) for Secondary Particle Transport in Deterministic Simulations

Electron Dose Kernels (EDK) for Secondary Particle Transport in Deterministic Simulations Electron Dose Kernels (EDK) for Secondary Particle Transport in Deterministic Simulations A. Al-Basheer, G. Sjoden, M. Ghita Computational Medical Physics Team Nuclear & Radiological Engineering University

More information

Monaco Concepts and IMRT / VMAT Planning LTAMON0003 / 3.0

Monaco Concepts and IMRT / VMAT Planning LTAMON0003 / 3.0 and IMRT / VMAT Planning LTAMON0003 / 3.0 and Planning Objectives By the end of this presentation you can: Describe the cost functions in Monaco and recognize their application in building a successful

More information

Optimal Design of a Parallel Beam System with Elastic Supports to Minimize Flexural Response to Harmonic Loading

Optimal Design of a Parallel Beam System with Elastic Supports to Minimize Flexural Response to Harmonic Loading 11 th World Congress on Structural and Multidisciplinary Optimisation 07 th -12 th, June 2015, Sydney Australia Optimal Design of a Parallel Beam System with Elastic Supports to Minimize Flexural Response

More information

Monte Carlo methods in proton beam radiation therapy. Harald Paganetti

Monte Carlo methods in proton beam radiation therapy. Harald Paganetti Monte Carlo methods in proton beam radiation therapy Harald Paganetti Introduction: Proton Physics Electromagnetic energy loss of protons Distal distribution Dose [%] 120 100 80 60 40 p e p Ionization

More information

Simplicial Global Optimization

Simplicial Global Optimization Simplicial Global Optimization Julius Žilinskas Vilnius University, Lithuania September, 7 http://web.vu.lt/mii/j.zilinskas Global optimization Find f = min x A f (x) and x A, f (x ) = f, where A R n.

More information

Lecture 19: November 5

Lecture 19: November 5 0-725/36-725: Convex Optimization Fall 205 Lecturer: Ryan Tibshirani Lecture 9: November 5 Scribes: Hyun Ah Song Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not

More information

ELECTRON DOSE KERNELS TO ACCOUNT FOR SECONDARY PARTICLE TRANSPORT IN DETERMINISTIC SIMULATIONS

ELECTRON DOSE KERNELS TO ACCOUNT FOR SECONDARY PARTICLE TRANSPORT IN DETERMINISTIC SIMULATIONS Computational Medical Physics Working Group Workshop II, Sep 30 Oct 3, 2007 University of Florida (UF), Gainesville, Florida USA on CD-ROM, American Nuclear Society, LaGrange Park, IL (2007) ELECTRON DOSE

More information

An Automated Image-based Method for Multi-Leaf Collimator Positioning Verification in Intensity Modulated Radiation Therapy

An Automated Image-based Method for Multi-Leaf Collimator Positioning Verification in Intensity Modulated Radiation Therapy An Automated Image-based Method for Multi-Leaf Collimator Positioning Verification in Intensity Modulated Radiation Therapy Chenyang Xu 1, Siemens Corporate Research, Inc., Princeton, NJ, USA Xiaolei Huang,

More information

Optimization with Multiple Objectives

Optimization with Multiple Objectives Optimization with Multiple Objectives Eva K. Lee, Ph.D. eva.lee@isye.gatech.edu Industrial & Systems Engineering, Georgia Institute of Technology Computational Research & Informatics, Radiation Oncology,

More information

Dose Distributions. Purpose. Isodose distributions. To familiarize the resident with dose distributions and the factors that affect them

Dose Distributions. Purpose. Isodose distributions. To familiarize the resident with dose distributions and the factors that affect them Dose Distributions George Starkschall, Ph.D. Department of Radiation Physics U.T. M.D. Anderson Cancer Center Purpose To familiarize the resident with dose distributions and the factors that affect them

More information

Two Effective Heuristics for Beam Angle Optimization in Radiation Therapy

Two Effective Heuristics for Beam Angle Optimization in Radiation Therapy Two Effective Heuristics for Beam Angle Optimization in Radiation Therapy Hamed Yarmand, David Craft Department of Radiation Oncology Massachusetts General Hospital and Harvard Medical School, Boston,

More information

Recent Developments in Model-based Derivative-free Optimization

Recent Developments in Model-based Derivative-free Optimization Recent Developments in Model-based Derivative-free Optimization Seppo Pulkkinen April 23, 2010 Introduction Problem definition The problem we are considering is a nonlinear optimization problem with constraints:

More information

CHAPTER 9 INFLUENCE OF SMOOTHING ALGORITHMS IN MONTE CARLO DOSE CALCULATIONS OF CYBERKNIFE TREATMENT PLANS: A LUNG PHANTOM STUDY

CHAPTER 9 INFLUENCE OF SMOOTHING ALGORITHMS IN MONTE CARLO DOSE CALCULATIONS OF CYBERKNIFE TREATMENT PLANS: A LUNG PHANTOM STUDY 148 CHAPTER 9 INFLUENCE OF SMOOTHING ALGORITHMS IN MONTE CARLO DOSE CALCULATIONS OF CYBERKNIFE TREATMENT PLANS: A LUNG PHANTOM STUDY 9.1 INTRODUCTION 9.1.1 Dose Calculation Algorithms Dose calculation

More information

IMRT and VMAT Patient Specific QA Using 2D and 3D Detector Arrays

IMRT and VMAT Patient Specific QA Using 2D and 3D Detector Arrays IMRT and VMAT Patient Specific QA Using 2D and 3D Detector Arrays Sotiri Stathakis Outline Why IMRT/VMAT QA AAPM TG218 UPDATE Tolerance Limits and Methodologies for IMRT Verification QA Common sources

More information

Dose Calculations: Where and How to Calculate Dose. Allen Holder Trinity University.

Dose Calculations: Where and How to Calculate Dose. Allen Holder Trinity University. Dose Calculations: Where and How to Calculate Dose Trinity University www.trinity.edu/aholder R. Acosta, W. Brick, A. Hanna, D. Lara, G. McQuilen, D. Nevin, P. Uhlig and B. Slater Dose Calculations - Why

More information

Minimizing Setup and Beam-On Times in Radiation Therapy

Minimizing Setup and Beam-On Times in Radiation Therapy Minimizing Setup and Beam-On Times in Radiation Therapy Nikhil Bansal 1, Don Coppersmith 2, and Baruch Schieber 1 1 IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, {nikhil,sbar}@us.ibm.com

More information

SYSTEMS OF NONLINEAR EQUATIONS

SYSTEMS OF NONLINEAR EQUATIONS SYSTEMS OF NONLINEAR EQUATIONS Widely used in the mathematical modeling of real world phenomena. We introduce some numerical methods for their solution. For better intuition, we examine systems of two

More information

ADVANCING CANCER TREATMENT

ADVANCING CANCER TREATMENT 3 ADVANCING CANCER TREATMENT SUPPORTING CLINICS WORLDWIDE RaySearch is advancing cancer treatment through pioneering software. We believe software has un limited potential, and that it is now the driving

More information

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 X. Zhao 3, P. B. Luh 4, and J. Wang 5 Communicated by W.B. Gong and D. D. Yao 1 This paper is dedicated to Professor Yu-Chi Ho for his 65th birthday.

More information

Instituto de Engenharia de Sistemas e Computadores de Coimbra Institute of Systems Engineering and Computers INESC - Coimbra

Instituto de Engenharia de Sistemas e Computadores de Coimbra Institute of Systems Engineering and Computers INESC - Coimbra Instituto de Engenharia de Sistemas e Computadores de Coimbra Institute of Systems Engineering and Computers INESC - Coimbra Humberto Rocha Joana Matos Dias On the Optimization of Radiation Therapy Planning

More information

16.410/413 Principles of Autonomy and Decision Making

16.410/413 Principles of Autonomy and Decision Making 16.410/413 Principles of Autonomy and Decision Making Lecture 17: The Simplex Method Emilio Frazzoli Aeronautics and Astronautics Massachusetts Institute of Technology November 10, 2010 Frazzoli (MIT)

More information

REAL-TIME ADAPTIVITY IN HEAD-AND-NECK AND LUNG CANCER RADIOTHERAPY IN A GPU ENVIRONMENT

REAL-TIME ADAPTIVITY IN HEAD-AND-NECK AND LUNG CANCER RADIOTHERAPY IN A GPU ENVIRONMENT REAL-TIME ADAPTIVITY IN HEAD-AND-NECK AND LUNG CANCER RADIOTHERAPY IN A GPU ENVIRONMENT Anand P Santhanam Assistant Professor, Department of Radiation Oncology OUTLINE Adaptive radiotherapy for head and

More information

Optimization in Brachytherapy. Gary A. Ezzell, Ph.D. Mayo Clinic Scottsdale

Optimization in Brachytherapy. Gary A. Ezzell, Ph.D. Mayo Clinic Scottsdale Optimization in Brachytherapy Gary A. Ezzell, Ph.D. Mayo Clinic Scottsdale Outline General concepts of optimization Classes of optimization techniques Concepts underlying some commonly available methods

More information

ADVANCING CANCER TREATMENT

ADVANCING CANCER TREATMENT The RayPlan treatment planning system makes proven, innovative RayStation technology accessible to clinics that need a cost-effective and streamlined solution. Fast, efficient and straightforward to use,

More information

Convex Optimization. Stephen Boyd

Convex Optimization. Stephen Boyd Convex Optimization Stephen Boyd Electrical Engineering Computer Science Management Science and Engineering Institute for Computational Mathematics & Engineering Stanford University Institute for Advanced

More information

LECTURE 13: SOLUTION METHODS FOR CONSTRAINED OPTIMIZATION. 1. Primal approach 2. Penalty and barrier methods 3. Dual approach 4. Primal-dual approach

LECTURE 13: SOLUTION METHODS FOR CONSTRAINED OPTIMIZATION. 1. Primal approach 2. Penalty and barrier methods 3. Dual approach 4. Primal-dual approach LECTURE 13: SOLUTION METHODS FOR CONSTRAINED OPTIMIZATION 1. Primal approach 2. Penalty and barrier methods 3. Dual approach 4. Primal-dual approach Basic approaches I. Primal Approach - Feasible Direction

More information

Photon beam dose distributions in 2D

Photon beam dose distributions in 2D Photon beam dose distributions in 2D Sastry Vedam PhD DABR Introduction to Medical Physics III: Therapy Spring 2014 Acknowledgments! Narayan Sahoo PhD! Richard G Lane (Late) PhD 1 Overview! Evaluation

More information

3 Interior Point Method

3 Interior Point Method 3 Interior Point Method Linear programming (LP) is one of the most useful mathematical techniques. Recent advances in computer technology and algorithms have improved computational speed by several orders

More information

GPU applications in Cancer Radiation Therapy at UCSD. Steve Jiang, UCSD Radiation Oncology Amit Majumdar, SDSC Dongju (DJ) Choi, SDSC

GPU applications in Cancer Radiation Therapy at UCSD. Steve Jiang, UCSD Radiation Oncology Amit Majumdar, SDSC Dongju (DJ) Choi, SDSC GPU applications in Cancer Radiation Therapy at UCSD Steve Jiang, UCSD Radiation Oncology Amit Majumdar, SDSC Dongju (DJ) Choi, SDSC Conventional Radiotherapy SIMULATION: Construciton, Dij Days PLANNING:

More information

Applied Optimization Application to Intensity-Modulated Radiation Therapy (IMRT)

Applied Optimization Application to Intensity-Modulated Radiation Therapy (IMRT) Applied Optimization Application to Intensity-Modulated Radiation Therapy (IMRT) 2008-05-08 Caroline Olsson, M.Sc. Topics Short history of radiotherapy Developments that has led to IMRT The IMRT process

More information

An Application of the Extended Cutting Angle Method in Radiation Therapy

An Application of the Extended Cutting Angle Method in Radiation Therapy An Application of the Extended Cutting Angle Method in Radiation Therapy by Valentin Koch A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Bachelor of Science Honours in The

More information

High Throughput Computing and Sampling Issues for Optimization in Radiotherapy

High Throughput Computing and Sampling Issues for Optimization in Radiotherapy High Throughput Computing and Sampling Issues for Optimization in Radiotherapy Michael C. Ferris, University of Wisconsin Alexander Meeraus, GAMS Development Corporation Optimization Days, Montreal, May

More information

Applied Lagrange Duality for Constrained Optimization

Applied Lagrange Duality for Constrained Optimization Applied Lagrange Duality for Constrained Optimization Robert M. Freund February 10, 2004 c 2004 Massachusetts Institute of Technology. 1 1 Overview The Practical Importance of Duality Review of Convexity

More information

Contents. I Basics 1. Copyright by SIAM. Unauthorized reproduction of this article is prohibited.

Contents. I Basics 1. Copyright by SIAM. Unauthorized reproduction of this article is prohibited. page v Preface xiii I Basics 1 1 Optimization Models 3 1.1 Introduction... 3 1.2 Optimization: An Informal Introduction... 4 1.3 Linear Equations... 7 1.4 Linear Optimization... 10 Exercises... 12 1.5

More information

Robustness Recipes for Proton Therapy

Robustness Recipes for Proton Therapy Robustness Recipes for Proton Therapy Polynomial Chaos Expansion as a tool to construct robustness recipes for proton therapy C.E. ter Haar Delft University of Technology Robustness Recipes for Proton

More information

A Short SVM (Support Vector Machine) Tutorial

A Short SVM (Support Vector Machine) Tutorial A Short SVM (Support Vector Machine) Tutorial j.p.lewis CGIT Lab / IMSC U. Southern California version 0.zz dec 004 This tutorial assumes you are familiar with linear algebra and equality-constrained optimization/lagrange

More information

Numerical Optimization: Introduction and gradient-based methods

Numerical Optimization: Introduction and gradient-based methods Numerical Optimization: Introduction and gradient-based methods Master 2 Recherche LRI Apprentissage Statistique et Optimisation Anne Auger Inria Saclay-Ile-de-France November 2011 http://tao.lri.fr/tiki-index.php?page=courses

More information

Part 4. Decomposition Algorithms Dantzig-Wolf Decomposition Algorithm

Part 4. Decomposition Algorithms Dantzig-Wolf Decomposition Algorithm In the name of God Part 4. 4.1. Dantzig-Wolf Decomposition Algorithm Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Introduction Real world linear programs having thousands of rows and columns.

More information

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave.

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. http://en.wikipedia.org/wiki/local_regression Local regression

More information

Kernel Methods & Support Vector Machines

Kernel Methods & Support Vector Machines & Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector

More information

ROBUST OPTIMIZATION THE END OF PTV AND THE BEGINNING OF SMART DOSE CLOUD. Moe Siddiqui, April 08, 2017

ROBUST OPTIMIZATION THE END OF PTV AND THE BEGINNING OF SMART DOSE CLOUD. Moe Siddiqui, April 08, 2017 ROBUST OPTIMIZATION THE END OF PTV AND THE BEGINNING OF SMART DOSE CLOUD Moe Siddiqui, April 08, 2017 Agenda Background IRCU 50 - Disclaimer - Uncertainties Robust optimization Use Cases Lung Robust 4D

More information

A new quadratic optimization approach to beam angle optimization for fixed-field

A new quadratic optimization approach to beam angle optimization for fixed-field A new quadratic optimization approach to beam angle optimization for fixed-field intensity modulated radiation therapy using compressed sensing Junbo Peng 1, Ailin Wu 2 and Lei Zhu 1,* 1 Department of

More information

Dose-volume-based IMRT fluence optimization: A fast least-squares approach with differentiability

Dose-volume-based IMRT fluence optimization: A fast least-squares approach with differentiability Dose-volume-based IMRT fluence optimization: A fast least-squares approach with differentiability Yin Zhang and Michael Merritt Technical Report TR06-11 Department of Computational and Applied Mathematics

More information

ORIE 6300 Mathematical Programming I November 13, Lecture 23. max b T y. x 0 s 0. s.t. A T y + s = c

ORIE 6300 Mathematical Programming I November 13, Lecture 23. max b T y. x 0 s 0. s.t. A T y + s = c ORIE 63 Mathematical Programming I November 13, 214 Lecturer: David P. Williamson Lecture 23 Scribe: Mukadder Sevi Baltaoglu 1 Interior Point Methods Consider the standard primal and dual linear programs:

More information

Chapter 4. Clustering Core Atoms by Location

Chapter 4. Clustering Core Atoms by Location Chapter 4. Clustering Core Atoms by Location In this chapter, a process for sampling core atoms in space is developed, so that the analytic techniques in section 3C can be applied to local collections

More information

An accelerated multistart derivative-free framework for the beam angle optimization problem in IMRT

An accelerated multistart derivative-free framework for the beam angle optimization problem in IMRT An accelerated multistart derivative-free framework for the beam angle optimization problem in IMRT Humberto Rocha 1,2, Joana M. Dias 1,2, Tiago Ventura 3, Brígida C. Ferreira 4, and Maria do Carmo Lopes

More information

Beam Orientation Optimization for Intensity Modulated Radiation Therapy using Adaptive l 1 Minimization

Beam Orientation Optimization for Intensity Modulated Radiation Therapy using Adaptive l 1 Minimization Beam Orientation Optimization for Intensity Modulated Radiation Therapy using Adaptive l 1 Minimization Xun Jia 1, Chunhua Men 1, Yifei Lou 2, and Steve B. Jiang 1 1 Center for Advanced Radiotherapy Technologies

More information

APPLIED OPTIMIZATION WITH MATLAB PROGRAMMING

APPLIED OPTIMIZATION WITH MATLAB PROGRAMMING APPLIED OPTIMIZATION WITH MATLAB PROGRAMMING Second Edition P. Venkataraman Rochester Institute of Technology WILEY JOHN WILEY & SONS, INC. CONTENTS PREFACE xiii 1 Introduction 1 1.1. Optimization Fundamentals

More information

Convexization in Markov Chain Monte Carlo

Convexization in Markov Chain Monte Carlo in Markov Chain Monte Carlo 1 IBM T. J. Watson Yorktown Heights, NY 2 Department of Aerospace Engineering Technion, Israel August 23, 2011 Problem Statement MCMC processes in general are governed by non

More information

A derivative-free multistart framework for an automated noncoplanar beam angle optimization in IMRT

A derivative-free multistart framework for an automated noncoplanar beam angle optimization in IMRT A derivative-free multistart framework for an automated noncoplanar beam angle optimization in IMRT Humberto Rocha and Joana Dias FEUC and Inesc-Coimbra, University of Coimbra, Portugal 5 1 Tiago Ventura

More information

Dosimetry Simulations with the UF-B Series Phantoms using the PENTRAN-MP Code System

Dosimetry Simulations with the UF-B Series Phantoms using the PENTRAN-MP Code System Dosimetry Simulations with the UF-B Series Phantoms using the PENTRAN-MP Code System A. Al-Basheer, M. Ghita, G. Sjoden, W. Bolch, C. Lee, and the ALRADS Group Computational Medical Physics Team Nuclear

More information

Programs. Introduction

Programs. Introduction 16 Interior Point I: Linear Programs Lab Objective: For decades after its invention, the Simplex algorithm was the only competitive method for linear programming. The past 30 years, however, have seen

More information

Investigation of tilted dose kernels for portal dose prediction in a-si electronic portal imagers

Investigation of tilted dose kernels for portal dose prediction in a-si electronic portal imagers Investigation of tilted dose kernels for portal dose prediction in a-si electronic portal imagers Krista Chytyk MSc student Supervisor: Dr. Boyd McCurdy Introduction The objective of cancer radiotherapy

More information

B553 Lecture 12: Global Optimization

B553 Lecture 12: Global Optimization B553 Lecture 12: Global Optimization Kris Hauser February 20, 2012 Most of the techniques we have examined in prior lectures only deal with local optimization, so that we can only guarantee convergence

More information

Optimal network flow allocation

Optimal network flow allocation Optimal network flow allocation EE384Y Project intermediate report Almir Mutapcic and Primoz Skraba Stanford University, Spring 2003-04 May 10, 2004 Contents 1 Introduction 2 2 Background 2 3 Problem statement

More information

Applied Optimization Application to Intensity-Modulated Radiation Therapy (IMRT)

Applied Optimization Application to Intensity-Modulated Radiation Therapy (IMRT) Applied Optimization Application to Intensity-Modulated Radiation Therapy (IMRT) 2009-05-08 Caroline Olsson, M.Sc. Topics History of radiotherapy Developments that has led to IMRT The IMRT process How

More information

An Automated Intensity-Modulated Radiation Therapy Planning System

An Automated Intensity-Modulated Radiation Therapy Planning System An Automated Intensity-Modulated Radiation Therapy Planning System Shabbir Ahmed, Ozan Gozbasi, Martin Savelsbergh H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology,

More information

arxiv: v1 [physics.med-ph] 26 Oct 2009

arxiv: v1 [physics.med-ph] 26 Oct 2009 arxiv:0910.4934v1 [physics.med-ph] 26 Oct 2009 On the tradeoff between treatment time and plan quality in rotational arc radiation delivery 1. Introduction David Craft and Thomas Bortfeld Department of

More information

Instituto de Engenharia de Sistemas e Computadores de Coimbra Institute of Systems Engineering and Computers INESC - Coimbra

Instituto de Engenharia de Sistemas e Computadores de Coimbra Institute of Systems Engineering and Computers INESC - Coimbra Instituto de Engenharia de Sistemas e Computadores de Coimbra Institute of Systems Engineering and Computers INESC - Coimbra Humberto Rocha Brígida da Costa Ferreira Joana Matos Dias Maria do Carmo Lopes

More information

Classification of Optimization Problems and the Place of Calculus of Variations in it

Classification of Optimization Problems and the Place of Calculus of Variations in it Lecture 1 Classification of Optimization Problems and the Place of Calculus of Variations in it ME256 Indian Institute of Science G. K. Ananthasuresh Professor, Mechanical Engineering, Indian Institute

More information

Application of polynomial chaos in proton therapy

Application of polynomial chaos in proton therapy Application of polynomial chaos in proton therapy Dose distributions, treatment parameters, robustness recipes & treatment planning Master Thesis S.R. van der Voort June, 215 Supervisors: Dr. Ir. D. Lathouwers

More information

Numerical Optimization

Numerical Optimization Numerical Optimization Quantitative Macroeconomics Raül Santaeulàlia-Llopis MOVE-UAB and Barcelona GSE Fall 2018 Raül Santaeulàlia-Llopis (MOVE-UAB,BGSE) QM: Numerical Optimization Fall 2018 1 / 46 1 Introduction

More information

Financial Optimization ISE 347/447. Lecture 13. Dr. Ted Ralphs

Financial Optimization ISE 347/447. Lecture 13. Dr. Ted Ralphs Financial Optimization ISE 347/447 Lecture 13 Dr. Ted Ralphs ISE 347/447 Lecture 13 1 Reading for This Lecture C&T Chapter 11 ISE 347/447 Lecture 13 2 Integer Linear Optimization An integer linear optimization

More information

Parameter Estimation in Differential Equations: A Numerical Study of Shooting Methods

Parameter Estimation in Differential Equations: A Numerical Study of Shooting Methods Parameter Estimation in Differential Equations: A Numerical Study of Shooting Methods Franz Hamilton Faculty Advisor: Dr Timothy Sauer January 5, 2011 Abstract Differential equation modeling is central

More information

Lecture 10: SVM Lecture Overview Support Vector Machines The binary classification problem

Lecture 10: SVM Lecture Overview Support Vector Machines The binary classification problem Computational Learning Theory Fall Semester, 2012/13 Lecture 10: SVM Lecturer: Yishay Mansour Scribe: Gitit Kehat, Yogev Vaknin and Ezra Levin 1 10.1 Lecture Overview In this lecture we present in detail

More information

Edge and local feature detection - 2. Importance of edge detection in computer vision

Edge and local feature detection - 2. Importance of edge detection in computer vision Edge and local feature detection Gradient based edge detection Edge detection by function fitting Second derivative edge detectors Edge linking and the construction of the chain graph Edge and local feature

More information

doi: /

doi: / Yiting Xie ; Anthony P. Reeves; Single 3D cell segmentation from optical CT microscope images. Proc. SPIE 934, Medical Imaging 214: Image Processing, 9343B (March 21, 214); doi:1.1117/12.243852. (214)

More information