
IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 6, NO. 1, JANUARY-MARCH 2018

Scalable pct Image Reconstruction Delivered as a Cloud Service

Ryan Chard, Student Member, IEEE, Ravi Madduri, Nicholas T. Karonis, Kyle Chard, Member, IEEE, Kirk L. Duffin, Caesar E. Ordoñez, Thomas D. Uram, Justin Fleischauer, Ian T. Foster, Senior Member, IEEE, Michael E. Papka, Senior Member, IEEE, and John Winans

Abstract: We describe a cloud-based medical image reconstruction service designed to meet a real-time and daily demand to reconstruct thousands of images from proton cancer treatment facilities worldwide. Rapid reconstruction of a three-dimensional Proton Computed Tomography (pct) image can require the transfer of 100 GB of data and the use of approximately 120 GPU-enabled compute nodes. The nature of proton therapy means that demand for such a service is sporadic and comes from potentially hundreds of clients worldwide. We thus explore the use of a commercial cloud as a scalable and cost-efficient platform for pct reconstruction. To address the high performance requirements of this application we leverage Amazon Web Services GPU-enabled cluster resources that are provisioned with high performance networks between nodes. To support episodic demand, we develop an on-demand multi-user provisioning service that can dynamically provision and resize clusters based on image reconstruction requirements, priorities, and wait times. We compare the performance of our pct reconstruction service running on commercial cloud resources with that of the same application on dedicated local high performance computing resources. We show that we can achieve scalable and on-demand reconstruction of large scale pct images for simultaneous multi-client requests, processing images in less than 10 minutes for less than $10 per image.

Index Terms: Cloud computing, proton computed tomography, medical imaging

R. Chard is with the School of Engineering and Computer Science, Victoria University of Wellington, New Zealand. ryan@ecs.vuw.ac.nz.
R. Madduri is with the Computation Institute, University of Chicago and Argonne National Laboratory, Chicago, IL, and the Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL. madduri@mcs.anl.gov.
K. Chard is with the Computation Institute, University of Chicago and Argonne National Laboratory, Chicago, IL, USA. chard@uchicago.edu.
I.T. Foster is with the Computation Institute, University of Chicago, Chicago, IL, and Argonne National Laboratory, Argonne, IL, the Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, and the Department of Computer Science, University of Chicago, Chicago, IL. foster@anl.gov.
M.E. Papka is with the Department of Computer Science, Northern Illinois University, DeKalb, IL, and Argonne National Laboratory, Argonne, IL. papka@niu.edu.
N.T. Karonis is with the Department of Computer Science, Northern Illinois University, DeKalb, IL, and the Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL. karonis@niu.edu.
K.L. Duffin, C.E. Ordoñez, J. Fleischauer, and J. Winans are with the Department of Computer Science, Northern Illinois University, DeKalb, IL. {duffin, cordonez}@cs.niu.edu, justin_fleischauer@hotmail.com, jwinans@niu.edu.
T.D. Uram is with the Argonne Leadership Computing Facility, Argonne National Laboratory, Argonne, IL. turam@anl.gov.

Manuscript received 24 June 2014; revised 30 June 2015; accepted 2 July 2015. Date of publication 16 July 2015; date of current version 7 March 2018. Recommended for acceptance by P. Corcoran.

1 INTRODUCTION

PROTON Computed Tomography (pct) [1] is a medical imaging modality based on tracking the change in trajectory and energy loss of protons as they pass through an object.
pct imaging was initially developed as a method for acquiring high accuracy images for proton cancer therapy applications. pct systems provide a number of advantages over traditional X-ray Computed Tomography (xct) scanners, such as higher accuracy of electron density reconstruction and lower dose for the same density resolution [2]. The enhanced accuracy in reconstructed electron density serves to improve the quality of care delivered to patients. It first allows physicians to develop more accurate treatment plans, thus sparing healthy tissue during treatment. It also allows health care providers to more accurately position patients during treatment sessions. Presently, a patient's position is verified just prior to receiving each treatment through the use of two-dimensional orthogonal projections. Position verification can be significantly improved by instead using the three-dimensional image produced by pct. In order to use pct imaging for position verification, images must be reconstructed in near real-time; initial studies suggest within 10 to 15 minutes [1].

Due to the nonlinear path of protons through an imaged material, data reduction techniques cannot be applied to pct datasets in the same way that they can to other modalities such as positron emission tomography (PET) and xct. Thus, extremely large datasets must be processed in order to reconstruct an image. It is estimated that the ratio of total protons to total number of 1 mm³ voxels in a target should be greater than 100 to 1 in order to image the target [3]. This gives a conservative upper limit of two billion proton histories for a reconstruction. Each history can be represented in 50 bytes, producing a dataset of 100 GB. Considerable compute resources are needed to reconstruct a dataset of this size in a timely manner.
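These dataset sizes follow from simple arithmetic; the short sketch below reproduces them. It is only an illustration: the 100-to-1 proton-to-voxel ratio and the 50-byte history record come from the text above, while the head-sized reconstruction volume used for the sanity check is our own assumption.

    # Back-of-the-envelope check of the dataset sizes quoted above.
    # The 100:1 proton-to-voxel ratio and the 50-byte history record are taken
    # from the text; the head-sized reconstruction volume is our own assumption.
    BYTES_PER_HISTORY = 50          # compact encoding of one proton history
    HISTORIES = 2_000_000_000       # conservative upper limit used in the paper

    dataset_bytes = HISTORIES * BYTES_PER_HISTORY
    print(f"dataset size: {dataset_bytes / 1e9:.0f} GB")   # -> 100 GB

    # Sanity check against the 100-protons-per-voxel rule of thumb for a
    # (hypothetical) 25 x 25 x 30 cm reconstruction volume at 1 mm^3 voxels.
    voxels = 250 * 250 * 300
    print(f"voxels: {voxels:.2e}, implied histories: {voxels * 100:.2e}")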

For example, Penfold [4] reports that a reconstruction of a small 6 GB dataset of 131 million proton histories, using a single CPU and GPU, took almost seventy minutes. At this rate, two billion histories would require almost nine hours to process.

In previous work we developed a high performance, parallel, Message Passing Interface (MPI)-based pct image reconstruction code [5]. Using Gaea, a 60-node, GPU-enhanced, high performance computing (HPC) cluster at Northern Illinois University (NIU), we reduced the time required to reconstruct an image considerably. Running on 60 nodes, our parallelized, hardware-accelerated software can reconstruct a two billion history image within seven minutes, and a small 131 million history image in less than 30 seconds. This work demonstrated for the first time the feasibility of using a pct scanner to provide near real-time images for clinical treatment.

However, the use of an HPC cluster limits the applicability of our results. Clusters are expensive both to acquire and to maintain, making it impractical for many medical centers to perform real-time reconstructions of pct data. This limitation is unfortunate, as both the number of patients treated by proton therapy and the number of proton cancer treatment facilities have been on the rise [6]. As of December 2012, over 90,000 patients had received proton therapy in almost 50 different medical centers, with nearly 40,000 of them treated in recent years. Over 10,000 patients were treated each year in 2011 and 2012. Approximately two thirds of those patients received treatment for prostate cancer, which requires 45 treatment sessions, and one third received treatment for some other form of cancer, each requiring three to eight treatment sessions. These figures conservatively place the current global demand for real-time pct imaging at over 1,200 images per day, and that demand is expected to rise.

Commercial cloud resources represent a promising alternative platform for pct reconstruction. They have the advantage that computing resources can be obtained quasi-instantaneously, when required, and paid for only when in use. Furthermore, cloud providers are increasingly offering high performance and GPU-enabled nodes, capable of running high performance applications such as pct image reconstruction. However, no one has previously explored the feasibility of using cloud resources for pct reconstruction.

We describe in this paper an on-demand and scalable pct reconstruction service that combines our parallel pct reconstruction software with on-demand computing resources provided by the Amazon Web Services (AWS) Elastic Compute Cloud (EC2). Our contributions include an investigation of the challenges related to deploying a data- and compute-intensive service in the cloud; a novel architecture that includes scalable data transfer and elastic, on-demand resource provisioning; and a detailed evaluation of the application of this architecture under a range of real-world scenarios. Our results show that our implementation dynamically provisions, resizes, and removes GPU-enhanced clusters efficiently to fulfill workloads. We demonstrate that our service can compute billion-history reconstructions in under 10 minutes, for as little as $7 per image, thus meeting the goals specified.
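As an aside, the global demand figure cited above (over 1,200 images per day) can be reproduced with rough arithmetic. The sketch below is illustrative only: the patient counts and sessions per course are those quoted above, while the assumption of roughly 250 treatment days per year is ours, not the paper's.

    # Rough reproduction of the ">1,200 images per day" demand estimate.
    # Patient counts and sessions per course are from the text; the ~250
    # treatment days per year (weekdays only) is our own assumption.
    patients_per_year = 10_000
    prostate_fraction = 2 / 3
    prostate_sessions = 45
    other_sessions = 8             # upper end of the 3-8 range quoted above
    treatment_days_per_year = 250  # assumed clinic operating days

    sessions = (patients_per_year * prostate_fraction * prostate_sessions
                + patients_per_year * (1 - prostate_fraction) * other_sessions)
    images_per_day = sessions / treatment_days_per_year
    print(f"~{images_per_day:.0f} position-verification images per day")  # ~1300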
pct is not yet common in clinical practice and is currently the subject of a multidisciplinary research effort to develop a solution for widespread clinical adoption. Our work addresses one significant aspect of this investigation: the computational viability and cost efficiency of real-time pct reconstruction.

2 RELATED WORK

Both the HPC and medical imaging communities have explored the use of public cloud resources [7]. However, uptake has been limited due to issues related to inter-node latencies, data privacy, cost, and other constraints. HPC applications often have vastly different quality of service requirements than the e-commerce applications for which clouds were originally designed [8]. HPC applications can be extremely sensitive to bandwidth and latency variations, where small overheads can significantly affect performance. Comparisons between HPC applications on clouds and on HPC resources using standard benchmarking suites, such as the NAS parallel benchmarks, have shown that network performance is a key limitation of HPC execution on clouds [9]. Concerns have also been raised with respect to the economic models employed by cloud providers, especially when moving and analyzing large scientific data [10].

Despite these limitations, the use of clouds for scientific applications is growing rapidly. Lifka et al. [7] survey the use of clouds for research and education and report that cloud resources have been successfully adopted in many projects spanning over 25 scientific domains from science and engineering as well as the humanities, arts, and social sciences. On-demand access to burst resources and support for high throughput scientific workflows were found to be two of the main reasons for adoption of the cloud. The growth of cloud computing as a viable platform for science has also been demonstrated via scientometric analysis [11], with an emphasis on current trends toward Big Data and data analytics. There is also significant literature related to efficient execution of scientific workflows on clouds. For example, Sossa and Buyya [12] propose a resource provisioning and scheduling algorithm for minimizing execution cost while meeting deadline constraints.

While much medical imaging cloud research focuses on the exchange and storage of images [13], [14], there is widespread belief that the use of cloud resources will become commonplace for medical image processing [15]. Kim et al. [16] use the CometCloud engine to integrate local and public cloud resources dynamically to facilitate image registration requests from various research groups on small EC2 instances. Parsonson et al. [17] create an image processing framework that exploits cloud resources for tasks such as volume rendering; however, unlike our work, they focus on creating single-instance environments for multiple researchers and clinicians to access collaboratively. Bednarz et al. [18] present an image analysis toolkit that enables access to cutting-edge analysis tools on cloud resources. Rather than build services or leverage scalable infrastructure, the authors instead focus on providing accessible interfaces to pre-deployed software packages on a cloud VM.

GPU and parallel programming techniques have been explored for medical image analyses in positron emission tomography [19], magnetic induction tomography [20], and transmission tomography [21]. Our previous work is the first such approach for pct [5].

3 PROTON COMPUTED TOMOGRAPHY

Proton therapy is a form of radiation treatment that delivers highly directed and localized doses of radiation to areas of interest. Proton therapy uses a beam of protons as agents, allowing for higher degrees of conformality than conventional external beam X-ray therapy. Protons also have little lateral scatter due to their mass, and therefore proton beams can focus more precisely on a tumor, with reduced side effects to surrounding tissues. Protons lose energy as they travel through a medium; protons with a given energy therefore have a particular range, and few protons pass beyond this range. The rate at which energy is lost is related to the electron density of the medium being targeted. The energy lost through a medium can be quantified via the Relative Stopping Power (RSP), the ratio of the stopping power of the target material to that of water. When treating tumors at different depths, the proton accelerator must generate beams with different energies: tumors closer to the surface of the body require less beam energy than those deeper in the body. Efficient and accurate proton therapy is therefore dependent on the accuracy of the scan reconstructions used in treatment planning.

As tumors and tissue change over time, patients require a number of scans to direct and fine tune the therapy. An initial scan is used to establish a treatment plan and subsequent scans are used for online position verification. Reconstructions of these scans have different priorities with varying deadlines. For example, reconstructions for a treatment plan are often not required for several days, while reconstructions for online verification scans are needed immediately.

Historically, proton therapy treatment plans have used pre-treatment X-ray CT scans to determine the RSP of a medium, which in turn is used to estimate the required beam energy. This process requires mapping the Hounsfield units of X-ray CT scans to proton RSPs. However, this conversion is unique to each X-ray CT machine, requiring calibration and leading to uncertainties during treatment. The goal of pct is to establish the RSP of each target directly, using the same particles for imaging and treatment, and thereby reducing uncertainty and increasing treatment efficiency. The concept of pct was first proposed by Cormack in 1976 [22] and has recently seen increased interest in the development of clinical pct devices [1], [23], [24].

In order to measure the RSP of a medium, pct typically employs a detection configuration such as that shown in Fig. 1 [1]. This configuration enables the path and energy of individual protons, known as a proton's history, to be recorded as a broad beam is directed at the target. Each proton has a known input energy, and passes through two sensor planes before and after the target. The sensor planes collect the entry and exit positions, as well as the angle of a proton, allowing the trajectory to be estimated. The exit energy of each proton is then captured by the energy detector, enabling the calculation of total energy lost, and therefore of electron density and RSP. The goal of pct is to reconstruct a map of these densities given a set of proton measurements.

Fig. 1. The configuration of the NICADD/NIU pct detector. Protons pass left-to-right through sensor planes and traverse the target before stopping in the detector at the far right.
This approach requires significant computation, and many protons must be collected in order to generate statistically reliable measurements and therefore images. Furthermore, as protons passing through different media travel in non-straight paths, the optimization techniques applied in other imaging modalities cannot be applied to reduce the data.

3.1 pct Reconstruction

Our pct reconstruction code is a multi-stage, multi-process application that begins with each participating process reading a subset of proton histories into memory before performing a series of preliminary calculations. The preliminary calculations evaluate entry and exit data to filter out statistically abnormal histories. Typically 30-40 percent of histories have either not successfully passed through the target or do not meet the statistical requirements for reconstruction, and are removed from the dataset. The remaining histories are then rebinned according to their direction through the target, and filtered back projection (FBP) is used to estimate an initial reconstruction solution. Solution bounds produced by FBP are used to create refined proton trajectories through the target, known as most-likely-paths (MLPs). The voxels of the MLP for each proton identify the non-zero coefficients in a row of the matrix representing the system of linear equations.

The majority of the reconstruction execution time is spent computing the MLPs and iteratively solving the system of linear equations. Thus, it is advantageous to compute the MLP paths once and store the result in memory. In our pct code the MLP phase utilizes the GPUs of a compute node in order to improve parallelism and increase the performance of the reconstruction. However, the MLP phase can generate up to two terabytes of data for a two billion history image, requiring many nodes to store the MLP paths.

The final stage of reconstruction is the iterative linear solver, which uses a version of the string averaging algorithm, Component-Averaged Row-Action Projections (CARP) [25]. The initial distribution of data divides the protons (and the corresponding rows of the linear system matrix) into blocks, one block per executing process. The result of the FBP is broadcast to each process and used as an initial solution. For each iteration of the linear solver, every process computes the rows of its block and updates the solution vector. A reduction across all processes computes an average solution for the iteration. The average solution is broadcast to each process and is used to start the subsequent iteration. A detailed description of the pct reconstruction workflow can be found in our previous work [5].
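To make the structure of this final stage concrete, the sketch below implements a much-simplified, single-process version of the block-parallel row-action scheme described above: each block performs a Kaczmarz sweep over its rows starting from the shared iterate, and the per-block results are then averaged to form the next iterate. It is only an illustration (dense matrix, plain averaging, no MPI, no GPU); the production CARP code distributes blocks across MPI ranks and averages components only over the blocks that touch them.

    import numpy as np

    def kaczmarz_sweep(A_block, b_block, x, relax=1.0):
        """One row-action (Kaczmarz) sweep over a block of rows, starting from x."""
        x = x.copy()
        for a_i, b_i in zip(A_block, b_block):
            denom = a_i @ a_i
            if denom > 0.0:
                x += relax * (b_i - a_i @ x) / denom * a_i
        return x

    def carp_like_solve(A, b, n_blocks=4, iterations=50):
        """Simplified CARP-style iteration: sweep each block independently from the
        shared iterate, then average the block results to form the next iterate."""
        x = np.zeros(A.shape[1])                  # in pct, the FBP image seeds x
        row_blocks = np.array_split(np.arange(A.shape[0]), n_blocks)
        for _ in range(iterations):
            block_results = [kaczmarz_sweep(A[rows], b[rows], x) for rows in row_blocks]
            x = np.mean(block_results, axis=0)    # "reduction" step across blocks
        return x

    # Tiny synthetic system standing in for the (huge, sparse) pct linear system.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(200, 50))
    x_true = rng.normal(size=50)
    x_est = carp_like_solve(A, A @ x_true)
    print("relative error:", np.linalg.norm(x_est - x_true) / np.linalg.norm(x_true))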

4 PCT AND THE CLOUD

Cloud computing has gained significant popularity in the last several years by providing convenient, self-serviceable, and cost effective infrastructure and services to users. Cloud computing enables on-demand and elastic provisioning of virtualized computing resources. Commercial cloud providers employ utility computing models through which consumers pay only for the resources used. In this section we outline background information about the commercial cloud infrastructure that we use in this work, as well as important considerations in the deployment and execution of our pct reconstruction software on the cloud.

4.1 Amazon Elastic Compute Cloud

The Amazon Elastic Compute Cloud platform offers many different virtualized instance types to consumers. These instances are optimized for different scenarios (e.g., CPU, I/O, or memory). Recently, EC2 has incorporated cluster compute instances aimed at HPC applications. Importantly, these instances offer high CPU and memory capacity as well as improved network performance between instances. Some enhanced instances also include GPUs.

EC2 allows consumers to lease resources following two distinct pricing models: on-demand and spot. On-demand instances incur the standard advertised price for each instance type, allowing a user to request an instance at their convenience, pay the hourly rate, and release the instance at their discretion. Spot pricing provides a potentially discounted option to acquire resources. Users bid on excess instances and acquire them when their bid exceeds the market price of the instance. The market price varies with demand and is recalculated hourly; as the market rate increases, spot instances can be reclaimed if the market price exceeds the bid.

4.2 pct Reconstruction Instance Types

Prior to deploying the pct reconstruction software on EC2 we first established a mapping of application requirements to cloud instances. Based on our experience with our parallel, GPU-enabled, MPI application we identified requirements for high-CPU, high-memory, and GPU-enabled instances, as well as low latency between instances. The only instance type that matched these requirements at the time of our study was the Amazon EC2 GPU-enhanced cluster compute instance, termed CG1. CG1 instances include two Intel Xeon X5570 quad-core CPUs with hyperthreading, 22.5 GB of RAM, and two NVIDIA Tesla M2050 GPUs, each containing 3 GB of RAM. The cluster compute instances are connected by a high performance 10-Gigabit network. To run pct reconstruction on these resources, we deployed an Ubuntu Server instance on a CG1 instance and installed the appropriate drivers, tools, and dependencies. We then tested a small-scale version of the pct reconstruction software on a single instance.

Due to the difference in architecture between the Gaea cluster and the Amazon CG1 instance, the reconstruction software could not be directly mapped to the cloud. Gaea has 60 nodes, each with 72 GB of RAM, two six-core Intel Xeon X5650 CPUs, and two NVIDIA Tesla M2070s with 6 GB of RAM each. Gaea nodes are connected by QDR Infiniband, a high performance, switched network fabric [26]. The key difference between Gaea and Amazon CG1 instances is the lower available memory in CG1. We therefore modified our pct reconstruction parameters to reduce the number of proton histories that can be processed concurrently per node.
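The effect of the smaller CG1 memory footprint on cluster size can be illustrated with a rough sizing rule. The sketch below is a back-of-the-envelope estimate only: the 2 TB MLP working set, the 100 GB of histories, and the per-node RAM figures come from the text, while the assumed usable-memory fraction is ours; the real code must also respect GPU memory and per-process overheads, and Gaea runs on all 60 of its nodes in practice.

    import math

    # Rough cluster sizing from aggregate memory demand.  Working-set sizes and
    # per-node RAM come from the text; the usable-memory fraction is assumed.
    MLP_WORKING_SET_GB = 2000      # up to ~2 TB of MLP data for 2e9 histories
    HISTORY_DATA_GB = 100          # 2e9 histories at 50 bytes each
    USABLE_FRACTION = 0.8          # assumption: leave headroom for OS/MPI buffers

    def nodes_needed(node_ram_gb):
        total = MLP_WORKING_SET_GB + HISTORY_DATA_GB
        return math.ceil(total / (node_ram_gb * USABLE_FRACTION))

    print("CG1 (22.5 GB/node):", nodes_needed(22.5), "instances")   # ~117, i.e. ~120
    print("Gaea (72 GB/node): ", nodes_needed(72.0), "nodes")       # ~37 by memory alone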
4.3 Shared File System

The pct reconstruction software requires a shared data source from which each of the working MPI processes can access proton information. As the application requires on the order of 120 CG1 instances to process large images (based on memory requirement calculations for billions of proton histories), a high performance data storage model is required. We chose to use GlusterFS, an open source distributed file system that provides scalable and high performance access to files [27]. The GlusterFS model relies on one or more storage bricks (or servers) that allow client applications, in this case the pct reconstruction worker nodes, to mount the data source.

To evaluate the use of GlusterFS we deployed a small (non-optimized) EC2 instance and evaluated its ability to satisfy the data access requirements when running the pct reconstruction software over different topologies. For a small number of instances the software performed as predicted (based on theoretical calculations); however, as the number of instances was increased to 120, a significant decrease in performance was observed due to a network bottleneck, which resulted in high latency between the worker nodes running on CG1 instances and the Gluster node running on a separate non-cluster instance. To resolve this problem we deployed the GlusterFS storage node on a co-located cluster compute instance, with a 10-Gigabit network interconnect between the working cluster and the GlusterFS storage node.

Even using co-located storage, we found the data distribution phase of our pct reconstruction code to be significantly longer on the cloud than on our dedicated HPC cluster. Because the HPC cluster utilizes a dedicated QDR Infiniband connection and requires fewer MPI ranks, the distribution phase there is negligible in comparison to execution. Thus, the pct code focuses on optimizing the execution phase by allocating each process a specific range of proton histories from each input file. However, due to an increased number of processes, decreased network performance, and the overhead of GlusterFS, these parallel reads reduced the performance of cloud-based reconstructions. To overcome this problem, we modified the data distribution algorithm in our pct reconstruction code so that processes are allocated a sequential set of histories.
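The change to the distribution algorithm can be summarized as follows: instead of giving each rank a strided slice of every input file, each rank reads one contiguous range of histories. The sketch below shows the index arithmetic only; the function and variable names are ours, not from the pct code.

    def sequential_ranges(total_histories, n_ranks):
        """Assign each MPI rank one contiguous [start, end) range of histories,
        so that each rank issues a single large sequential read from the shared
        file system instead of many small interleaved (strided) reads."""
        base, extra = divmod(total_histories, n_ranks)
        ranges, start = [], 0
        for rank in range(n_ranks):
            count = base + (1 if rank < extra else 0)
            ranges.append((start, start + count))
            start += count
        return ranges

    # Example: 2e9 histories spread over 960 ranks (120 nodes x PPN = 8).
    print(sequential_ranges(2_000_000_000, 960)[:2])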

4.4 Data Upload/Download

Transferring large datasets to and from the cloud can be challenging, as we must ensure that bandwidth is maximized and that data is transferred reliably and, in the case of medical images, securely. For the real-time pct reconstruction application that we consider here, we require a high performance transfer system that can move data quickly between distributed source nodes (at hospitals) and our reconstruction service (in the cloud). Importantly, we require reliable data transfer, as corruption may result in incorrect reconstructions.

To address these requirements we chose to use Globus [28] to provide high performance, secure, and reliable third party data movement. Globus moves data between Globus endpoints, the name given to resources on which a Globus agent is installed. Globus endpoints can be created by installing Globus Connect Server for multi-user environments or, for an individual user, the lightweight Globus Connect Personal client for Linux, Windows, or Mac. Globus handles all the difficult aspects of data transfer, allowing a user to "fire and forget" transfers. Globus automatically tunes parameters to maximize bandwidth usage, manages security configurations, provides automatic fault recovery, encrypts data channels, ensures that files are transferred reliably by matching checksums, and notifies users of completion and problems.

4.5 Cloud Images and Elastic Scale Out

One advantage of cloud computing models is the ability to take snapshots of virtual machines that can be reused without requiring re-installation and reconfiguration of software. In Amazon, these snapshots are referred to as Amazon Machine Images (AMIs). We leverage this approach to provision pct worker nodes that are preinstalled with the pct reconstruction software and all of its dependencies (e.g., GPU/MPI drivers and libraries) and the GlusterFS client software. The task of configuring a pct reconstruction cluster is therefore limited to setting appropriate configuration settings on each node, a process which we also automate.

The pct reconstruction software requires seven instances to reconstruct a small, 131 million history, image. In order to perform larger tasks, such as an adult human head, we need to be able to easily and consistently launch entire clusters with potentially hundreds of nodes. To do so, we leverage the Amazon EC2 APIs to automatically provision an arbitrary number of instances of a specified type, with customized security policies, running the predefined AMI. Automatically scaling out clusters requires coordinated mechanisms to contextualize nodes and to manage the cluster. Rather than implement a new scheduler, we leverage Apache Mesos [29] to configure, resize, and terminate provisioned clusters. Due to application specific requirements of pct, such as the need for shared file systems and GPU drivers, we extended Mesos to provide additional contextualization functionality when deploying pct worker nodes.

Our scripts use AWS APIs to request instances by specifying the instance type, security group, number of instances, and, if requesting spot instances, a bid price. Our provisioning tool deploys the customized pct AMI described above. The tool creates a cluster consisting of one master node and as many slaves as required for the reconstruction. It also monitors instances as they are started to ensure they start correctly. Provided the instances are successfully provisioned, a second contextualization tool is used to ensure that each node in the cluster is correctly assembled and is capable of executing the MPI workload. This tool performs basic assembly functions, such as connecting the instance to the shared GlusterFS drive and ensuring the appropriate GPU devices are loaded. The tool also generates an MPI hostfile, specifying each node in the cluster and the available execution slots. Once these tools have executed, MPI workloads can be deployed across the cluster.
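Using the EC2 API, the provisioning step reduces to a few calls. The sketch below is a simplified stand-in for our Mesos-based tooling, not the tooling itself: it requests a mix of on-demand and spot CG1 instances with boto3 and writes an OpenMPI-style hostfile from the addresses of the launched instances. The AMI ID, security group, and bid price are placeholders.

    import boto3

    # Simplified provisioning sketch (not our Mesos-based tooling): launch a mix
    # of on-demand and spot CG1 instances and write an OpenMPI hostfile.
    AMI_ID = "ami-00000000"          # pre-built pct worker image (placeholder)
    SECURITY_GROUP = "sg-00000000"   # placeholder
    INSTANCE_TYPE = "cg1.4xlarge"

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def launch_cluster(n_on_demand, n_spot, bid="0.40"):
        resp = ec2.run_instances(ImageId=AMI_ID, InstanceType=INSTANCE_TYPE,
                                 MinCount=n_on_demand, MaxCount=n_on_demand,
                                 SecurityGroupIds=[SECURITY_GROUP])
        if n_spot > 0:
            # Spot capacity arrives asynchronously; fulfilled instances would be
            # discovered later by polling describe_spot_instance_requests.
            ec2.request_spot_instances(SpotPrice=bid, InstanceCount=n_spot,
                                       LaunchSpecification={
                                           "ImageId": AMI_ID,
                                           "InstanceType": INSTANCE_TYPE,
                                           "SecurityGroupIds": [SECURITY_GROUP]})
        return [i["PrivateIpAddress"] for i in resp["Instances"]]

    def write_hostfile(addresses, slots_per_node=8, path="hostfile"):
        # OpenMPI-style hostfile: one line per node with its available slots.
        with open(path, "w") as f:
            for addr in addresses:
                f.write(f"{addr} slots={slots_per_node}\n")

    # Example invocation: a small cluster of four on-demand and four spot workers.
    write_hostfile(launch_cluster(n_on_demand=4, n_spot=4))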
Fig. 2. The pct reconstruction service. Hospitals request image reconstructions, transferring input data via Globus. The scheduler dynamically creates and manages HPC, MPI-capable, cloud clusters to service requests. Once reconstructed, the resulting image is asynchronously pushed back to the client via Globus.

5 PCT RECONSTRUCTION SERVICE

We have created an on-demand and elastic pct reconstruction service that leverages elastic cloud resources to support time-varying workloads that may involve many concurrent reconstructions of different sizes. The service processes requests for pct image reconstruction by provisioning cloud resources in an on-demand fashion. The general deployment model relies on a persistent representational state transfer (REST) service hosted on AWS. Clients worldwide can connect to this single service to request reconstructions. A single instance is responsible for hosting the REST interface, managing data transfers and the shared file system, provisioning clusters, and scheduling reconstructions across these clusters. The service uses Globus for asynchronous upload and download of input datasets and reconstructed images.

Fig. 2 shows the core components of the system. At the bottom of the figure, clients, representing hospitals or proton imaging centers, request image reconstructions from the pct reconstruction service. The service includes a scheduler that both schedules reconstructions and dynamically provisions and manages clusters as required. Requests submitted to the scheduler are created with a priority that represents the type of reconstruction (e.g., treatment plan or position verification). The priority determines whether the work must be immediately processed (which may require starting a new cluster), or whether the job can be cost effectively scheduled over existing infrastructure as it becomes idle. The resulting images are then pushed back to the requesting client.

5.1 pct Reconstruction Service

The pct reconstruction service provides a machine-accessible REST interface and is hosted on a CG1 cluster instance to ensure low latency connections to the working cluster nodes.

The REST API includes functionality to enable clients (hospitals) to request processing of reconstruction workloads; manage and monitor the status of a reconstruction; and retrieve a computed reconstruction. The service includes a co-located Globus Connect Server endpoint for data transfer to and from the cloud. The service also includes a database that is used to maintain state relating to the available clusters and to current and previous reconstructions.

The pct reconstruction service includes a user interface (UI) for the creation, monitoring, and management of reconstructions. An administration UI provides information regarding active clusters, clients, and existing reconstructions. Once a reconstruction is complete, details regarding its execution, such as the time required for each phase, are stored in the database and displayed through the UI.

When a new reconstruction request is submitted, either programmatically or through the UI, a record is created in the service database with the associated metadata (e.g., file transfer endpoint, priority, and name). The client's proton history dataset is then transferred to the service's Globus endpoint. The service creates a unique identifier for the data, stores this identifier in the database to associate the data with the reconstruction job, and then monitors the Globus transfer for completion. Once the transfer is complete, the reconstruction job is marked as ready to be scheduled.

5.2 Scheduler

The pct reconstruction service includes an asynchronous scheduler that is responsible for creating and managing clusters, and then deploying and monitoring reconstruction jobs over them. Once a reconstruction is ready to be scheduled, it is queued for execution. The scheduler relies on predefined policies to determine how reconstructions are executed. By default, if no existing clusters are available and the reconstruction is of high priority, a new cluster will be instantiated. Otherwise the reconstruction is added to a queue of low priority requests to be serviced when excess cluster time is available.

We have designed the scheduler so that a wide variety of policies can be implemented. These policies provide a way to trade off cost against compute time by managing how clusters are deployed and managed. For example, the stated quality of service goal for near real-time pct reconstruction requires responses in approximately 10 to 15 minutes, while EC2 resources are paid for by the hour; it is thus economically inefficient to create a new cluster for every reconstruction. The scheduler takes this information into account and only destroys clusters (minutes before the next billing hour) if they are no longer in use. Where possible, unused clusters are resized to satisfy the requirements of larger reconstructions, to avoid constructing new clusters.

Amazon EC2 also offers a number of different economic models for provisioning instances. Spot instances allow bidding on excess resources and potentially acquiring them at a significantly lower price than on-demand instances. Spot instances provide a trade-off between price and reliability; if a bid is exceeded, an instance can be destroyed without warning. Due to the unstable nature of spot instances, the practicality of launching entire clusters of spot instances is questionable, especially when reconstructions may have fixed deadlines.
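The default policy can be expressed compactly. The sketch below is a schematic restatement of that policy rather than the service's actual scheduler code: high-priority jobs get an idle cluster (resized if needed) or a new one, low-priority jobs wait for spare capacity, and idle clusters are released shortly before their next whole billed hour. The field and callback names are illustrative.

    import time

    BILLING_PERIOD_S = 3600          # EC2 bills CG1 instances by the hour
    SHUTDOWN_MARGIN_S = 300          # release idle clusters ~5 min before renewal

    def schedule(job, clusters, low_priority_queue, provision, resize):
        """Schematic version of the default scheduling policy described above.
        `provision` and `resize` stand in for the cluster-management actions;
        `job.nodes_needed`, `job.high_priority`, and `cluster.size` are
        illustrative fields, not the service's actual data model."""
        idle = [c for c in clusters if c.idle]
        if job.high_priority:
            if idle:
                cluster = max(idle, key=lambda c: c.size)
                if cluster.size < job.nodes_needed:
                    resize(cluster, job.nodes_needed)   # grow rather than rebuild
                return cluster.run(job)
            return provision(job.nodes_needed).run(job) # no idle capacity: new cluster
        low_priority_queue.append(job)                  # wait for spare cluster time

    def reap_idle(clusters, release):
        """Release idle clusters just before they would incur another billed hour."""
        now = time.time()
        for c in list(clusters):
            billed = (now - c.started) % BILLING_PERIOD_S
            if c.idle and billed > BILLING_PERIOD_S - SHUTDOWN_MARGIN_S:
                release(c)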
We provision the master of a cluster as an on-demand instance and the remaining worker nodes as spot instances. However, this approach is not without risk. For example, a recent increase in CG1 instance usage has caused the availability and price of spot instances to become increasingly volatile. Our scheduler implements policies that dictate the instances provisioned. We have found that a combination of approximately half spot and half on-demand instances achieves a good compromise between cost and reliability. Where possible, we leverage clusters composed of on-demand instances for high priority reconstructions. In the future, we aim to extend the scheduler to monitor spot prices to determine the volatility of the market; the scheduler would then adjust the ratio of spot instances used in the cluster accordingly.

6 EVALUATION

Our evaluation focuses on several areas. First, we quantify the performance of the pct reconstruction software on both cloud resources and a dedicated HPC cluster. Second, we investigate the performance of the reconstruction service, looking specifically at reconstruction time, transfer rate, and cost when reconstructing images of various sizes. Third, we study the performance of our pct reconstruction service when used for end-to-end, multi-client reconstructions.

6.1 pct Reconstruction

We investigate the performance of the individual phases of the pct reconstruction on a pool of Amazon CG1 cluster instances and on the dedicated HPC cluster Gaea.

6.1.1 Input Reconstruction Data

We use data collected at the Loma Linda University Medical Center on a phantom target object to evaluate the pct reconstruction software. Using a proton detector similar to that depicted in Fig. 1, we obtain a phantom dataset of 131 million proton histories. In order to scale the analysis to larger target areas we read the phantom dataset multiple times to create larger reconstructions. This is a fair reflection of pct reconstruction, as the software operates at the individual voxel level and does not optimize calculations for repeated data. The pct software includes the ability to specify the number of times to process an input dataset. Reading 16 iterations of the phantom dataset provides approximately two billion proton histories, which is approximately our conservative upper limit on the number of histories generated when scanning a human head.

6.1.2 Cloud Reconstruction

To evaluate reconstruction on cloud resources we created a number of clusters of different sizes and evaluated the time to compute reconstructions of varying numbers of proton histories. Each CG1 instance has eight physical cores (two hyperthreaded quad cores). We consider two different MPI configurations for the pct reconstruction software, one with Processes Per Node (PPN) = 2 and one with PPN = 8, where PPN defines the number of MPI ranks run on a single node.

Fig. 3. Performance for PPN = 2 and PPN = 8 on AWS.

PPN = 8 is used in order to utilize each of the physical cores. Higher PPN values, which leverage hyperthreading, were not found to provide any significant advantage.

Fig. 3a shows the time taken to reconstruct an image of a given size when PPN = 2. We see that performance improves as more nodes are added, especially for larger datasets. Due to the limited memory available within each instance, we can only reconstruct large datasets on large clusters. We have therefore gathered results over clusters consisting of up to 120 CG1 instances. Because such large numbers of instances are required, and as these instances are subject to public demand, multiple availability zones are needed to ensure that the required number of instances can be acquired. For these tests, instances were acquired evenly over two availability zones. It is important to note that this distribution may affect the latency between nodes.

The PPN = 2 configuration provides a baseline against which to evaluate executions using each of the available physical cores. In an attempt to optimize pct image reconstruction, we also performed reconstructions using PPN = 8. Fig. 3b shows the time taken to reconstruct images of different sizes over clusters of various sizes at PPN = 8. We see slightly improved performance relative to PPN = 2, as individual nodes are used more efficiently. However, due to overheads associated with a far greater number of processes (and therefore the increased cost of data distribution), the execution time was found to be more variable than when PPN = 2. As above, restrictions related to the number of processes prevent us from reconstructing the largest, two billion history, datasets.

6.1.3 Dedicated HPC Cluster Reconstruction

We now compare the performance of the pct reconstruction code on Gaea, a dedicated HPC cluster with 60 nodes, each with two GPU units and 12 physical cores. As each Gaea node has more physical cores than a CG1 instance, the pct software can be run at up to PPN = 12. Fig. 4 shows the time required to reconstruct different datasets for increasing cluster sizes on Gaea at PPN = 2 and PPN = 12. As on the cloud, there are only small differences between PPN = 2 and PPN = 12. However, with fewer MPI ranks, a more efficient shared file system, and a dedicated Infiniband network connection, the data distribution phase is much faster than in the cloud solution. Thus, PPN = 12 is more efficient for all reconstruction sizes, as opposed to the cloud case where PPN = 8 only becomes more efficient for large datasets.

Fig. 4. Performance for PPN = 2 and PPN = 12 on NIU's Gaea, from Karonis et al. [5].

6.1.4 Discussion

The primary limitation of the MPI-based pct reconstruction software is its requirement for large amounts of memory. The software was developed for Gaea, where each of the nodes provides 72 GB of memory and each GPU has 6 GB. As the AWS instances have less than a third of the memory

(22.5 GB) and half the GPU memory (3 GB), clusters must be significantly larger on AWS than when running on Gaea. To process a two billion history dataset, more than 100 instances are required on AWS, whereas only 60 nodes are required on Gaea. Moreover, to achieve similar reconstruction performance, Gaea requires roughly one third as many nodes. This is approximately proportional to the ratio of CPU RAM between Gaea nodes and cloud instances. In addition, the reduced GPU memory requires more data staging, which in turn also reduces the performance of the software.

Network performance is another limitation for cloud-based reconstructions. Even when using cluster compute instances connected through a high speed network, network limitations, multi-availability-zone deployments, poor GlusterFS performance, and an increased number of MPI ranks make data distribution considerably more costly than on dedicated infrastructure. We plan to investigate the use of more GlusterFS bricks to distribute the workload, co-located bricks across availability zones, and optimized SSD storage instances to provide more efficient I/O. We also expect that Amazon will soon release cluster nodes with faster network connections, which will improve application performance.

While these limitations reduce performance when compared with a dedicated cluster, our results demonstrate that large-scale pct images can be reconstructed in a timely manner on commercial cloud resources, well within our stated quality of service goals. We believe that such approaches offer significant advantages in terms of cost and parallelism, as many Amazon clusters can be created and used simultaneously, far exceeding the capabilities of our dedicated cluster.

6.2 pct Reconstruction Service

The pct reconstruction service described in Section 5 provides end-to-end support for pct reconstructions. The service enables clients (hospitals) to upload and reconstruct images of different sizes and different priorities. The service then allocates these reconstructions over a dynamic pool of cloud resources. In this section we investigate the total reconstruction time, including transfer and processing, and the cost of reconstructing images.

6.2.1 Reconstruction Time

The total time required for a reconstruction can be determined by calculating the combined time necessary to transfer the input data to the service, reconstruct the image, and transfer the resulting image back. Fig. 5 depicts the total time required to reconstruct images for various sized datasets. For these results, we assume that a 120-node cluster capable of fulfilling the request is operational, and idle, at the time of receiving the input dataset. We measure transfer time by transferring different dataset sizes between AWS and the University of Chicago using Globus. The figure highlights one key limitation of a data-intensive cloud service: the ability to transfer data to and from the cloud efficiently. Our results show that transfer time is a significant component of overall reconstruction time and, while smaller datasets can be reconstructed within our stated goal (10 to 15 minutes), larger datasets can take almost an hour to process, as transfer time alone exceeds our goal.

Fig. 5. The total time required to transfer and reconstruct a pct image. Each column includes the time taken to transfer datasets to and from the cloud service, as well as to perform the reconstruction. The forecast times required for transfer and reconstruction when supported by a 1-Gigabit and a 10-Gigabit network with 100 percent utilization are also shown.
The figure also shows the total reconstruction time that we would predict if the service were supported by a high speed (1-Gigabit or 10-Gigabit) network with maximum network utilization. Based on the previously calculated execution times, we require a 2.5-Gigabit connection between the client and the service to reconstruct a two billion history dataset within 15 minutes. With a 10-Gigabit connection, we would be able to reconstruct two billion history datasets within 10 minutes. Fortunately, Amazon offers a direct connect capability that enables the creation of private high speed connections (1-Gigabit or 10-Gigabit) between AWS and client applications.

We expect that as pct technology becomes more sophisticated, fewer histories will be required to reconstruct a statistically accurate image. Other approaches, such as the data reduction technique proposed by Herman and Davidi [30], in which an object is scanned from only one side and the data generated is therefore halved, can also be applied to reduce data sizes; however, the authors also recognize that this approach could result in noise masking the presence of tumors. There is also potential to optimize our reconstruction code to process streamed data. At present, computation waits for all data to be uploaded; however, if a sufficient data rate can be sustained we could overlap data upload with computation to produce results more quickly. We aim to investigate these approaches as future work.

6.2.2 Transfer Rate

We have seen that data transfer rate has a major influence on total reconstruction time. As transfer rates may differ significantly between locations, we measure the time required to move different amounts of data between various centers and Amazon. We selected endpoints at the University of Chicago (UC), NIU, and the National Energy Research Scientific Computing Center (NERSC) in Berkeley, California. These centers provide geographical distribution across the US and represent the types of locations that we would expect to use such a service. In each location we create a virtual machine with Globus Connect Personal and measure the transfer time to a Globus Connect Server running on an Amazon EC2 instance in the US East region.
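The roughly 2.5-Gigabit requirement quoted above follows from simple arithmetic, reproduced below. It is a rough check only: the 100 GB dataset size and the roughly nine-minute reconstruction time come from our measurements, and protocol overheads and the (small) image download are ignored.

    # Back-of-the-envelope check of the link speed needed for a two-billion-history
    # reconstruction.  Dataset size and reconstruction time are from the text;
    # protocol overheads and the (small) image download are ignored.
    DATASET_GB = 100
    RECONSTRUCTION_MIN = 9
    DEADLINE_MIN = 15

    upload_budget_s = (DEADLINE_MIN - RECONSTRUCTION_MIN) * 60
    required_gbps = DATASET_GB * 8 / upload_budget_s
    print(f"required sustained upload rate: {required_gbps:.1f} Gb/s")   # ~2.2 Gb/s

    # Conversely, on a fully utilized 10 Gb/s link the upload takes ~80 s,
    # giving a total time of roughly 9 + 1.3 = ~10 minutes.
    print(f"upload on 10 Gb/s link: {DATASET_GB * 8 / 10 / 60:.1f} min")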

Fig. 6. Upload (solid) and download (dashed) transfer rates between Globus endpoints at various sites and Amazon.

Fig. 6 shows the result of uploading (solid lines) and downloading (dashed lines) various size files to and from Amazon. The rates are computed based on total elapsed transfer times, which include costs associated with ensuring reliability, managing security, and monitoring the transfer. For example, total time includes the computation and comparison of checksums between source and destination, and any re-transfers of files that are found to be corrupt. From a pct perspective, data upload is most important, as reconstructed images are negligible in size relative to input datasets.

As with the results above, the transfer rates represent a barrier to real-time pct reconstruction. Uploading a modest dataset (131 million histories, 6 GB) typically takes between two and three minutes from UC, NIU, and NERSC. The variability of the results can be partly explained by the tests being conducted over public networks with varying load. In order to minimize this effect, we recorded measurements multiple times. The download rate from AWS to UC is significantly lower than to any other center, a result that we attribute to restrictions placed on the data center in which our VM was housed.

6.2.3 Cost

Due to the scale of resources required to perform reconstructions, the cost of provisioning AWS clusters is significant. Each GPU-enhanced cluster compute instance has an on-demand price of $2.10 per hour, meaning an entire 120-instance cluster costs $252 an hour. The largest envisioned images, consisting of two billion histories, require approximately nine minutes to reconstruct. Thus six of these reconstructions can be completed within an hour, making the price per reconstruction $42, assuming sufficient demand. As datasets vary significantly in size, smaller and cheaper clusters can be used to service many smaller reconstruction requests.

Spot instances can provide significantly lower prices than on-demand instances. During our initial evaluation in 2013 the average spot price for CG1 instances was $0.34 per hour; we found that a bid price of $0.40 provided high reliability when provisioning large clusters and that a 120-instance cluster could be provisioned for approximately $50 an hour. Two billion history reconstructions could therefore be performed for less than $10 each. However, due to an increase in demand in 2014, the price for spot CG1 instances is now typically over $1 and can sometimes even exceed on-demand prices. In addition, the volatility of spot instance prices has increased, making clusters comprised only of spot instances increasingly unreliable. We have recently adopted a hybrid approach with half spot and half on-demand instances; this approach results in 120-instance clusters costing approximately $200 an hour.

Fig. 7. Per-image PPN = 2 reconstruction costs for various datasets, when using clusters of different sizes that are made up of either entirely on-demand (solid) or entirely spot (dashed) instances.

Fig. 7 shows the projected cost of reconstructing various size datasets under different cluster configurations with both on-demand (solid lines) and low spot prices (dashed lines). The projected spot prices are based on our initial experiments, in which spot prices were regularly at $0.34 per hour. Interestingly, the cheapest reconstructions tend to use smaller clusters.
This is indicative of a slight trade-off between time and cost, and of the lack of perfect linear scaling (as illustrated in Fig. 3). These results show that small reconstructions (approximately 500 million histories or fewer) can be conducted for under $10 using on-demand ($2.10 per hour) instances, while large reconstructions (e.g., two billion histories) can also be executed for under $10 using spot instances.

It should be noted that the specialized GPU-enhanced cluster instances used by our service are only offered in two Amazon regions (only one in the US), which removes the need for complex provisioning approaches at present. While limiting execution to a single region may affect cost and performance, in related work we have developed cost-aware provisioning techniques that could be used to further reduce costs [31].

While it is difficult to compare these costs accurately with those of using a dedicated HPC cluster, we present estimates here based on the cost of Gaea. Costs associated with a dedicated cluster fall into two categories: the upfront cost of the resources and the operational cost. The initial cost for a cluster of that size is approximately $800,000, and thereafter it can cost an institution over $150,000 annually in staffing, contracts, and other operational expenses. Moreover, these estimates do not include the cost to replace or upgrade the cluster, which will likely have to occur every five years or less. Even considering only the conservative annual cost, a center would need to process over 15,000 reconstructions annually to make the cost comparable to our cloud-based solution.
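The per-image cost figures above follow directly from the hourly rates, as the sketch below shows. It is a simple restatement of the arithmetic in this subsection, using the prices observed during our experiments, not a general pricing model.

    # Per-image cost arithmetic for a 120-instance cluster, using the hourly
    # rates observed during our experiments.
    CLUSTER_SIZE = 120
    ON_DEMAND_RATE = 2.10      # $/instance-hour for CG1
    SPOT_RATE = 0.40           # $/instance-hour at our bid price
    RECONSTRUCTION_MIN = 9     # two-billion-history image
    IMAGES_PER_HOUR = 60 // RECONSTRUCTION_MIN   # 6, assuming back-to-back demand

    for label, rate in [("on-demand", ON_DEMAND_RATE), ("spot", SPOT_RATE)]:
        hourly = CLUSTER_SIZE * rate
        print(f"{label}: ${hourly:.0f}/hour, ~${hourly / IMAGES_PER_HOUR:.0f} per image")

    # Break-even against a dedicated cluster costing ~$150,000/year to operate:
    print(f"break-even: {150_000 / 10:.0f} reconstructions per year at $10 each")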

6.3 Workflow Simulation

In order to evaluate the ability of our pct reconstruction service to elastically respond to and service job requests, we have developed and tested a simulation workflow. We use a Python client application to simulate requests from a single location. Using several of these clients we are able to study the performance of the service when multiple simultaneous requests are submitted. Each client creates reconstruction requests and transfers the input proton history dataset to the pct reconstruction service. Clients are constructed with a name and an associated Globus Connect endpoint from which datasets are transferred. Each reconstruction request includes a size and a priority based on the workload. The reconstruction description is sent to the pct reconstruction service, and a Globus transfer is initiated to move the input dataset to the shared data store. When the reconstruction is complete, a second Globus transfer returns the resulting image to the client's endpoint.

To reduce experiment cost, simulation jobs are created with sizes of between 10 and 50 million proton histories. The service, in turn, creates and resizes clusters based on the clusters' queue lengths, preferring to resize an existing cluster before creating a new one. We also restrict the scheduler to assume that only 10 million histories can be processed by an individual instance, meaning that the size of a job determines the minimum number of instances a cluster must have to successfully execute that job.

Fig. 8a shows the simulation workload and the time spent on each reconstruction job. It illustrates when jobs are created, which client creates them, and how long each job takes to complete. Fig. 8b shows the sizes of the clusters that are created by the reconstruction service to satisfy requests. We see that four clusters are created over the duration of the simulation, two clusters are resized to service larger jobs, and idle clusters are shut down before their next billing cycle. Fig. 8c shows the total time spent for each request to be completed. An interesting feature of this graph is the representation of the differing queue time for each job. Longer queue times are caused by backlog in the system, where a new cluster must be started to service an influx of job requests. Once another cluster is operational, subsequent jobs are serviced more efficiently. In our simulation, Cluster 1 ran jobs 1-9, 11, 14, 17, 19, and 20; Cluster 2 ran jobs 10, 13, 15, 16, and 18; Cluster 3 ran job 12; and Cluster 4 ran the remaining jobs.

Fig. 8. The result of simulating multiple clients requesting several reconstruction jobs over the pct reconstruction service: (a) the simulation workload, in terms of when requests are submitted, by which client, and how long they take to be satisfied; (b) the history of the four clusters that the service creates and resizes to service waiting jobs; (c) the time consumed by each phase of the submitted reconstruction jobs.
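The simulated clients are thin wrappers around the service's REST interface. The sketch below shows their general shape only: the endpoint URL, the JSON field names, and the helper that triggers the Globus transfer are illustrative placeholders, not the service's actual API.

    import random
    import time
    import requests

    SERVICE_URL = "https://pct-service.example.org/api/reconstructions"  # placeholder

    def submit_job(client_name, globus_endpoint, start_transfer):
        """Create one reconstruction request, as the simulated clients do.
        `start_transfer` stands in for the step that pushes the input proton
        histories to the service's Globus endpoint; field names are illustrative."""
        histories = random.randint(10, 50) * 1_000_000   # 10-50 million histories
        job = {
            "client": client_name,
            "source_endpoint": globus_endpoint,
            "histories": histories,
            "priority": random.choice(["high", "low"]),
        }
        resp = requests.post(SERVICE_URL, json=job)
        resp.raise_for_status()
        job_id = resp.json()["id"]
        start_transfer(globus_endpoint, job_id)           # upload input dataset
        return job_id

    def run_client(client_name, globus_endpoint, start_transfer, n_jobs=5):
        """Submit a handful of jobs at random intervals, mimicking sporadic demand."""
        for _ in range(n_jobs):
            submit_job(client_name, globus_endpoint, start_transfer)
            time.sleep(random.uniform(30, 300))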
7 DISCUSSION AND FUTURE WORK

We briefly discuss important challenges identified in deploying and evaluating the pct reconstruction service, and present potential areas for future work.

7.1 File System Bottlenecks

In any data-intensive application in which many nodes simultaneously access shared data, file system access may become a bottleneck. Our approach, using GlusterFS, scaled adequately to the 120-node clusters we deployed. However, we also found that performance degraded when additional processes per node were used. Designing an appropriate shared file system is therefore an important challenge for such applications. Our initial experience striping GlusterFS over multiple CG1 instances proved to be both costly and ineffective, as it did not improve performance for our workload, presumably because the workload did not require additional blocks. Moreover, as demand for CG1 instances grows, we increasingly see the need to deploy clusters that span multiple availability zones, which in turn requires deployment-aware data distribution. In future work we aim to investigate whether replicated GlusterFS bricks in each availability zone can reduce the setup time and improve the overall performance of the application when deployed in such configurations.

Discussions with Amazon developers suggested that storage-optimized I2 instances could be used to host GlusterFS to provide improved file system access. I2 instances include solid state drives to store data with high I/O requirements; importantly, they also share the same high speed interconnect used by CG1 instances. We plan to investigate performance when using I2 instances to host the pct file system, as they are more cost efficient than CG1 cluster instances and may reduce the bottlenecks associated with file system access.

7.2 Cloud Deployment

New instance types and high speed network connections introduced by Amazon may provide yet more benefits to our pct reconstruction service. For example, a new GPU-enabled instance type, G2, is designed to facilitate streaming capabilities within the cloud. Although these instances are significantly less computationally powerful than CG1 instances, they are much cheaper and could potentially be employed to reduce the cost of pct reconstructions.

Data transfer rates are a severe limitation on deploying our current implementation in production. Our results show that high speed networks between proton therapy treatment facilities and an operating pct reconstruction service are required before such a service would be viable for large reconstructions. Our future work will investigate the use of Amazon's direct connect capabilities as a model for providing 1-Gigabit to 10-Gigabit connections to the pct reconstruction service.

We have applied leading commercial approaches to the design of the pct reconstruction service to ensure that the service can be deployed in a highly available and reliable manner. For instance, where possible we leverage reliable cloud-based services such as RDS to host databases and EC2 to host stateless service instances. Thus, using multi-availability-zone deployments and elastic load balancers (ELB) to allocate requests across a pool of running services, we can ensure a high degree of availability. Moreover, using Amazon health checks and services such as CloudWatch, we can monitor running instances, generate alerts, and automatically remove and replace unhealthy instances.

7.3 Pricing Strategies

We have shown that cost-effective pct image reconstruction can be achieved via on-demand provisioning of public cloud resources. We note, however, that for large-scale image reconstructions to be practical, a high demand for such a service is required to amortize fixed overheads. These fixed overheads include, in our current implementation, a persistent CG1 instance that is used to maintain the shared file system and facilitate data transfers; they could be reduced by using smaller and I/O-optimized instances, but cannot be eliminated.

In an attempt to reduce cluster operation costs, we have developed policies to shut down idle resources prior to an upcoming billing cycle. It may also be possible to make more effective use of spot instances. For example, Poola et al. [32] describe an approach in which spot instances are used until a specified slack time is exceeded, at which time a workflow is migrated to more costly on-demand instances in order to fulfill obligations. This approach could be applied to the pct reconstruction service by incorporating service level agreements on image reconstructions and extending our model with various costing strategies. We also require better algorithms and mechanisms to predict spot price volatility, to provision instances by bidding appropriately, and to trade off cost and availability. In future work we plan to further evaluate the use of a combination of on-demand, spot, and reserved pricing models to establish a cost-efficient cloud service for on-demand HPC workloads.

Finally, in order to operate the pct reconstruction service commercially we must develop a billing model in which users are charged for the resources consumed. We aim to leverage billing capabilities we have developed for the Globus Genomics [33] service for this purpose.
7.4 Privacy on Clouds

Image reconstructions may, depending on the target, contain identifiable information such as human faces. Privacy of medical information in the US is governed by the Health Insurance Portability and Accountability Act (HIPAA), which places technical and non-technical restrictions on accessing and managing personal health information. There are two approaches to compliance when analyzing and reconstructing images: 1) anonymization, or 2) analysis on HIPAA-compliant infrastructure. Anonymization covers both phenotype information (e.g., subject name, age, and address) and, in the case of neuroimages, removal of face images. It has long been held that identifiable health information cannot be stored or processed on commercial clouds because of data privacy requirements. However, this situation is changing rapidly. Cloud providers such as AWS now offer a number of compliance frameworks, including HIPAA, and are also able to sign the Business Associate Agreements (BAAs) necessary for HIPAA compliance. Moreover, compliance and privacy are active research areas for the entire cloud community, and a variety of new approaches for providing secure storage and computation on clouds have emerged [34].

Our pct reconstruction service currently requires that clients explicitly remove all directly identifiable information. Before deploying the service for clinical use we will investigate best practice approaches with respect to operations and security. For this purpose we expect to leverage our experience operating the Globus and Globus Genomics services. We also expect to leverage AWS HIPAA compliance to provide a secure and compliant reconstruction service for clinical use. Finally, as part of our operating procedures we will develop a comprehensive threat model prior to making the service generally available.
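As an illustration of the kind of client-side scrubbing we currently require, the sketch below removes directly identifying metadata from a reconstruction request before upload. The field names are hypothetical: the actual schema of client metadata is not defined here, so this is a minimal example of the idea rather than our client implementation.

```python
import copy
import uuid

# Hypothetical metadata fields that directly identify a patient; a real
# deployment would follow the full HIPAA Safe Harbor list of identifiers.
IDENTIFYING_FIELDS = {
    "patient_name", "date_of_birth", "address", "phone_number",
    "medical_record_number",
}

def scrub_metadata(metadata: dict) -> dict:
    """Return a copy of the request metadata with identifying fields removed.

    Non-identifying fields needed for reconstruction (e.g., scanner geometry
    or number of proton histories) are passed through unchanged.
    """
    clean = copy.deepcopy(metadata)
    for field in IDENTIFYING_FIELDS:
        clean.pop(field, None)
    # Replace the patient identifier with an opaque token; the treatment
    # facility keeps a local mapping from token to patient so it can match
    # the returned image, while the token itself carries no identity.
    if "patient_id" in clean:
        clean.pop("patient_id")
        clean["subject_token"] = uuid.uuid4().hex
    return clean

if __name__ == "__main__":
    request = {
        "patient_name": "Jane Doe",          # hypothetical example values
        "patient_id": "SITE-12345",
        "date_of_birth": "1970-01-01",
        "num_histories": 2_000_000_000,
        "scanner": "prototype head scanner",
    }
    print(scrub_metadata(request))
```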

8 CONCLUSION

Real-time pct image reconstruction represents an exciting new approach to providing precise proton imaging immediately before proton therapy. Our parallel GPU-enabled software can produce large image reconstructions in a matter of minutes. We have shown in this paper that commercial cloud services can provide a cost-effective alternative to dedicated HPC infrastructure. We demonstrated that specialized EC2 cluster instances can provide on-demand and highly scalable reconstruction. Our results show that large scale reconstructions can be performed within minutes using up to 120 GPU-enhanced cluster compute nodes. While dedicated HPC clusters can achieve similar reconstruction performance with half the number of nodes, the extra flexibility of cloud platforms and the ability to satisfy sporadic usage requirements at low cost may be advantageous in many settings. Our results also reveal some limitations of currently available cloud instances for pct reconstruction, in particular their reduced network performance compared with our dedicated cluster. We show that enhancements such as instances with more RAM and faster networks, as well as software and deployment improvements such as tuning shared file systems, can improve reconstruction performance.

In order to demonstrate that such capabilities could be offered to a wide community, we have constructed a scalable end-to-end reconstruction service that provisions cloud clusters dynamically to meet demand. This service shields users from the complexities involved in configuring and maintaining the MPI- and GPU-enabled reconstruction code, allowing them simply to submit requests through a simple REST interface. The service handles dataset upload, the prioritization and scheduling of workloads, the creation and resizing of cloud clusters, and the return of results to clients. Furthermore, by using spot instances, the service can compute two-billion-history reconstructions in under 10 minutes for less than $10. The different instance pricing models supported by commercial cloud providers, plus the different provisioning, scheduling, and execution options, suggest exciting new strategies for optimizing reconstruction time and cost.

The service is not yet commercially available. The biggest limitation that prohibits immediate use of our pct reconstruction service for real-time imaging is data upload overhead. Our experiments indicate that for 100 GB datasets, upload time significantly exceeds reconstruction time. Amazon's high-speed network capabilities provide a possible solution to this problem. Nevertheless, even with low bandwidth links, realistic reconstructions of up to half a million histories can be uploaded and processed within 10 to 15 minutes.

ACKNOWLEDGMENTS

This work was supported by Amazon.com, Inc., US Department of Defense contract no. W81XWH, and US Department of Energy contract no. DE-SC. The authors thank David Pellerin, Steve Elliott, and Jamie Kinney from Amazon for their continued support; Keith Schubert and his students Scott McCallister and Micah Witt for conversations on hardware accelerated computing; Gabor Herman, Yair Censor, Ran Davidi, and Joanna Klukowski for many valued discussions of pct mathematics; and Ford Hurley and the Loma Linda University Medical Center for sharing the LUCY phantom proton history data. Finally, they thank Reinhard Schulte and Scott Penfold for their collaboration and insight into pct and their work. They especially thank Dr. Schulte for his comments on this manuscript.

REFERENCES

[1] R. Schulte, V. Bashkirov, T. Li, Z. Liang, K. Mueller, J. Heimann, L. Johnson, B. Keeney, H. F.-W. Sadrozinski, A. Seiden, D. Williams, L. Zhang, Z. Li, S. Peggs, T. Satogata, and C. Woody, Conceptual design of a proton computed tomography system for applications in proton radiation therapy, IEEE Trans. Nuclear Sci., vol. 51, no. 3, pp , Jun
[2] V. Bashkirov, R. Schulte, G. Coutrakon, B. Erdelyi, K. Wong, H. Sadrozinski, S. Penfold, A. Rosenfeld, S. McAllister, and K. Schubert, Development of proton computed tomography for applications in proton therapy, in Proc. Am. Inst. Phys. Conf. Series, Mar. 2009, vol. 1099, pp
[3] R. W. Schulte, V. Bashkirov, M. C. Loss Klock, T. Li, A. J. Wroe, I. Evseev, D. C. Williams, and T. Satogata, Density resolution of proton computed tomography, Med. Phys., vol. 32, no. 4, pp ,
[4] S. Penfold, Image reconstruction and Monte Carlo simulations in the development of proton computed tomography for applications in proton radiation therapy, Ph.D. dissertation, Centre for Medical Radiation Physics, Univ. of Wollongong, New South Wales, Australia,
[5] N. T. Karonis, K. L. Duffin, C. E. Ordoñez, B. Erdelyi, T. D. Uram, E. C. Olson, G. Coutrakon, and M. E. Papka, Distributed and hardware accelerated computing for clinical medical imaging using proton computed tomography (pct), J. Parallel Distrib. Comput., vol. 73, no. 12, pp ,
[6] (2014, Apr.). Particle therapy co-operative group [Online]. Available:
[7] D. Lifka, I. Foster, S. Mehringer, M. Parashar, P. Redfern, C. Stewart, and S. Tuecke, XSEDE cloud survey report, Technical report, National Science Foundation, USA, XSEDE, Tech. Rep XSEDE-Reports-CloudSurvey-v1.0,
[8] A. Gupta and D. Milojicic, Evaluation of HPC applications on cloud, in Proc. 6th Open Cirrus Summit, Oct. 2011, pp
[9] K. Jackson, L. Ramakrishnan, K. Muriki, S. Canon, S. Cholia, J. Shalf, H. Wasserman, and N. Wright, Performance analysis of high performance computing applications on the Amazon Web Services cloud, in Proc. IEEE Int. Conf. Cloud Comput. Technol. Sci., Nov. 2010, pp
[10] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, The cost of doing science on the cloud: The Montage example, in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., Nov. 2008, pp
[11] L. Heilig and S. Voß, A scientometric analysis of cloud computing literature, IEEE Trans. Cloud Comput., vol. 2, no. 3, pp , Apr
[12] M. R. Sossa and R. Buyya, Deadline based resource provisioning and scheduling algorithm for scientific workflows on clouds, IEEE Trans. Cloud Comput., vol. 2, no. 2, pp , Apr
[13] G. Kanagaraj and A. C. Sumathi, Proposal of an open-source cloud computing system for exchanging medical images of a hospital information system, in Proc. 3rd Int. Conf. Trendz Inform. Sci. Comput., Dec. 2011, pp
[14] A. Reddy and R. Bhatnagar, Distributed medical image management: A platform for storing, analysis and processing of image database over the cloud, in Proc. Int. Conf. Adv. Energy Convers. Technol., Jan. 2014, pp
[15] G. C. Kagadis, C. Kloukinas, K. Moore, J. Philbin, P. Papadimitroulas, C. Alexakos, P. G. Nagy, D. Visvikis, and W. R. Hendee, Cloud computing in medical imaging, Med. Phys., vol. 40, no. 7, p ,
[16] H. Kim, M. Parashar, D. J. Foran, and L. Yang, Investigating the use of autonomic cloudbursts for high-throughput medical image registration, in Proc. 10th IEEE/ACM Int. Conf. Grid Comput., 2009, pp
[17] L. Parsonson, S. Grimm, A. Bajwa, L. Bourn, and L. Bai, A cloud computing medical image analysis and collaboration platform, in Cloud Computing and Services Science (Series Service Science: Research and Innovations in the Service Economy), I. Ivanov, M. van Sinderen, and B. Shishkov, Eds. New York, NY, USA: Springer, 2012, pp
[18] T. Bednarz, P. Szul, Y. Arzhaeva, D. Wang, N. Burdett, A. Khassapov, S. Chen, P. Vallotton, R. Lagerstrom, T. Gureyev, and J. Taylor, Biomedical image analysis and processing in clouds, in Proc. AIP Conf., 2013, vol. 1559, no. 1, pp
[19] T. Beisel, S. Lietsch, and K. Thielemans, A method for OSEM PET reconstruction on parallel architectures using STIR, in Proc. IEEE Nuclear Sci. Symp. Conf. Record, 2008, pp
[20] Y. Maimaitijiang, M. Roula, S. Watson, G. Meriadec, K. Sobaihi, and R. Williams, Evaluation of parallel accelerators for high performance image reconstruction for magnetic induction tomography, J. Select. Areas Softw. Eng., vol. 170, pp. 1–7, 2011.

[21] D. Vintache, B. Humbert, and D. Brasse, Iterative reconstruction for transmission tomography on GPU using NVIDIA CUDA, Tsinghua Sci. Technol., vol. 15, no. 1, pp ,
[22] A. M. Cormack and A. M. Koehler, Quantitative proton tomography: preliminary experiments, Phys. Med. Biol., vol. 21, no. 4, pp ,
[23] G. Cirrone, G. Cuttone, G. Candiano, F. Di Rosa, S. Lo Nigro, D. Lo Presti, N. Randazzo, V. Sipala, M. Bruzzi, D. Menichelli, M. Scaringella, V. Bashkirov, R. D. Williams, Hartmut F.-W. Sadrozinski, J. Heimann, J. Feldt, N. Blumenkrantz, C. Talamonti, and R. Schulte, Monte Carlo studies of a proton computed tomography system, IEEE Trans. Nuclear Sci., vol. 54, no. 5, pp , Oct
[24] V. Sipala, M. Bruzzi, M. Bucciolini, M. Carpinelli, G. Cirrone, C. Civinini, G. Cuttone, D. L. Presti, S. Pallotta, C. Pugliatti, N. Randazzo, F. Romano, M. Scaringella, C. Stancampiano, C. Talamonti, M. Tesi, E. Vanzi, and M. Zani, A proton computed tomography system for medical applications, J. Instrumentation, vol. 8, no. 02, p. C02021,
[25] D. Gordon and R. Gordon, Component-averaged row projections: A robust, block-parallel scheme for sparse linear systems, SIAM J. Sci. Comput., vol. 27, no. 3, pp ,
[26] InfiniBand Trade Association, InfiniBand Architecture Specification: Release 1.0,
[27] (2014, Apr.). The Gluster web site [Online]. Available:
[28] I. Foster, Globus Online: Accelerating and democratizing science through cloud-based services, IEEE Internet Comput., vol. 15, no. 3, pp , May
[29] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, Mesos: A platform for fine-grained resource sharing in the data center, in Proc. 8th USENIX Conf. Netw. Syst. Des. Implementation, 2011, pp
[30] G. T. Herman and R. Davidi, Image reconstruction from a small number of projections, Inverse Problems, vol. 24, no. 4, p ,
[31] R. Chard, K. Chard, K. Bubendorfer, L. Lacinski, R. Madduri, and I. Foster, Cost-aware cloud provisioning, in Proc. 11th IEEE Int. Conf. eScience,
[32] D. Poola, K. Ramamohanarao, and R. Buyya, Fault-tolerant workflow scheduling using spot instances on clouds, Procedia Comput. Sci., vol. 29, no. 0, pp ,
[33] R. K. Madduri, D. Sulakhe, L. Lacinski, B. Liu, A. Rodriguez, K. Chard, U. J. Dave, and I. T. Foster, Experiences building Globus Genomics: A next-generation sequencing analysis service using Galaxy, Globus, and Amazon Web Services, Concurrency Comput.: Practice Exp., vol. 26, no. 13, pp ,
[34] L. Wei, H. Zhu, Z. Cao, X. Dong, W. Jia, Y. Chen, and A. V. Vasilakos, Security and privacy for storage and computation in cloud computing, Inform. Sci., vol. 258, pp ,

Ryan Chard received the BSc (Hons) and MSc degrees from the Victoria University of Wellington. He is currently working toward the PhD degree at the Victoria University of Wellington. He is a student member of the IEEE.

Nicholas T. Karonis is a professor of computer science at Northern Illinois University. He received the PhD degree in computer science from Syracuse University and is a Resident Guest Associate at Argonne National Laboratory.

Kyle Chard received the PhD degree in computer science from the Victoria University of Wellington. He is a senior researcher and a fellow at the Computation Institute, a joint institute of the University of Chicago and Argonne National Laboratory. He is a member of the IEEE.
Kirk L. Duffin received the PhD degree in computer science from Brigham Young University. He is an associate professor of computer science at Northern Illinois University.

Caesar E. Ordoñez received the PhD degree in nuclear physics from the Massachusetts Institute of Technology. He is a researcher at Northern Illinois University and has been working in the field of medical imaging for more than 20 years.

Thomas D. Uram is a member of the research staff at the Argonne National Laboratory and the Computation Institute at the University of Chicago.

Justin Fleischauer is a graduate student at Northern Illinois University.

Ravi Madduri is a project manager at the Argonne National Laboratory and a fellow of the Computation Institute, a joint institute of the University of Chicago and Argonne National Laboratory.

Ian T. Foster is a director of the Computation Institute, a joint institute of the University of Chicago and Argonne National Laboratory. He is also an Argonne senior scientist and distinguished fellow and the Arthur Holly Compton distinguished service professor of computer science. He is a senior member of the IEEE.

Michael E. Papka received the PhD degree in computer science from the University of Chicago. He is a senior scientist at the Argonne National Laboratory, where he serves both as a deputy associate laboratory director and as a director of the Argonne Leadership Computing Facility. He is a senior fellow of the Computation Institute, a joint institute of the University of Chicago and Argonne National Laboratory, and an associate professor of computer science at Northern Illinois University. He is a senior member of the IEEE.

John Winans received the MS degree from Northern Illinois University. He is a research associate at Northern Illinois University.
