Analysis of the Parallelisation of the Duchamp Algorithm


Stefan Westerlund, University of Western Australia

Abstract

A critical step in radio astronomy is to search images to determine the objects they contain. New telescope installations, such as the Murchison Widefield Array (MWA) and the Australian Square Kilometer Array Pathfinder (ASKAP), are capable of observing the sky at higher resolution than previous telescopes. This increased resolution results in a much greater data output, so increased computing power is required in order to search this data for objects. The Square Kilometer Array (SKA) will produce even more data, and require even more computational power to search its output. A parallel application is required to make use of this computing performance. The goal of this project is to examine the source finder program Duchamp to determine how it will perform in a parallel implementation, and to estimate potential combinations of hardware to be used to run this parallel implementation. This is done by calculating the arithmetic intensity of Duchamp and matching it to the arithmetic intensity of potential hardware. This comparison is performed using a black box model, to determine the overall performance of the computing system and its bandwidth. A node model is also considered, to determine the number, performance, memory and interconnect bandwidth of the individual nodes that comprise the parallel computer system. The results of this project suggest two potential computer systems. One consists of 392 nodes, each with an Intel Core i7 975 processor, at least 14.1GB of RAM and a network capable of providing a connection of at least 1.56GB/s (12.5Gbit/s) of bandwidth to each node. The second uses 67 nodes powered by nvidia Tesla C2070 GPUs, with at least 82.1GB of RAM per node and a network that can provide at least 4.32GB/s (34.6Gbit/s) of bandwidth to each node. Both systems should use a 100 Gigabit Ethernet network to transfer data to and from the system. Other configurations considered had memory-per-node requirements that exceed currently available commodity hardware.

1. Introduction

Modern telescope installations, such as the Square Kilometer Array (SKA) and the Australian Square Kilometer Array Pathfinder (ASKAP), search a much larger area of sky in a given amount of time than current telescopes. The SKA will produce data cubes that are terabytes in size. An all-sky survey will produce thousands of such image cubes, resulting in a data set that is petabytes in size. All of this data will need to be searched to find the objects it contains. The problem is that the computational requirements to search these images dwarf what is available from desktop machines. Instead, supercomputers are needed to process these images in a reasonable amount of time. Therefore, the source finder programs used for searching the large astronomy images must be able to run in parallel, so they can make use of the computational power of current, parallel supercomputers. This project uses the Duchamp program as a representative example of a source finder, in order to examine the implications of running a source finder on a parallel computer network.

The goal of this project is to examine the Duchamp source finding program and determine the effects of parallelising it. This is done by examining how many operations Duchamp requires and how much data transfer is required in order to search an area of sky. These values will be used to determine an appropriate combination of hardware to run the parallel version of Duchamp. This project will consider the hardware both as a computer system as a whole and as a network of nodes.

The Background section will describe the knowledge required to understand this report. The manner in which the Duchamp program will be analysed to understand how it will perform with a parallel implementation will be detailed in the Methodology section, along with how estimates for potential hardware configurations for this problem will be obtained. The results of evaluating the models from the Methodology section are shown in the Results section. The implications of these results will be considered and the selection of hardware will be made in the Discussion section. The conclusions of this work are made, along with a discussion of the limitations of this project and the future work to be done to expand on this project, in the Conclusion section.

2. Background

This section will first provide a broad introduction to radio astronomy and the role of source finders. It will then detail Duchamp, the source finder chosen to be investigated in this project. Also described is the à trous image reconstruction algorithm, which comprises the majority of the computational requirements of the Duchamp program. The concepts of arithmetic intensity and computational complexity will be explained, as they are relevant to the understanding of this work.

2.1. Radio Astronomy

Radio astronomy is the study of celestial objects by examining electromagnetic radiation in the radio spectrum. It is possible to examine this radiation from Earth because it is in one of the frequency windows that is not blocked by Earth's atmosphere. Radio waves are beneficial to study because they pass through objects that are opaque to visible light, such as dust clouds. Radio astronomy also allows astronomers to observe objects that do not emit visible light, such as hydrogen clouds, as neutral hydrogen produces 21cm radiation [1].

Radio waves are often detected using arrays of telescopes because the signals can be combined between telescopes in a process called radio interferometry. Using multiple telescopes improves the angular resolution of the system, such that the angular resolution of two telescopes a certain distance apart is the same as that of a single telescope with a dish diameter equal to that distance. It takes a significant amount of computational power to process the signals received by telescopes into astronomy images. Signals from different telescopes are correlated together, taking account of their relative positions. The results are integrated over time, causing noise to cancel out towards zero and allowing fainter signals to be detected. This information is combined into a structure called a data cube. This cube has three dimensions: two are spatial dimensions that denote where in the sky an element lies, and the third is the frequency channel that a particular element represents.

Once these data cubes have been created, they need to be searched. Source finders are programs that are used to find sources of electromagnetic radiation in an image. The quality of a source finder is measured in terms of its completeness and reliability. Completeness is a measure of how many of the actual sources in the data cube the source finder finds. Reliability is the proportion of the objects a source finder finds that are actual sources, rather than noise.

Several source finders were considered for this report. These were MultiFind and TopHat, which were used by the HIPASS survey [2], and the Duchamp source finder [3]. Neither MultiFind nor TopHat was chosen, because of requirements of completeness and reliability. In the HIPASS survey, people were used to confirm each of the sources found by these programs. MultiFind found around 83% and TopHat found around 90% of the sources that were deemed to be in the data, but each one found sources the other didn't. Additionally, MultiFind and TopHat found 137,060 and 17,232 sources respectively, compared to 4,315 sources in the final count [2]. The data from newer surveys, such as those from ASKAP, will have too many sources to be verified by people.

Duchamp [3] is a new source finder program written by Dr. Matthew Whiting. Still under development, Duchamp uses a different algorithm. Because of this, Duchamp will be used as an estimate of how much computing power a source finder will need, and of the effects of parallelising it. The detection algorithm used by Duchamp is to consider all elements above a certain threshold as bright elements. Adjacent bright elements are then joined together as objects. Objects that are below another threshold in size are discarded as noise, rather than actual sources. Duchamp uses pre-processing of the data cube to reduce the noise of the data cube, and to allow it to see fainter objects.
The pre-processing uses image reconstruction with the à trous method [4], which is explained in detail in the next section.

2.2. The À Trous Image Reconstruction Algorithm

The à trous image reconstruction algorithm is a three-dimensional wavelet transform [5]. Through successive three-dimensional low-pass filtering, it considers the image at several scales. The filtered values at each scale are added to the output only if they are still greater than a threshold. The flowchart for the algorithm is shown in Figure 1. The algorithm is described in more detail in the following paragraphs.

First the algorithm loads the original data cube as the input, and the values of the output data cube are initialised to zero. The data cube is operated on over several iterations of the outer loop, with the stopping criterion dependent on the change in MAD (Median Absolute Deviation) from one iteration to the next, and a minimum of two iterations. For each of these iterations, the data cube values are first set to the original input minus the current output, and the scale is initialised to one. The wavelet values are calculated by convolving the data cube with a low-pass filter and subtracting the filtered values from the values of the data cube. The distance between the elements used in the filter is dependent on the scale. A threshold is calculated from the median of the wavelet values. The wavelet values that are greater than the threshold are added to the output. The data cube is then updated by subtracting the wavelet values from the current data cube values. The inner loop is repeated, incrementing the scale at each iteration, for a number of scales proportional to the logarithm of the shortest side length of the data cube. Once all the scales have been completed, the final filtered values are added to the output, without regard to a threshold. The stopping condition is then checked to see if another iteration of the outer loop should be performed. Once all the iterations are complete, the output data cube is returned.

The exact operations performed by the algorithm are described in the next paragraph. Starting with the original data cube as the input, the data cube is convolved with a discrete filter. Consider x, y and z as the coordinates of an element in the image cube, and let d_{s,l} be the real-valued data cube that comprises the image at scale s and iteration l. α is the original input data cube and β_{l-1} is the output at the end of iteration l-1 and the start of iteration l. W = \lfloor l_f / 2 \rfloor, where l_f is the one-dimensional length of the filter used, and f[i][j][k] are the coefficients of the three-dimensional filter. Then the values of the data cube are updated from one scale and iteration to the next according to the following equations:

d_{1,l}[x][y][z] = \alpha[x][y][z] - \beta_{l-1}[x][y][z]

d_{s+1,l}[x][y][z] = \sum_{i=-W}^{W} \sum_{j=-W}^{W} \sum_{k=-W}^{W} f[i][j][k]\, d_{s,l}[x + 2^{s-1} i][y + 2^{s-1} j][z + 2^{s-1} k]    (1)

This data access pattern is demonstrated, in two dimensions, in Figure 2. If a required element is outside of the cube, then a reflected element is used instead. The number of scales, S, is dependent on the smallest side length of the data cube being examined. If the length of the shortest side is l_min, then the number of scales is S = \log_2(l_{min}) - 1.
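To make the filtering step of Equation (1) concrete, here is a minimal sketch of one à trous smoothing pass, reduced to one dimension for readability (the real algorithm works on the full three-dimensional cube). The B3-spline filter coefficients and the boundary reflection shown here are illustrative assumptions, not necessarily Duchamp's exact implementation.

```python
import numpy as np

def atrous_smooth_1d(data, scale, coeffs=(1/16, 1/4, 3/8, 1/4, 1/16)):
    """One smoothing pass of the a trous filter at the given scale.

    Illustrative 1D sketch of Equation (1): element x of the smoothed
    array is a weighted sum of elements x + 2**(scale-1) * i for
    i = -W..W, with reflection at the boundaries.
    """
    w = len(coeffs) // 2                 # filter half-width W
    spacing = 2 ** (scale - 1)           # gap between filter taps doubles each scale
    n = len(data)
    smoothed = np.zeros_like(data)
    for x in range(n):
        total = 0.0
        for i, c in enumerate(coeffs):
            idx = x + (i - w) * spacing
            # reflect indices that fall outside the array
            if idx < 0:
                idx = -idx
            elif idx >= n:
                idx = 2 * (n - 1) - idx
            total += c * data[idx]
        smoothed[x] = total
    return smoothed

# The wavelet values at a scale are the difference between the data
# before and after one smoothing pass.
data = np.random.randn(256).astype(np.float32)
smoothed = atrous_smooth_1d(data, scale=1)
wavelet = data - smoothed
```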

Figure 1: À Trous Image Reconstruction Flowchart. This flowchart shows the working of the à trous image reconstruction algorithm used by Duchamp. First it loads the original image as the input, and the values of the output data cube are initialised to zero. The image is operated on over several iterations of the outer loop, with the stopping criterion dependent on the change in MAD (Median Absolute Deviation) from one iteration to the next, and a minimum of two iterations. For each of these iterations, the data cube is convolved with a filter over several scales. At each scale, data cube values that are above a threshold are added to the output.

Figure 2: Duchamp Data Access Pattern. This diagram shows the values needed to calculate the next value of a given element. A two-dimensional data set is used instead of a three-dimensional one for clarity. Likewise, only the elements required for the first three scales are shown. The element marked X is the element whose next value is being calculated. The numbers indicate in which scale the surrounding elements are used. Elements that are used in two different scales still need to be read twice, as the values of the surrounding pixels will also have changed from one scale to the next. Note that the distance from the target element to the surrounding elements doubles with each scale. Also, each coloured element will require the black element for its own calculation at that scale.

The wavelet coefficients, w_{s,l}, at scale s and iteration l are equal to the difference between the data cube values at successive scales:

w_{s,l}[x][y][z] = d_{s,l}[x][y][z] - d_{s+1,l}[x][y][z]    (2)

The wavelet coefficients are then added to the output array if and only if they are a certain threshold, t[s], above the median, m[w_{s,l}]. This threshold is a constant based on the current scale, s. The increase in the output as a result of scale s in iteration l, β_{s,l}, is therefore calculated according to the following equation:

\beta_{s,l}[x][y][z] = \begin{cases} w_{s,l}[x][y][z] & \text{if } w_{s,l}[x][y][z] > m[w_{s,l}] + t[s] \\ 0 & \text{otherwise} \end{cases}    (3)

The threshold is a constant value determined at the start of the program. It is dependent on the scale, and is multiplied by a value given in the program parameters. The output is calculated for S scales. The final filtered values for the data are then added to the output, so the total output for iteration l of the algorithm, β_l, is given according to the following equation:

\beta_l[x][y][z] = \beta_{l-1}[x][y][z] + d_{S+1,l}[x][y][z] + \sum_{s=1}^{S} \beta_{s,l}[x][y][z], \qquad \beta_0[x][y][z] = 0    (4)

The output after each iteration is calculated until the difference in the median absolute deviation from one iteration to the next is small enough. If M[x] is the median absolute deviation of x and τ is the tolerance, then the stopping condition is evaluated according to the equation:

\frac{M[\alpha - \beta_l] - M[\alpha - \beta_{l-1}]}{M[\alpha - \beta_l]} < \tau    (5)

The tolerance is specified in the input parameters for the program. With the default settings, the algorithm usually takes three or four iterations.
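Putting Equations (2) to (5) together, the outer loop of the reconstruction can be sketched as follows. This is again a simplified one-dimensional sketch: it reuses the assumed atrous_smooth_1d helper from the earlier example, and the threshold form (a multiple of the MAD of the wavelet coefficients) and default constants are illustrative assumptions, not Duchamp's exact parameters.

```python
import numpy as np

def mad(x):
    """Median absolute deviation, the statistic used in the stopping test."""
    return np.median(np.abs(x - np.median(x)))

def reconstruct(alpha, n_scales, threshold_scale=3.0, tol=0.005, min_iters=2):
    """Simplified 1D sketch of the outer reconstruction loop (Eqs 2-5).

    alpha is the input array; the output beta accumulates wavelet values
    that exceed the scale-dependent threshold t[s].
    """
    beta = np.zeros_like(alpha)
    prev_mad = None
    iteration = 0
    while True:
        iteration += 1
        d = alpha - beta                          # residual at start of iteration l
        for s in range(1, n_scales + 1):
            smoothed = atrous_smooth_1d(d, s)     # helper from the previous sketch
            w_s = d - smoothed                    # Equation (2)
            t_s = threshold_scale * mad(w_s)      # scale-dependent threshold (assumed form)
            beta += np.where(w_s > np.median(w_s) + t_s, w_s, 0.0)  # Equation (3)
            d = smoothed                          # update data for the next scale
        beta += d                                 # final smoothed values added (Eq 4)
        cur_mad = mad(alpha - beta)
        if prev_mad is not None and iteration >= min_iters:
            if abs(cur_mad - prev_mad) / cur_mad < tol:   # Equation (5)
                return beta
        prev_mad = cur_mad
```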

2.3. Arithmetic Intensity

The measure used to match the algorithm to potential hardware is arithmetic intensity. This value compares computation to data transfer. The arithmetic intensity of an algorithm is the number of operations it requires per byte of data transferred. The arithmetic intensity of a computer system is its operational performance divided by its bandwidth. If the arithmetic intensity of the algorithm is greater than that of the system, then the problem is computationally bound, and excess bandwidth will be unused. Conversely, if the arithmetic intensity of the algorithm is less than that of the computer system, the algorithm is bandwidth bound, and excess computational performance will be unused. This metric can therefore be used to match an algorithm with suitable hardware.

Equation 6 denotes how the arithmetic intensity of an algorithm and of a computer system are calculated. a_a is the arithmetic intensity of the algorithm and a_c is the arithmetic intensity of the computer system. p is the number of operations required for the algorithm, usually counted as FLOPs, or Floating Point OPerations. r is the number of bytes the algorithm needs to transfer. c is the computational power of the computer, in FLOP/s, Floating Point OPerations per second. b is the bandwidth of the computer system. If a_a > a_c then the problem is computationally bound, and excess bandwidth will be unused. If a_a < a_c then the algorithm is bandwidth bound, and extra computational performance will not be used. This is how an algorithm can be matched to an appropriate computer system. The number of operations required, and the related computational complexity, is discussed in the next section.

a_a = p / r, \qquad a_c = c / b    (6)

2.4. Computational Complexity and Operation Counts

Computational complexity is a measure of how much effort is required to run an algorithm. It is often written as an upper bound, in big O notation. A function f(n) is of order O(g(n)) if Equation 7 holds for some constant k.

\lim_{n \to \infty} f(n) \le k\, g(n)    (7)

The computational complexity of the à trous image reconstruction method used by Duchamp is O(VSL), where V is the number of elements in the image, S is the number of scales and L is the number of iterations of the outer loop required. This can be used to estimate the running time of the program, based on the running time of the program with different input. If t is the running time of the program with parameters V, S and L, and t_0 is the running time measured using parameters V_0, S_0 and L_0, then the running time can be estimated using Equation 8.

t \approx t_0 \frac{V S L}{V_0 S_0 L_0}    (8)

Examining the source code also allows the operation counts to be determined. The filtering portion of the image reconstruction algorithm requires 2180VSL single-precision floating point operations and 250VSL double-precision floating point operations. Calculating the median requires 48V \log_{10}(V)(S+2)L single-precision floating point operations. This analysis considers the median algorithm to be a single-threaded implementation of introsort [6], followed by picking the middle element, as a worst-case scenario. Parallel, and more efficient, implementations exist, for example Bader, 2004 [7]. The single- and double-precision operations will be combined to derive an equivalent number of single-precision floating point operations. This will be done by considering a double-precision

operation to be equivalent to two single-precision operations in the case of CPUs [8], two single-precision operations for nvidia GPUs [9] and five single-precision operations in the case of AMD GPUs [10]. This is because different processors perform double-precision floating point operations at different speeds relative to how fast they can perform single-precision floating point operations. These are the operation counts that will be used in calculating the arithmetic intensity of the image reconstruction algorithm.

3. Methodology

This section will describe how the Duchamp source finder program will be analysed. It will explain how the data cubes that are the input to Duchamp will be considered. Two models of the computing environment in which Duchamp will be run will be considered: a black box model and a node model. The two measures that will be applied to Duchamp, arithmetic intensity and computational complexity, were explained in the previous section.

This report will consider a data cube with two spatial dimensions, X and Y, and a frequency dimension, F. This results in an image cube having XYF elements. Each element has D single-precision values, where D is greater than or equal to one. These values specify different properties of the element, including one value denoting the brightness of that element. Therefore, with a single-precision floating point number requiring four bytes of storage, the total file size for the data cube is 4XYFD bytes. Because Duchamp only uses the brightness value for an element, only this value will be considered when determining how much memory Duchamp needs to store all its data.

3.1. Black Box Model

The first model is a black box model, as shown in Figure 3. This considers the computer system as a black box, with a certain computational rate and a bandwidth that determines the rate at which data is moved on and off the system. This model compares the total number of floating point operations required by the algorithm to the amount of data transfer needed to move the input data cube onto the system and the output data catalogue off the system. This model will help determine the overall performance of the potential system.

This model gives particular values to use to calculate the arithmetic intensities, as shown in Equation 6. p is the number of floating point operations required, as given in Section 2.4. The value of r is equal to the file size of the image cube, 4XYFD, plus the file size of the catalogue, in bytes. Although only one value per element is used by Duchamp, this report will consider all D values per element being transferred to the computer system, as a worst-case situation. The value c is equal to the total computational performance of the computer system, in FLOP/s, and b is the bandwidth of the communication link that moves data on and off the system.
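As a rough sketch of how the black box numbers are produced, the following Python assembles the operation counts from Section 2.4 and the transfer size 4XYFD into the arithmetic intensity of Equation (6), then asks how fast the whole system must be to keep pace with a given link. The ASKAP-like cube dimensions and the network speeds are the figures quoted later in the report; the helper names and the printout are assumptions for illustration.

```python
import math

def flop_counts(V, S, L):
    """Single- and double-precision FLOPs for the reconstruction (Section 2.4)."""
    filter_sp = 2180 * V * S * L
    filter_dp = 250 * V * S * L
    median_sp = 48 * V * math.log10(V) * (S + 2) * L
    return filter_sp + median_sp, filter_dp

def equivalent_sp_flops(sp, dp, dp_cost=2):
    """Fold double-precision work into equivalent single-precision FLOPs.
    dp_cost is 2 for CPUs and nvidia GPUs, 5 for AMD GPUs."""
    return sp + dp_cost * dp

# Black box model for an ASKAP-sized cube (values quoted in the Results section)
X, Y, F, D = 4096, 4096, 16384, 5
S, L = 10, 4
V = X * Y * F

sp, dp = flop_counts(V, S, L)
p = equivalent_sp_flops(sp, dp)          # operations, CPU-style weighting
r = 4 * X * Y * F * D                    # bytes moved onto the system; catalogue is negligible
a_algorithm = p / r                      # Equation (6): FLOPs per byte

# Required whole-system performance for each candidate link (bytes per second)
networks = {"Gigabit Ethernet": 0.125e9, "10 Gigabit Ethernet": 1.25e9,
            "InfiniBand QDR 4X": 4.0e9, "100 Gigabit Ethernet": 12.5e9}
for name, b in networks.items():
    print(f"{name}: {a_algorithm * b / 1e12:.1f} TFLOP/s to match the link")
```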

3.2. Node Model

The second model is the node model, as shown in Figure 4. This concerns the computation once all the data is on the system. It considers a series of nodes, each capable of a certain computational performance. These are connected by an interconnect that has a certain bandwidth from one node to another. This model compares the operations required by the algorithm against the data it needs to transfer. It also considers the memory requirements of the system. Examining the source code shows that the image reconstruction algorithm uses five single-precision values in memory for each element in the data cube, so each node requires 5 × 4 = 20 bytes of RAM for each element it holds, in order to store all the required data in memory.

In determining the arithmetic intensity, the node model uses the same number of floating point operations as the black box model for p. The computational performance of the computer system, c, is the computational performance of a single node. The system bandwidth, b, is the bandwidth of the interconnect from one node to another. The amount of data transfer required, r, is more complicated, as it depends on the number of nodes used, and how the elements in the data cube are distributed between the nodes. The amount of data transfer required is calculated in the following paragraphs.

This analysis will consider each node to hold an m × m × m cube of elements from the data cube, and to be responsible for performing the operations required for these elements. If there are a total of v elements in the data cube, then the number of nodes used is n = v / m^3, so m = \sqrt[3]{v / n}. For a filter with length 2w + 1, for some positive integer w, and at each scale s, to calculate the wavelet coefficient for a given element the node holding that element needs the values of the elements 2^{s-1}, 2 \cdot 2^{s-1}, 3 \cdot 2^{s-1}, \ldots, w \cdot 2^{s-1} values away, on either side, in each dimension. This requires that each node in the computing network store not only the data cube values for the elements it is responsible for computing, but also the elements that surround these in the data cube. These extra values, which are not operated on by a node but are used in calculations for the elements on that node, are called a halo.

If a node needs elements up to d values away in each of the three dimensions, and a node works on a cube of elements with side length m, then the elements a node in the computer network needs form a cube with side length m + 2d elements. The amount of data that needs to be transferred to a node is equal to the volume of this cube minus the size of the node's own cube, as the node already holds its own values. Therefore, the amount of data transfer needed per node, d_n, is:

d_n = (m + 2d)^3 - m^3 = 6m^2 d + 12 m d^2 + 8 d^3    (9)

The total amount of data needing transfer from one node to another is equal to the data transfer needed per node multiplied by the number of nodes. Thus the number of elements being transferred per pass of the filter, d_f, is equal to:

d_f = n d_n = (6m^2 d + 12 m d^2 + 8 d^3) n = 6 n m^2 d + 12 n m d^2 + 8 n d^3    (10)
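A short sketch of Equations (9) and (10), together with the per-node memory figure above; the function names and the example node count are assumptions, and the halo depth d is passed in directly (d = w·2^(s-1) at scale s).

```python
def halo_transfer_per_pass(v, n, d):
    """Elements moved between nodes for one filter pass, Equations (9) and (10).

    v: total elements in the cube, n: number of nodes,
    d: halo depth in elements.  Each node is assumed to hold an
    m x m x m block with m = (v / n) ** (1/3)."""
    m = (v / n) ** (1.0 / 3.0)
    per_node = (m + 2 * d) ** 3 - m ** 3      # Equation (9)
    return n * per_node                        # Equation (10)

def ram_per_node_bytes(v, n):
    """Five single-precision (4-byte) values per element held on a node."""
    return 5 * 4 * (v / n)

# Example with assumed figures: a 1e9-element cube split over 64 nodes,
# halo depth 2 (scale 1 with a length-5 filter)
print(halo_transfer_per_pass(1e9, 64, 2), "elements moved;",
      ram_per_node_bytes(1e9, 64) / 1e9, "GB RAM per node")
```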

Figure 3: Black Box Model. This figure shows the black box model. It compares the computation required to complete an algorithm on the given input against the data transfer required to move the input onto the computer system and to move the output from the system. This can be used to determine a potential performance for the computer system, when using a given data transfer technology.

Figure 4: Node Model. This figure shows the node model. It considers a number of computing nodes that are each capable of a certain computational performance, and are connected together with an interconnect of a certain bandwidth. Each node holds part of the data cube, and works with the other nodes to calculate the result. This model can be used to match the algorithm to a given computational performance, bandwidth, number of nodes in the network, and memory per node.

Substituting in m = \sqrt[3]{v / n} gives the amount of transfer per pass of the filter as a function of the total data cube size and the number of nodes:

d_f = 6 n^{1/3} v^{2/3} d + 12 n^{2/3} v^{1/3} d^2 + 8 n d^3    (11)

Thus the amount of data transfer needed increases with the number of nodes. The maximum number of elements stored on a given node is limited by the amount of memory that node has, divided by the amount of memory it needs per element to be able to operate on that element.

For this algorithm, the halo distance is dependent on the scale: d = w 2^{s-1}, for s = 1 to S. The amount of data transferred at scale s, d_s, is:

d_s = 6 n^{1/3} v^{2/3} w 2^{s-1} + 12 n^{2/3} v^{1/3} w^2 2^{2s-2} + 8 n w^3 2^{3s-3}    (12)

The amount of information transferred for each iteration of the main loop, d_l, is therefore:

d_l = \sum_{s=1}^{S} \left( 6 n^{1/3} v^{2/3} w 2^{s-1} + 12 n^{2/3} v^{1/3} w^2 2^{2s-2} + 8 n w^3 2^{3s-3} \right)    (13)

And so the total amount of information needing transfer for L iterations of the main loop, d_t, is:

d_t = L \sum_{s=1}^{S} \left( 6 n^{1/3} v^{2/3} w 2^{s-1} + 12 n^{2/3} v^{1/3} w^2 2^{2s-2} + 8 n w^3 2^{3s-3} \right)    (14)

The amount of data transfer required for a given scale reaches a maximum when all the data needed at that scale comes from elements that are stored in a different node from the centre element. When this happens, the data transfer for that scale stays constant as the number of nodes increases. This is because each node then only needs data from a limited number of distant nodes, rather than all the values within a certain distance in the data cube. The data access patterns between nodes can be seen in Figures 5 and 6.

This calculation overestimates the amount of data transfer because it fails to account for the edge cases of the data cube. When an element near the edge of the data cube needs the value of an element that is outside the data cube, a reflected value is used instead. This reflected element may lie in the current node, or in data that has already been loaded from another node, meaning that the reflected value does not need to be loaded itself. The portion of the data that does not need to be loaded because of this increases as the number of elements per node increases relative to the total number of elements. Thus, the overestimation is greatest when the fewest nodes are used. This is why the amount of data transfer does not decrease to zero when the number of nodes is one.
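Equations (12) to (14) can be evaluated directly by summing the per-pass transfer over scales and iterations. The sketch below does this by reusing the assumed halo_transfer_per_pass helper from the earlier example; like the equations themselves, it ignores edge reflection and the per-scale transfer cap, so it overestimates for small node counts.

```python
def total_filter_transfer_elems(v, n, w, S, L):
    """Equations (12)-(14): elements moved between nodes for L iterations
    over S scales, with halo depth d = w * 2**(s-1) at scale s."""
    per_iteration = sum(halo_transfer_per_pass(v, n, w * 2 ** (s - 1))
                        for s in range(1, S + 1))
    return L * per_iteration

# Example with assumed figures: an ASKAP-sized cube on 1000 nodes,
# length-5 filter (w = 2), S = 10 scales, L = 4 iterations
v = 4096 * 4096 * 16384
elems = total_filter_transfer_elems(v, n=1000, w=2, S=10, L=4)
print(f"{elems * 4 / 1e12:.1f} TB moved between nodes (4 bytes per element)")
```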

Figure 5: Duchamp Node Data Access Pattern. This diagram shows the elements a node needs from other nodes, for different scales. A two-dimensional data set is shown instead of a three-dimensional data set for clarity. This example uses a node size value of m = 4, a filter of length 5, so w = 2, and shows four scales, S = 4. The boundaries between nodes are shown by the thick black lines. The black elements are the elements whose wavelet values are being calculated by a given node. The first scale uses the blue elements. The second scale uses the blue and red elements. The third scale uses the blue, red, green and yellow elements. The fourth scale requires the yellow and brown elements. Note how for the first three scales, all the elements within a certain distance around the node are needed, but at the fourth scale only certain blocks of values are needed, with gaps in between. In this example, the data transfer between nodes reaches a maximum at the fourth scale, and remains the same for higher scales.

Figure 6: Duchamp Node Data Access Pattern. This diagram shows the elements a node needs to calculate the wavelet coefficients for its own elements, when the node size is not a power of two. This example uses a node size value of m = 5, a filter of length 5, so w = 2, and shows four scales, S = 4. The first scale uses the blue elements. The second scale uses the blue, red and orange elements. The third scale uses the blue, red, orange, yellow and green elements. The fourth scale uses the orange, green and brown elements. Note that the elements needed by a particular node do not align with the elements other nodes hold.

The data transfer for the median requires one transfer of the image each time the median is calculated. The amount of data transfer per calculation of the median is d_m = V. The median is calculated once per scale and twice at the end of each iteration, so the total number of elements needing transfer for calculation of the median, d_{m,t}, is:

d_{m,t} = d_m (S + 2) L = V (S + 2) L    (15)

4. Results

There are a number of steps in analysing how a parallel implementation of Duchamp will perform. First the input to Duchamp will be defined, for use in the remainder of the testing. Preliminary testing was performed to examine the single-threaded implementation, to determine its running time and most computationally intensive methods. Duchamp was then analysed according to the black box model, to match the computational requirements of the entire computer system with the speed of the connection used to transfer data on and off the system. Duchamp was then considered using the node model. This model relates the number of nodes, the computational speed and memory of each node, and the speed of the interconnect between nodes.

There are two data cubes that will be considered in this analysis. The first is a data cube of the Virgo cluster, made from data from the HIPASS survey. This cube has spatial dimensions of X × Y = , and F = 256 frequency channels. From these dimensions, the number of scales for this cube, S, will be six. This cube only has one value per element, so D = 1, and the file size is 116MB. This data cube will be used to test the Duchamp program. The second cube is a hypothetical cube that may be produced by ASKAP, to be used to estimate what hardware a computer system would need to process such a cube. This cube has spatial dimensions of X × Y = 4,096 × 4,096 and F = 16,384 frequency channels. This results in a number of scales, S, of ten. The ASKAP cube may have D = 5 values per element and a file size of 5.5TB. Of the five values, four are the Stokes parameters that determine the polarisation of the electromagnetic radiation, including the brightness, and the fifth is a weighting that measures how exposed that element was over the time the data cube was produced.

As a preliminary test, Duchamp was first run using the HIPASS cube as input. This was to provide an estimate of the time needed, and to check which methods used the majority of the computing time. This test was run on a system with a dual-core AMD 1.8GHz Opteron 265 processor with 4GB of DDR2 memory. The filter chosen for this test, and for the ASKAP data cube, was of length five, so w = 2. Running the Duchamp program on this system with the HIPASS cube as input took 30 minutes. This required three iterations of the outermost loop. This report will estimate that the ASKAP cube will require four iterations of the outermost loop. Using these values, an estimate can be made of how long Duchamp will run when processing an ASKAP cube. Using Equation 8, the estimate of the time taken, T, is:

T \approx \frac{(VSL)_{ASKAP}}{(VSL)_{HIPASS}} \times 30\ \text{minutes} \approx 440\ \text{days}    (16)

This test produces a catalogue output that is 70kB in size. This shows that, for the black box model, the size of the output can be ignored because it is negligible compared to the size of the input. Duchamp was profiled with the gprof program in order to determine which method calls take the greatest portion of the running time. Analysis of the operation counts, in Section 2.4, suggests that the à trous image reconstruction algorithm and the calculation of the median comprise the majority of the computational requirements of Duchamp. Executing Duchamp with the HIPASS data cube shows that the à trous image reconstruction algorithm takes 95% of the running time, including 17% for calculating the median. This confirms that these are the most time consuming parts of the Duchamp program.

The number of operations required is a function of the cube, and is independent of the number of processors used. For the filtering, the HIPASS cube requires 1.23 single-precision TFLOPs and 130 double-precision GFLOPs. For calculating the median, it requires 248 single-precision GFLOPs. The ASKAP cube requires 24 single-precision PFLOPs and 2.75 double-precision PFLOPs for filtering, and 7.24 single-precision PFLOPs for calculating the median.

The black box arithmetic intensity can be calculated from these values. How the black box arithmetic intensity varies with the size of the data cube is shown in Figure 7. The black box arithmetic intensity of the HIPASS and ASKAP cubes is compared to different network technologies in Figure 8. The technologies shown are Gigabit Ethernet at 125MB/s, 10 Gigabit Ethernet at 1.25GB/s, the proposed 100 Gigabit Ethernet at 12.5GB/s [11] and InfiniBand QDR 4X at 4.00GB/s [12].

The node model arithmetic intensity of the algorithm changes with the number of nodes used, as the amount of data transfer varies. The amount of data transfer required is shown in Figure 9. Comparing this with the number of operations required, the arithmetic intensity can be calculated. The arithmetic intensity of the HIPASS and ASKAP cubes, as the number of nodes varies, is shown in Figures 11 and 12, respectively. Comparing these values against available hardware links a potential combination of hardware to the optimum number of nodes, as shown in Figure 13. The processors shown are an Intel Core i7 975, with a single-precision performance of 213 GFLOP/s [13], an nvidia Tesla C2070 with a single-precision performance of 1.26 TFLOP/s [14] and an AMD Radeon HD 5970 with a single-precision performance of 4.64 TFLOP/s [10]. The interconnects used are 10 Gigabit Ethernet at 2.50GB/s [11], InfiniBand QDR 4X at 8.00GB/s [12], PCI Express v2 x16 at 16.0GB/s [15], and the proposed 100 Gigabit Ethernet at 25.0GB/s [11]. These figures are twice the one-way bandwidth, because the Duchamp algorithm can benefit from transferring information in both directions with full-duplex interconnects. How the RAM requirements for each node vary with the number of nodes is shown in Figure 14.
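The node-model curves of Figures 11 to 13 can be approximated with the helpers defined in the earlier sketches. The fragment below computes the node-model arithmetic intensity for a given node count, then scans for the largest node count a given processor-and-interconnect pairing can still feed, which is the matching idea behind Figure 13. It ignores the median-related transfer of Equation (15) and uses a plain linear scan, both simplifying assumptions.

```python
def node_intensity(v, n, w, S, L, dp_cost=2):
    """Node-model arithmetic intensity: FLOPs per byte moved between nodes."""
    sp, dp = flop_counts(v, S, L)                         # from the earlier sketch
    p = equivalent_sp_flops(sp, dp, dp_cost)
    r = 4 * total_filter_transfer_elems(v, n, w, S, L)    # bytes between nodes
    return p / r

def optimum_nodes(v, w, S, L, proc_flops, link_bytes_per_s, dp_cost=2, n_max=100000):
    """Largest n whose algorithm intensity still meets the hardware's c/b."""
    target = proc_flops / link_bytes_per_s
    best = 1
    for n in range(1, n_max + 1):
        if node_intensity(v, n, w, S, L, dp_cost) >= target:
            best = n
        else:
            break                                          # intensity only falls as n grows
    return best

# Example with an assumed pairing: Core i7 975 (213 GFLOP/s) and InfiniBand QDR 4X (8 GB/s)
v = 4096 * 4096 * 16384
print(optimum_nodes(v, w=2, S=10, L=4, proc_flops=213e9, link_bytes_per_s=8e9))
```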

Figure 7: Duchamp Black Box Computational Intensity. This graph shows how the approximate computational intensity of the Duchamp program varies with the number of elements in the data cube. The intensity is measured as the number of combined single- and double-precision FLOPs required per byte of data transferred onto the computer system. This graph assumes that each element in the data cube has five single-precision floating point values associated with it, and that four iterations of the main loop are performed. Because of this, the HIPASS cube shown here shows less arithmetic intensity than it does in practice. This graph was made to determine how the balance of computation and data transfer varies for different data cube sizes.

Figure 8: Duchamp Black Box Technological Requirements. This graph compares the floating point performance of the computer system required to keep up with a given bandwidth that transfers data onto the computer system. The x-axis is the bandwidth used to transfer data onto the system, in bytes per second, and the y-axis is the floating point performance, in floating point operations per second. Also shown are several common network technologies and their bandwidths. This graph is made in order to match the connection bandwidth to the overall computational performance of the computer system.

Figure 9: Duchamp Data Transfer. This figure shows how the amount of data needing transfer from one node to another varies with the number of nodes. The x-axis is the number of nodes in the computer system and the y-axis is the amount of data, measured in terabytes. This graph calculates the amount of data transfer as the number of nodes in the system varies, to be used to calculate the arithmetic intensity of the image reconstruction algorithm when it is run on different numbers of nodes.

Figure 10: Duchamp Data Transfer for Large Numbers of Nodes. This diagram shows how the data transfer varies for large numbers of nodes. The x-axis is the number of nodes. Each line starts at one node and ends at a number of nodes equal to the number of elements in that data cube. The y-axis is the amount of data transferred from one node to another, in terabytes. This graph shows the data transfer required for greater numbers of nodes in order to show the effects of the image reconstruction algorithm reaching the maximum data transfer for a given scale.

Figure 11: HIPASS Computational Intensity. This graph shows the computational intensity of the Duchamp 3D image reconstruction algorithm when run on the HIPASS data cube. This arithmetic intensity can be used to match the algorithm to suitable hardware. The first line shows the single-precision arithmetic intensity. The second line shows the double-precision arithmetic intensity. The third line shows the computational intensity of single- and double-precision operations together, counting one double-precision operation as two single-precision operations. The single- and double-precision operations are shown separately because they are performed at different speeds by CPUs and GPUs.

Figure 12: ASKAP Computational Intensity. This graph shows the computational intensity of the Duchamp 3D image reconstruction algorithm when run on the ASKAP data cube. This arithmetic intensity can be used to match the algorithm to suitable hardware. The first line shows the single-precision arithmetic intensity. The second line shows the double-precision arithmetic intensity. The third line shows the computational intensity of single- and double-precision operations together, counting one double-precision operation as two single-precision operations. The single- and double-precision operations are shown separately because they are performed at different speeds by CPUs and GPUs.

Figure 13: Duchamp Optimum Number of Nodes. This graph shows the optimum number of nodes to use for the Duchamp algorithm on an ASKAP data cube. The number of nodes is a function of the bandwidth of the interconnect used and the chosen processor. If a greater number of nodes is used, then there is not enough bandwidth to keep up with the extra computational performance and increased data transfer. If fewer nodes are used, then bandwidth will go unused as the system waits for calculations to complete. The arithmetic intensity figure used for each processor takes into account the relative speed of single- and double-precision floating point operations on that processor. This graph uses the arithmetic intensity of the algorithm to match a processor speed and interconnect bandwidth to the number of nodes used to make up the system.

Figure 14: Duchamp Memory Requirements. This graph shows how many nodes are needed, for a given amount of RAM per node, to store all the information needed in RAM. The x-axis is the number of nodes in the computer system, and the y-axis is the amount of RAM each node needs to store all the data required by Duchamp. This RAM is used to avoid the longer access times of secondary storage. This relation between the amount of RAM needed and the number of nodes, together with the maximum amount of RAM per node in available technology, forms a lower bound on the number of nodes that can be used to execute Duchamp.

5. Discussion

The black box arithmetic intensity is calculated first. A technology for the system bandwidth can be chosen, and from the black box arithmetic intensity the overall computational performance of the system can be determined. A choice of processor can then be made and compared to the overall computational performance to determine how many nodes are needed. The node model is then considered. Using the node arithmetic intensity from this model, the interconnect bandwidth can be determined from the number of nodes and the performance of the chosen processor. The node model also determines the amount of memory each node will need, from the number of nodes used. Therefore the information obtained from these models is used to estimate a potential combination of hardware to be used to execute the Duchamp program.

5.1. Black Box Model

The arithmetic intensity of Duchamp is first considered using the black box model. This is done to match the computational performance of the entire system with the bandwidth of the connection that is used to transfer data to and from the system. The black box arithmetic intensity increases with the size of the data cube, as shown in Figure 7. This test is done to show how the black box arithmetic intensity of the algorithm varies with different-sized data cubes. The arithmetic intensity increases because the number of operations required grows with both the size of the data cube and the number of scales, while the amount of data transfer required is only proportional to the size of the data cube. The jumps in the graph occur as the data cube becomes large enough that another scale is needed for the filtering. This suggests that a proportionally faster computer system, compared to the bandwidth, can be used as the size of the image increases.

For simplicity, this graph uses three assumptions to show the arithmetic intensity as a function of the data cube size. First, it assumes that the data cube has the same length in each of the three dimensions, so that the number of scales is only a function of the total number of elements in the cube, rather than of the smallest side length. The second assumption is that the number of values per element that need to be transferred is D = 5, and the third is that the number of iterations of the outermost loop is L = 4. Because of this, the arithmetic intensity shown here is only approximate. In particular, the arithmetic intensity of the HIPASS data cube is higher than that shown, because this graph overestimates how many values need to be transferred onto the system.

The actual arithmetic intensities of the HIPASS and ASKAP data cubes are shown in Figure 8. This graph shows that the black box arithmetic intensity of Duchamp using the HIPASS data cube is greater than that when using the ASKAP data cube. This is because the HIPASS cube requires less data transfer per element of the data cube. From this graph, a network technology can be chosen and the appropriate computational power of the system can be determined.

5.2. Node Model

We now consider Duchamp using the node model. In order to calculate the arithmetic intensity of the system, the number of operations and the amount of data transfer must be known. The

number of operations required, as calculated from the equations in Section 2.4, is given in the Results section. The amount of data transfer is more complex, and is shown in Figure 9. These graphs show how the amount of data transfer needed increases with the number of nodes. As the number of nodes increases, the amount of data transfer required approaches a linear increase with the number of nodes. There are sudden decreases in slope present in these graphs. These occur when the maximum data transfer is reached for a particular scale, so the data transfer for that scale stops increasing with the number of nodes. These changes can be seen more clearly in Figure 10. This plot shows the data transfer of Duchamp using the two data cubes, from using a single node for the entire data cube to using a single node for each element in the data cube. Note that these plots overestimate the amount of data transfer required, particularly for low numbers of nodes. This is because the model does not account for reflection of edge values, where a needed element lies outside the data cube and the value of a reflected element is used instead. The reflected element may lie in the original node, or overlap with elements needed from another node.

With these results, the arithmetic intensity of the Duchamp algorithm when using the HIPASS and ASKAP data cubes as input can be calculated. The arithmetic intensity of Duchamp using the HIPASS data cube is shown in Figure 11 and the arithmetic intensity using the ASKAP data cube is shown in Figure 12. These figures show how the arithmetic intensity decreases as the number of nodes increases. This is because the number of operations is constant with the number of nodes, but the data transfer needed increases. These figures each show three lines: the arithmetic intensities calculated using the single-precision floating point operations, the double-precision operations, and the equivalent combined operation counts.

Comparing this arithmetic intensity against available hardware can be used to determine the optimum number of nodes. Figure 13 shows the optimum number of nodes for a given combination of processor and interconnect bandwidth. There is a sudden jump in the optimum number of nodes near 2000 nodes. This is because a scale reaches its maximum data transfer, so the slope of the data transfer decreases and the slope of the arithmetic intensity of the Duchamp algorithm increases. The slope of the optimum number of nodes otherwise decreases, as the data transfer required increases.

The last factor in the node model is the amount of memory each node needs. Figure 14 shows how the amount of RAM each node needs varies with the number of nodes. Combined with the maximum amount of RAM per node in available technology, this relation forms a lower limit on the number of nodes that can be effectively used to run Duchamp. As a node needs five single-precision floating point values for each element of the data cube it holds, the amount of RAM required per node is proportional to the number of elements in the image, and inversely proportional to the number of nodes in the computer system. The graph decreases in steps for low numbers of nodes because only integer numbers of nodes are considered.

5.3. Hardware Choices

There are a number of constraints that affect the potential choice of hardware for running Duchamp on an ASKAP-size data cube.
The computer system should finish computation on the data cube in an equivalent amount of time to transferring the data cube onto the system.
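As a rough, hedged check on the figures quoted in the abstract, the sketch below strings the two models together for the ASKAP-sized cube: the black box intensity and a 100 Gigabit Ethernet link fix the total system performance, a choice of processor then fixes the node count, and the node count fixes both the RAM each node must hold and, via the node-model intensity (here including the median transfer of Equation 15), the interconnect bandwidth each node needs. It reuses the helper functions from the earlier sketches; the processor and network figures are those quoted in the text, and everything else is an illustrative assumption.

```python
import math

# ASKAP-sized cube and hardware figures quoted in the text
X, Y, F, D, S, L, w = 4096, 4096, 16384, 5, 10, 4, 2
V = X * Y * F

sp, dp = flop_counts(V, S, L)
p = equivalent_sp_flops(sp, dp, dp_cost=2)        # CPU / nvidia weighting

# Black box: a 100 Gigabit Ethernet link fixes the whole-system performance
a_blackbox = p / (4 * V * D)                      # FLOPs per byte moved onto the system
system_flops = a_blackbox * 12.5e9                # compute that keeps pace with the link

# Node count for a chosen processor (Intel Core i7 975, 213 GFLOP/s single precision)
nodes = math.ceil(system_flops / 213e9)

# RAM per node: five single-precision values per element held
ram_per_node = 5 * 4 * V / nodes

# Interconnect per node from the node-model intensity, including median transfer (Eq. 15)
r_node = 4 * (total_filter_transfer_elems(V, nodes, w, S, L) + V * (S + 2) * L)
interconnect = 213e9 / (p / r_node)

print(f"{nodes} nodes, {ram_per_node / 1e9:.1f} GB RAM per node, "
      f"{interconnect / 1e9:.2f} GB/s interconnect per node")
```

Swapping in the Tesla C2070 figure of 1.26 TFLOP/s in place of 213 GFLOP/s yields the smaller, GPU-based configuration in the same way.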


More information

GPU Computing with Fornax. Dr. Christopher Harris

GPU Computing with Fornax. Dr. Christopher Harris GPU Computing with Fornax Dr. Christopher Harris ivec@uwa CAASTRO GPU Training Workshop 8-9 October 2012 Introducing the Historical GPU Graphics Processing Unit (GPU) n : A specialised electronic circuit

More information

High performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli

High performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli High performance 2D Discrete Fourier Transform on Heterogeneous Platforms Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli Motivation Fourier Transform widely used in Physics, Astronomy, Engineering

More information

Make sure that your exam is not missing any sheets, then write your full name and login ID on the front.

Make sure that your exam is not missing any sheets, then write your full name and login ID on the front. ETH login ID: (Please print in capital letters) Full name: 63-300: How to Write Fast Numerical Code ETH Computer Science, Spring 015 Midterm Exam Wednesday, April 15, 015 Instructions Make sure that your

More information

Wallace Hall Academy

Wallace Hall Academy Wallace Hall Academy CfE Higher Physics Unit 2 - Waves Notes Name 1 Waves Revision You will remember the following equations related to Waves from National 5. d = vt f = n/t v = f T=1/f They form an integral

More information

GPU acceleration of 3D forward and backward projection using separable footprints for X-ray CT image reconstruction

GPU acceleration of 3D forward and backward projection using separable footprints for X-ray CT image reconstruction GPU acceleration of 3D forward and backward projection using separable footprints for X-ray CT image reconstruction Meng Wu and Jeffrey A. Fessler EECS Department University of Michigan Fully 3D Image

More information

Chapter 38. Diffraction Patterns and Polarization

Chapter 38. Diffraction Patterns and Polarization Chapter 38 Diffraction Patterns and Polarization Diffraction Light of wavelength comparable to or larger than the width of a slit spreads out in all forward directions upon passing through the slit This

More information

The determination of the correct

The determination of the correct SPECIAL High-performance SECTION: H i gh-performance computing computing MARK NOBLE, Mines ParisTech PHILIPPE THIERRY, Intel CEDRIC TAILLANDIER, CGGVeritas (formerly Mines ParisTech) HENRI CALANDRA, Total

More information

Physics I : Oscillations and Waves Prof. S Bharadwaj Department of Physics & Meteorology Indian Institute of Technology, Kharagpur

Physics I : Oscillations and Waves Prof. S Bharadwaj Department of Physics & Meteorology Indian Institute of Technology, Kharagpur Physics I : Oscillations and Waves Prof. S Bharadwaj Department of Physics & Meteorology Indian Institute of Technology, Kharagpur Lecture - 20 Diffraction - I We have been discussing interference, the

More information

OSKAR-2: Simulating data from the SKA

OSKAR-2: Simulating data from the SKA OSKAR-2: Simulating data from the SKA AACal 2012, Amsterdam, 13 th July 2012 Fred Dulwich, Ben Mort, Stef Salvini 1 Overview OSKAR-2: Interferometer and beamforming simulator package. Intended for simulations

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

Understanding Fraunhofer Diffraction

Understanding Fraunhofer Diffraction [ Assignment View ] [ Eðlisfræði 2, vor 2007 36. Diffraction Assignment is due at 2:00am on Wednesday, January 17, 2007 Credit for problems submitted late will decrease to 0% after the deadline has passed.

More information

TRANSFORMATIONAL TECHNOLOGIES

TRANSFORMATIONAL TECHNOLOGIES TRANSFORMATIONAL TECHNOLOGIES FOR THREE-DIMENSIONAL VISUALISATION (AND ANALYSIS) Christopher Fluke ESO 3D2014 CRICOS provider 00111D Thank you Key Collaborators: David Barnes Monash University e-research

More information

Chapter 15. Light Waves

Chapter 15. Light Waves Chapter 15 Light Waves Chapter 15 is finished, but is not in camera-ready format. All diagrams are missing, but here are some excerpts from the text with omissions indicated by... After 15.1, read 15.2

More information

Large Scale Data Visualization. CSC 7443: Scientific Information Visualization

Large Scale Data Visualization. CSC 7443: Scientific Information Visualization Large Scale Data Visualization Large Datasets Large datasets: D >> 10 M D D: Hundreds of gigabytes to terabytes and even petabytes M D : 1 to 4 GB of RAM Examples: Single large data set Time-varying data

More information

Introduction. Part I: Measuring the Wavelength of Light. Experiment 8: Wave Optics. Physics 11B

Introduction. Part I: Measuring the Wavelength of Light. Experiment 8: Wave Optics. Physics 11B Physics 11B Experiment 8: Wave Optics Introduction Equipment: In Part I you use a machinist rule, a laser, and a lab clamp on a stand to hold the laser at a grazing angle to the bench top. In Part II you

More information

Module 2: Computer Arithmetic

Module 2: Computer Arithmetic Module 2: Computer Arithmetic 1 B O O K : C O M P U T E R O R G A N I Z A T I O N A N D D E S I G N, 3 E D, D A V I D L. P A T T E R S O N A N D J O H N L. H A N N E S S Y, M O R G A N K A U F M A N N

More information

CUDA Experiences: Over-Optimization and Future HPC

CUDA Experiences: Over-Optimization and Future HPC CUDA Experiences: Over-Optimization and Future HPC Carl Pearson 1, Simon Garcia De Gonzalo 2 Ph.D. candidates, Electrical and Computer Engineering 1 / Computer Science 2, University of Illinois Urbana-Champaign

More information

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please) Virginia Tech. Computer Science CS 5614 (Big) Data Management Systems Fall 2014, Prakash Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in

More information

Hyperspectral Unmixing on GPUs and Multi-Core Processors: A Comparison

Hyperspectral Unmixing on GPUs and Multi-Core Processors: A Comparison Hyperspectral Unmixing on GPUs and Multi-Core Processors: A Comparison Dept. of Mechanical and Environmental Informatics Kimura-Nakao lab. Uehara Daiki Today s outline 1. Self-introduction 2. Basics of

More information

CS 111: Digital Image Processing Fall 2016 Midterm Exam: Nov 23, Pledge: I neither received nor gave any help from or to anyone in this exam.

CS 111: Digital Image Processing Fall 2016 Midterm Exam: Nov 23, Pledge: I neither received nor gave any help from or to anyone in this exam. CS 111: Digital Image Processing Fall 2016 Midterm Exam: Nov 23, 2016 Time: 3:30pm-4:50pm Total Points: 80 points Name: Number: Pledge: I neither received nor gave any help from or to anyone in this exam.

More information

Tera-scale astronomical data analysis and visualization

Tera-scale astronomical data analysis and visualization MNRAS 429, 2442 2455 (2013) doi:10.1093/mnras/sts513 Tera-scale astronomical data analysis and visualization A. H. Hassan, 1 C. J. Fluke, 1 D. G. Barnes 2 andv.a.kilborn 1 1 Centre for Astrophysics and

More information

[1] IEEE , Standard for Floating-Point Arithmetic [and Floating-Point formats]

[1] IEEE , Standard for Floating-Point Arithmetic [and Floating-Point formats] MISB RP 1201 Recommended Practice Floating Point to Integer Mapping February 15 th 2012 1 Scope This recommended practice describes the method for mapping floating point values to integer values and the

More information

Assignment 6: Ray Tracing

Assignment 6: Ray Tracing Assignment 6: Ray Tracing Programming Lab Due: Monday, April 20 (midnight) 1 Introduction Throughout this semester you have written code that manipulated shapes and cameras to prepare a scene for rendering.

More information

TABLE OF CONTENTS PRODUCT DESCRIPTION VISUALIZATION OPTIONS MEASUREMENT OPTIONS SINGLE MEASUREMENT / TIME SERIES BEAM STABILITY POINTING STABILITY

TABLE OF CONTENTS PRODUCT DESCRIPTION VISUALIZATION OPTIONS MEASUREMENT OPTIONS SINGLE MEASUREMENT / TIME SERIES BEAM STABILITY POINTING STABILITY TABLE OF CONTENTS PRODUCT DESCRIPTION VISUALIZATION OPTIONS MEASUREMENT OPTIONS SINGLE MEASUREMENT / TIME SERIES BEAM STABILITY POINTING STABILITY BEAM QUALITY M 2 BEAM WIDTH METHODS SHORT VERSION OVERVIEW

More information

Fast Holographic Deconvolution

Fast Holographic Deconvolution Precision image-domain deconvolution for radio astronomy Ian Sullivan University of Washington 4/19/2013 Precision imaging Modern imaging algorithms grid visibility data using sophisticated beam models

More information

9/3/2015. Data Representation II. 2.4 Signed Integer Representation. 2.4 Signed Integer Representation

9/3/2015. Data Representation II. 2.4 Signed Integer Representation. 2.4 Signed Integer Representation Data Representation II CMSC 313 Sections 01, 02 The conversions we have so far presented have involved only unsigned numbers. To represent signed integers, computer systems allocate the high-order bit

More information

Universiteit Leiden Computer Science

Universiteit Leiden Computer Science Universiteit Leiden Computer Science Optimizing octree updates for visibility determination on dynamic scenes Name: Hans Wortel Student-no: 0607940 Date: 28/07/2011 1st supervisor: Dr. Michael Lew 2nd

More information

FFT-Based Astronomical Image Registration and Stacking using GPU

FFT-Based Astronomical Image Registration and Stacking using GPU M. Aurand 4.21.2010 EE552 FFT-Based Astronomical Image Registration and Stacking using GPU The productive imaging of faint astronomical targets mandates vanishingly low noise due to the small amount of

More information

Automated Control for Elastic Storage

Automated Control for Elastic Storage Automated Control for Elastic Storage Summarized by Matthew Jablonski George Mason University mjablons@gmu.edu October 26, 2015 Lim, H. C. and Babu, S. and Chase, J. S. (2010) Automated Control for Elastic

More information

Image Processing. Application area chosen because it has very good parallelism and interesting output.

Image Processing. Application area chosen because it has very good parallelism and interesting output. Chapter 11 Slide 517 Image Processing Application area chosen because it has very good parallelism and interesting output. Low-level Image Processing Operates directly on stored image to improve/enhance

More information

Smarter Balanced Vocabulary (from the SBAC test/item specifications)

Smarter Balanced Vocabulary (from the SBAC test/item specifications) Example: Smarter Balanced Vocabulary (from the SBAC test/item specifications) Notes: Most terms area used in multiple grade levels. You should look at your grade level and all of the previous grade levels.

More information

2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into

2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into 2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into the viewport of the current application window. A pixel

More information

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong)

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) References: [1] http://homepages.inf.ed.ac.uk/rbf/hipr2/index.htm [2] http://www.cs.wisc.edu/~dyer/cs540/notes/vision.html

More information

Electromagnetic migration of marine CSEM data in areas with rough bathymetry Michael S. Zhdanov and Martin Čuma*, University of Utah

Electromagnetic migration of marine CSEM data in areas with rough bathymetry Michael S. Zhdanov and Martin Čuma*, University of Utah Electromagnetic migration of marine CSEM data in areas with rough bathymetry Michael S. Zhdanov and Martin Čuma*, University of Utah Summary In this paper we present a new approach to the interpretation

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

Network Design Considerations for Grid Computing

Network Design Considerations for Grid Computing Network Design Considerations for Grid Computing Engineering Systems How Bandwidth, Latency, and Packet Size Impact Grid Job Performance by Erik Burrows, Engineering Systems Analyst, Principal, Broadcom

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Chapter 4. Clustering Core Atoms by Location

Chapter 4. Clustering Core Atoms by Location Chapter 4. Clustering Core Atoms by Location In this chapter, a process for sampling core atoms in space is developed, so that the analytic techniques in section 3C can be applied to local collections

More information

A New Online Clustering Approach for Data in Arbitrary Shaped Clusters

A New Online Clustering Approach for Data in Arbitrary Shaped Clusters A New Online Clustering Approach for Data in Arbitrary Shaped Clusters Richard Hyde, Plamen Angelov Data Science Group, School of Computing and Communications Lancaster University Lancaster, LA1 4WA, UK

More information

7 th GRADE PLANNER Mathematics. Lesson Plan # QTR. 3 QTR. 1 QTR. 2 QTR 4. Objective

7 th GRADE PLANNER Mathematics. Lesson Plan # QTR. 3 QTR. 1 QTR. 2 QTR 4. Objective Standard : Number and Computation Benchmark : Number Sense M7-..K The student knows, explains, and uses equivalent representations for rational numbers and simple algebraic expressions including integers,

More information

MAT 003 Brian Killough s Instructor Notes Saint Leo University

MAT 003 Brian Killough s Instructor Notes Saint Leo University MAT 003 Brian Killough s Instructor Notes Saint Leo University Success in online courses requires self-motivation and discipline. It is anticipated that students will read the textbook and complete sample

More information

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host

More information

Theme 7 Group 2 Data mining technologies Catalogues crossmatching on distributed database and application on MWA absorption source finding

Theme 7 Group 2 Data mining technologies Catalogues crossmatching on distributed database and application on MWA absorption source finding Theme 7 Group 2 Data mining technologies Catalogues crossmatching on distributed database and application on MWA absorption source finding Crossmatching is a method to find corresponding objects in different

More information

Detecting Geometric Faults from Measured Data

Detecting Geometric Faults from Measured Data Detecting Geometric s from Measured Data A.L. Gower 1 1 School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Ireland. May 4, 214 Abstract Manufactured artefacts

More information

The type of all data used in a C++ program must be specified

The type of all data used in a C++ program must be specified The type of all data used in a C++ program must be specified A data type is a description of the data being represented That is, a set of possible values and a set of operations on those values There are

More information

Integers & Absolute Value Properties of Addition Add Integers Subtract Integers. Add & Subtract Like Fractions Add & Subtract Unlike Fractions

Integers & Absolute Value Properties of Addition Add Integers Subtract Integers. Add & Subtract Like Fractions Add & Subtract Unlike Fractions Unit 1: Rational Numbers & Exponents M07.A-N & M08.A-N, M08.B-E Essential Questions Standards Content Skills Vocabulary What happens when you add, subtract, multiply and divide integers? What happens when

More information

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation Obviously, this is a very slow process and not suitable for dynamic scenes. To speed things up, we can use a laser that projects a vertical line of light onto the scene. This laser rotates around its vertical

More information

MET71 COMPUTER AIDED DESIGN

MET71 COMPUTER AIDED DESIGN UNIT - II BRESENHAM S ALGORITHM BRESENHAM S LINE ALGORITHM Bresenham s algorithm enables the selection of optimum raster locations to represent a straight line. In this algorithm either pixels along X

More information

Chapter 13 Strong Scaling

Chapter 13 Strong Scaling Chapter 13 Strong Scaling Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction Variables

More information

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Torben Kling-Petersen, PhD Presenter s Name Principle Field Title andengineer Division HPC &Cloud LoB SunComputing Microsystems

More information

specular diffuse reflection.

specular diffuse reflection. Lesson 8 Light and Optics The Nature of Light Properties of Light: Reflection Refraction Interference Diffraction Polarization Dispersion and Prisms Total Internal Reflection Huygens s Principle The Nature

More information

Biomedical Image Analysis. Point, Edge and Line Detection

Biomedical Image Analysis. Point, Edge and Line Detection Biomedical Image Analysis Point, Edge and Line Detection Contents: Point and line detection Advanced edge detection: Canny Local/regional edge processing Global processing: Hough transform BMIA 15 V. Roth

More information

Single slit diffraction

Single slit diffraction Single slit diffraction Book page 364-367 Review double slit Core Assume paths of the two rays are parallel This is a good assumption if D >>> d PD = R 2 R 1 = dsin θ since sin θ = PD d Constructive interference

More information

(Refer Slide Time: 00:03:51)

(Refer Slide Time: 00:03:51) Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 17 Scan Converting Lines, Circles and Ellipses Hello and welcome everybody

More information

School of Computer and Information Science

School of Computer and Information Science School of Computer and Information Science CIS Research Placement Report Multiple threads in floating-point sort operations Name: Quang Do Date: 8/6/2012 Supervisor: Grant Wigley Abstract Despite the vast

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

WHOLE NUMBER AND DECIMAL OPERATIONS

WHOLE NUMBER AND DECIMAL OPERATIONS WHOLE NUMBER AND DECIMAL OPERATIONS Whole Number Place Value : 5,854,902 = Ten thousands thousands millions Hundred thousands Ten thousands Adding & Subtracting Decimals : Line up the decimals vertically.

More information

EE368 Project: Visual Code Marker Detection

EE368 Project: Visual Code Marker Detection EE368 Project: Visual Code Marker Detection Kahye Song Group Number: 42 Email: kahye@stanford.edu Abstract A visual marker detection algorithm has been implemented and tested with twelve training images.

More information

Creating an Automated Blood Vessel. Diameter Tracking Tool

Creating an Automated Blood Vessel. Diameter Tracking Tool Medical Biophysics 3970Z 6 Week Project: Creating an Automated Blood Vessel Diameter Tracking Tool Peter McLachlan - 250068036 April 2, 2013 Introduction In order to meet the demands of tissues the body

More information

Physics 202 Homework 9

Physics 202 Homework 9 Physics 202 Homework 9 May 29, 2013 1. A sheet that is made of plastic (n = 1.60) covers one slit of a double slit 488 nm (see Figure 1). When the double slit is illuminated by monochromatic light (wavelength

More information

Correlator Field-of-View Shaping

Correlator Field-of-View Shaping Correlator Field-of-View Shaping Colin Lonsdale Shep Doeleman Vincent Fish Divya Oberoi Lynn Matthews Roger Cappallo Dillon Foight MIT Haystack Observatory Context SKA specifications extremely challenging

More information

GG450 4/5/2010. Today s material comes from p and in the text book. Please read and understand all of this material!

GG450 4/5/2010. Today s material comes from p and in the text book. Please read and understand all of this material! GG450 April 6, 2010 Seismic Reflection I Today s material comes from p. 32-33 and 81-116 in the text book. Please read and understand all of this material! Back to seismic waves Last week we talked about

More information

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT Manycore and GPU Channelisers Seth Hall High Performance Computing Lab, AUT GPU Accelerated Computing GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate

More information

Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations

Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations Nikolai Zamarashkin and Dmitry Zheltkov INM RAS, Gubkina 8, Moscow, Russia {nikolai.zamarashkin,dmitry.zheltkov}@gmail.com

More information

Math 340 Fall 2014, Victor Matveev. Binary system, round-off errors, loss of significance, and double precision accuracy.

Math 340 Fall 2014, Victor Matveev. Binary system, round-off errors, loss of significance, and double precision accuracy. Math 340 Fall 2014, Victor Matveev Binary system, round-off errors, loss of significance, and double precision accuracy. 1. Bits and the binary number system A bit is one digit in a binary representation

More information

Powering Real-time Radio Astronomy Signal Processing with latest GPU architectures

Powering Real-time Radio Astronomy Signal Processing with latest GPU architectures Powering Real-time Radio Astronomy Signal Processing with latest GPU architectures Harshavardhan Reddy Suda NCRA, India Vinay Deshpande NVIDIA, India Bharat Kumar NVIDIA, India What signals we are processing?

More information

Matthew Schwartz Lecture 19: Diffraction and resolution

Matthew Schwartz Lecture 19: Diffraction and resolution Matthew Schwartz Lecture 19: Diffraction and resolution 1 Huygens principle Diffraction refers to what happens to a wave when it hits an obstacle. The key to understanding diffraction is a very simple

More information

PowerVault MD3 SSD Cache Overview

PowerVault MD3 SSD Cache Overview PowerVault MD3 SSD Cache Overview A Dell Technical White Paper Dell Storage Engineering October 2015 A Dell Technical White Paper TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS

More information

3.3 Optimizing Functions of Several Variables 3.4 Lagrange Multipliers

3.3 Optimizing Functions of Several Variables 3.4 Lagrange Multipliers 3.3 Optimizing Functions of Several Variables 3.4 Lagrange Multipliers Prof. Tesler Math 20C Fall 2018 Prof. Tesler 3.3 3.4 Optimization Math 20C / Fall 2018 1 / 56 Optimizing y = f (x) In Math 20A, we

More information

Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14

Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Scan Converting Lines, Circles and Ellipses Hello everybody, welcome again

More information