Parallelization and Performance of 3D Ultrasound Imaging Beamforming Algorithms on Modern Clusters


F. Zhang, A. Bilas, A. Dhanantwari, K.N. Plataniotis, R. Abiprojo, and S. Stergiopoulos

Dept. of Electrical and Computer Engineering, 10 King's College Road, University of Toronto, Toronto, Ontario, M5S3G4, Canada
Defense R&D Canada, Toronto, 1133 Sheppard Ave. West, North York, Ontario, M3M3B9, Canada
{fanzhang, bilas}@eecg.toronto.edu, kostas@dsp.toronto.edu, {amar.adhanant, robert.abiprojo, stergios.stergiopoulos}@drdc-rddc.gc.ca

ABSTRACT
Recently there has been a lot of interest in improving the infrastructure used in medical applications. In particular, there is renewed interest in non-invasive, high-resolution diagnostic methods. One such method is digital, 3D ultrasound medical imaging. Current state-of-the-art ultrasound systems use specialized hardware for performing advanced processing of input data to improve the quality of the generated images. Such systems are limited in their capabilities by the underlying computing architecture and they tend to be expensive due to the specialized nature of the solutions they employ. Our goal in this work is twofold: (i) to understand the behavior of this class of emerging medical applications in order to provide an efficient parallel implementation and (ii) to introduce a new benchmark for parallel computer architectures from a novel and important class of applications. We address the limitations faced by modern ultrasound systems by investigating how all processing required by advanced beamforming algorithms can be performed on modern clusters of high-end PCs connected with low-latency, high-bandwidth system area networks. We investigate the computational characteristics of a state-of-the-art algorithm and demonstrate that today's commodity architectures are capable of providing almost-real-time performance without compromising image quality significantly.
Keywords
Parallel processing, Medical applications

Categories and Subject Descriptors
C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems

General Terms
Algorithms, Performance

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICS'02, June 22-26, 2002, New York, New York, USA. Copyright 2002 ACM ...$5.00.

1. INTRODUCTION
Major efforts have been devoted recently to improving non-invasive, high-precision diagnostic methods, a critical component in the renewed effort to enhance health services. One of the research directions taken to address such requirements is the development of high-resolution, digital, three dimensional (3D) ultrasound medical imaging systems. With the advent of high performance computing facilities and the availability of transducer crystal technology, ultrasound imaging systems have emerged as efficient methods to extract and reproduce relevant medical diagnostic information.

Figure 1: The components of a beamforming-based ultrasound system (scanned 3D volume, ultrasound probing, input time series, beamformer, reconstructed 3D volume).

Generally, a digital ultrasound system consists of a set of sensors [25, 11, 2] that perform data acquisition and a backend computing architecture, responsible for processing the raw data and reconstructing the ultrasonic images [24]. Figure 1 depicts a typical ultrasound system. The ultrasound probing apparatus consists of a set of sensors and a data acquisition unit that probe the object under consideration and gather the sampled data. The beamformer is based on some computing structure and performs signal processing of the samples in order to reconstruct the image of the scanned object.
In this framework, both the probing unit and the computing architecture components are important in delivering good quality diagnostic results. Current limitations in sensor technologies necessitate the use of complex signal processing engines to improve image quality. Most ultrasound medical systems suffer from poor image resolution. Some of these limitations can be attributed to fundamental physical aspects of the ultrasound transducer and the interaction with the tissue. Advanced signal processing algorithms can enhance image resolution and detection quality as well as minimize the relevant probing hardware requirements, leading to cost-effective ultrasound system technologies.

However, the current state-of-the-art in high-resolution, digital, 3D ultrasound medical imaging faces two main challenges. First, the ultrasound signal processing structures used are computationally demanding. Traditionally, specialized computing architectures and hardware have been used to provide the levels of performance and I/O throughput required, resulting in high system design and ownership costs. With the emergence of high-end workstations and low-latency, high-bandwidth interconnects [1, 3, 6], it now becomes interesting and timely to investigate if such technologies can be used in building low-cost, high-resolution, 3D ultrasound medical imaging systems. Second, although beamforming algorithms have been studied in the context of other applications [4], little is known about their computational characteristics with respect to ultrasound-related processing, and medical applications in general. It is not clear which parts of these algorithms are the most demanding in terms of processing or communication and how exactly they can be mapped on modern parallel architectures. In particular, although the algorithmic complexity of different sections can be calculated, little has been done in terms of actual performance analysis on real systems. The lack of such knowledge inhibits further progress in this area, since it is not clear how these algorithms should evolve to lead to applicable solutions in the area of ultrasound medical imaging.

In this work we address both of these issues by designing an efficient parallel beamforming algorithm and studying its behavior and requirements on a generic computing architecture that consists of commodity components. First, we review the signal processing algorithm used in the implementation of a 3D ultrasound medical imaging system we are currently building. We provide an efficient, all-software, sequential implementation that shows considerable advantages over hardware-based implementations of the past.
We then provide an efficient parallel implementation of the algorithm for a cluster of high-end PCs connected with a low-latency, high-bandwidth interconnection network and study its behavior. The emphasis is on the computational characteristics of the algorithm and the identification of parameters that critically affect both the performance and cost of our system. We study the behavior and performance of the algorithm for a wide set of parameters and we reveal a number of interesting characteristics leading to conclusions about the prospect of using commodity architectures for performing all related processing in this family of medical applications.

Our high level conclusions and contributions are: (i) A 16-processor system today can achieve close-to-real-time performance for high image quality and is certainly expected to do so in the near future. (ii) Only small parts of this family of signal processing algorithms are very computationally intensive. In particular, 85-98% of the time is spent in FFT and beam steering functions for all our runs, and most of the runs spend between 92% and 95% of their time in these functions. (iii) The communication requirements in the particular implementation are fairly small, localized, and certainly within the capabilities of modern low-latency, high-bandwidth interconnects. (iv) Our results provide an indication of the amount of processing required for a given level of image quality and can be used as a reference in designing computing architectures for ultrasound systems.

The rest of the paper is organized as follows: Section 2 provides a background for ultrasound systems. Section 3 presents a comprehensive summary of the family of conventional beamforming algorithms we use. Section 4 describes the platform we use for our experiments. Section 5 describes our sequential and parallel implementations of the algorithm. Section 6 describes our methodology and Section 7 presents our experimental results.
Finally we present related work in Section 8 and draw conclusions in Section 9.

2. ULTRASOUND SYSTEMS
Ultrasound medical imaging is one of the most widely used imaging modalities in the area of health services. Ultrasound systems can be used for early diagnosis, screening, monitoring, and minimally-invasive follow-up procedures. Ultrasound image quality has dramatically improved over the last few years, mostly due to the complete elimination of analogue electronics and the introduction of digital beamforming techniques [24]. Although there is a large base of installed systems and numerous hardware platforms already in use, the majority of these systems share common characteristics. In the near future, practically all ultrasound systems will utilize signal processing techniques to process signals received as a result of the stimulation of the tissue. Such ultrasound systems follow the general structure shown in Figure 1. The system's quality is determined both by the physical characteristics of the system and by the signal processing algorithm used to process the signals.

The transducers used in such ultrasound systems are based on phased array transceiver technology. They consist of an array of transceivers which can be aligned in a special geometric configuration such as linear, circular, planar, cylindrical, or spherical. The purpose of the phased array transducer is to exploit the superposition of waves radiated by the individual transceivers of the array. The ability to control the phase and the amplitude of the ultrasound waves emitted by each individual transceiver allows the angular steering of the radiated beams that are used to illuminate a volume of interest. After transmitting the sound waves, the ultrasound system switches to receiving mode. As the sound waves penetrate the volume and encounter objects, reflection occurs. The reflected waves as well as their multi-path versions are received and digitized by the ultrasound machine.
A certain segment of the digitized signals is processed by the beamformer, resulting in discrete time series of a certain length. A digital beamformer is a spatial filter that processes data from the array of sensors in order to enhance the signal received from a certain direction, reducing the interference of the background noise. Beams are then combined together to form the spatial images of 2D or 3D volumes. Next, we describe in more detail the specific beamforming algorithm we use in our work.

3. BEAMFORMING ALGORITHM
The beamforming algorithm we use in our work is based on the conventional beamforming algorithm [24]. 3D planar-phased-array beamformers use multiple beams to scan a 3D volume. The volume is reconstructed by using the transceivers' outputs of the planar array transducer. Each beam is characterized by its angular direction in the scanned volume. Figure 2 shows how each beam is specified. The thick arrow depicts a beam in the direction (θ, φ) in the spherical coordinate system. The center of the coordinate system is the center of the (N × M) planar array that lies symmetrically on the (X, Y) plane, where N and M denote the number of sensors in each row and column of the array, respectively. The rows and columns of the array are aligned in parallel with the X, Y axes. With each pair of angles (θ, φ) we associate another pair of angles (A, B) that are used in the algorithm to characterize a beam. A is the angle between the beam and the X axis and B is the angle between the beam and the Y axis. The boundaries of the volume reconstructed by each beam are specified in terms of these angles A and B (Figure 2). For example, a reconstructed volume is specified to be within 70° ≤ A, B ≤ 110°. The number of beams is specified as a × b, where a and b are the numbers of beams in the A and B angular directions. The beam width is defined to be the angular width that is covered by a single beam and characterizes the image resolution capabilities of the image reconstruction process. The more beams and the narrower the beam width the beamformer uses, the better the quality of the images it can generate, albeit at higher computational cost.

Figure 2: The representation of a beam in the spherical coordinate system.

To produce sharp images of the input object, the scanned volume is divided in multiple focal zones. Ultrasound systems use various focal depths for the reconstructed volume. Each focal zone $Z_R$ is centered around a focal depth R. For example, the zone of depth between 1 cm and 2 cm is reconstructed using a focal depth R = 1.5 cm. To produce an image that is focused over the whole reconstructed region, the beamforming process is repeated for each focal zone (characterized by a different value of R). Narrower focal zones produce sharper images but result in more processing as well. In most practical applications, the focal zone size should be in the range between 0.5 and 1.0 cm.
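The scan geometry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the beam angles are assumed to be evenly spaced with both boundary angles included, the focal zones are assumed to start at depth 0, and the 16 cm maximum depth and 8 beams per direction are taken from the configuration used later in the paper. The function names `beam_angles` and `focal_zones` are hypothetical.

```python
def beam_angles(lo_deg, hi_deg, count):
    """Return `count` angles evenly spaced over [lo_deg, hi_deg], ends included.

    Even spacing is an assumption; the paper only gives the angular limits.
    """
    if count == 1:
        return [(lo_deg + hi_deg) / 2.0]
    step = (hi_deg - lo_deg) / (count - 1)
    return [lo_deg + i * step for i in range(count)]


def focal_zones(max_depth_cm, zone_size_cm):
    """Split [0, max_depth_cm] into equal zones.

    Each zone is returned as (start, R, end), where R is the focal depth
    at the center of the zone.
    """
    n_zones = round(max_depth_cm / zone_size_cm)
    zones = []
    for k in range(n_zones):
        start = k * zone_size_cm
        end = start + zone_size_cm
        zones.append((start, (start + end) / 2.0, end))
    return zones


a_angles = beam_angles(70.0, 110.0, 8)   # angles A for 8 beams
b_angles = beam_angles(70.0, 110.0, 8)   # angles B for 8 beams
zones = focal_zones(16.0, 1.0)           # 1 cm zones over a 16 cm depth

print(len(a_angles) * len(b_angles), "beams per tile,", len(zones), "focal zones")
print("zone covering 1-2 cm:", zones[1])
```

Each (A, B, R) combination is one unit of beamforming work, so halving the zone size or doubling the number of beams in one direction doubles the number of such units.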
In our algorithm we use uniform spacing for R. The beam time series at the output of the beamformer for a specific R is truncated and only the segment that covers the focal zone $Z_R$ under consideration is processed. The images of the different focal zones $Z_R$ are then concatenated to form the whole volume.

The volume is reconstructed from the time samples of each beam and focal zone as follows. Since the received acoustic wave is coming from a point source located at a finite distance from the array, the wavefront is radial. Therefore, the arriving waves are not simple plane waves but spherical waves reflected by an object and, due to the separation of the transceivers in the array, they arrive at different transceivers with slight time delays. If we assume that the distance of the m-th transceiver on the X axis from the origin point O is $x_m$ and the distance of the n-th transceiver on the Y axis from the same origin point is $y_n$, then we can compute the time delays of these two transceivers with respect to the origin as:

$$t_x = \frac{\sqrt{R^2 + x_m^2 - 2 R x_m \cos A} - R}{c} \tag{1}$$

and

$$t_y = \frac{\sqrt{R^2 + y_n^2 - 2 R y_n \cos B} - R}{c} \tag{2}$$

where c is the speed of sound in human tissue.
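As a rough illustration of equations (1) and (2), the sketch below evaluates the focusing delay for one transceiver. The speed of sound of 1540 m/s is an assumed typical soft-tissue value, and the function name `focus_delay` and the example geometry (8 elements at 0.4 mm pitch) are hypothetical.

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s, assumed typical value for soft tissue


def focus_delay(R, pos, angle_deg, c=SPEED_OF_SOUND):
    """Delay (seconds) of a transceiver at offset `pos` (m) from the array center.

    R is the focal depth (m) and angle_deg the angle between the beam and
    the axis on which the transceiver lies, as in equations (1) and (2).
    """
    # Law of cosines: distance from the transceiver to the focal point.
    path = math.sqrt(R * R + pos * pos
                     - 2.0 * R * pos * math.cos(math.radians(angle_deg)))
    return (path - R) / c


# Delays across a hypothetical 8-element row, 0.4 mm pitch, centered on O.
pitch = 0.4e-3
positions = [(i - 3.5) * pitch for i in range(8)]
for x in positions:
    print(f"x = {x * 1e3:+.2f} mm  delay = {focus_delay(0.015, x, 80.0) * 1e9:+.2f} ns")
```

For focal depths much larger than the array aperture the expression approaches the familiar plane-wave delay, $-x_m \cos A / c$, which is a quick sanity check on the sign and magnitude of the result.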
Thus, for a beam with focal depth R, the 3D angular response of an N by M planar array to a steered direction $(A_x, B_y)$ can be expressed as:

$$B(f_i, A_x, B_y, R) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} I_{m,n}(f_i)\, S_{m,n}(f_i, A_x, B_y, R) \tag{3}$$

where $I_{m,n}(f_i)$ is the Fourier transform of the input time series from the (m, n) transceiver:

$$I_{m,n}(f_i) = \mathrm{FFT}\big(i_{m,n}(t_i)\big) \tag{4}$$

and $S_{m,n}(f_i, A_x, B_y, R)$ is the steering vector applied to compensate for the time delay of the (m, n) transceiver with respect to the reference point (located at the center of the planar array):

$$S_{m,n}(f_i, A_x, B_y, R) = \exp\left[ j 2\pi f_i \left( \frac{\sqrt{R^2 + x_m^2 - 2 R x_m \cos A_x} - R}{c} + \frac{\sqrt{R^2 + y_n^2 - 2 R y_n \cos B_y} - R}{c} \right) \right] \tag{5}$$

The equation for the angular response (3) can be simplified by separating the terms in the steering vector $S_{m,n}$ as follows:

$$B(f_i, A_x, B_y, R) = \sum_{n=0}^{N-1} S_n(f_i, B_y, R) \left[ \sum_{m=0}^{M-1} I_{m,n}(f_i)\, S_m(f_i, A_x, R) \right] \tag{6}$$

where

$$S_m(f_i, A_x, R) = \exp\left[ j 2\pi f_i\, \frac{\sqrt{R^2 + x_m^2 - 2 R x_m \cos A_x} - R}{c} \right] \tag{7}$$

$$S_n(f_i, B_y, R) = \exp\left[ j 2\pi f_i\, \frac{\sqrt{R^2 + y_n^2 - 2 R y_n \cos B_y} - R}{c} \right] \tag{8}$$

The summation term in square brackets in equation (6) is equivalent to the response of a line array beamformer along the X axis. If we let all the steered beams from this summation term form a vector denoted by $B_n(f_i, A_x)$, then equation (6) can be rewritten as:

$$B(f_i, A_x, B_y, R) = \sum_{n=0}^{N-1} B_n(f_i, A_x)\, S_n(f_i, B_y, R), \tag{9}$$

which expresses a linear beamforming along the Y axis with $B_n(f_i, A_x)$ as input.
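The equivalence between the direct double sum of equation (3) and the two-step form of equations (6) and (9) can be checked numerically. The following is a minimal sketch for a single frequency bin and a single beam, not the authors' implementation: the array size, element pitch, input values, and the speed of sound are all made-up illustrative assumptions, and the function names are hypothetical.

```python
import cmath
import math

C = 1540.0  # m/s, assumed speed of sound in tissue


def steer(f, R, pos, angle_deg):
    """One-axis steering term, as in equations (7) and (8)."""
    path = math.sqrt(R * R + pos * pos
                     - 2.0 * R * pos * math.cos(math.radians(angle_deg)))
    return cmath.exp(2j * math.pi * f * (path - R) / C)


def beam_direct(I, xs, ys, f, A, B, R):
    """Equation (3): one double sum over all transceivers."""
    total = 0j
    for m, x in enumerate(xs):
        for n, y in enumerate(ys):
            total += I[m][n] * steer(f, R, x, A) * steer(f, R, y, B)
    return total


def beam_two_step(I, xs, ys, f, A, B, R):
    """Equations (6) and (9): line beamforming along X, then along Y."""
    sx = [steer(f, R, x, A) for x in xs]
    # Step 1: one X-axis line beamformer per index n, giving B_n(f, A).
    b_n = [sum(I[m][n] * sx[m] for m in range(len(xs))) for n in range(len(ys))]
    # Step 2: a single Y-axis line beamformer over the B_n values.
    return sum(b_n[n] * steer(f, R, ys[n], B) for n in range(len(ys)))


def centered(count, pitch):
    """Element offsets (m) for `count` elements centered on the origin."""
    return [(i - (count - 1) / 2.0) * pitch for i in range(count)]


xs = centered(4, 0.4e-3)
ys = centered(3, 0.4e-3)
# Arbitrary deterministic stand-in for the FFT output I[m][n] at one bin.
I = [[complex(m + 1, n - 1) for n in range(len(ys))] for m in range(len(xs))]

direct = beam_direct(I, xs, ys, 2.0e6, 80.0, 95.0, 0.015)
two_step = beam_two_step(I, xs, ys, 2.0e6, 80.0, 95.0, 0.015)
print("difference:", abs(direct - two_step))
```

The two results agree up to floating-point rounding because the exponent in equation (5) is a sum of an X term and a Y term. Note also that the step-1 vector `b_n` depends only on A, so it can be reused for every B, which is where the savings of the decomposition come from.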

Equation (9) suggests that the 2D planar array beamformer can be decomposed into two linear array beamforming steps. The first step includes a line array beamforming along the X axis and will be repeated N times to get the vector $B_n(f_i, A_x)$. The second step consists of line array beamforming along the Y axis and will be done only once by treating the vector $B_n(f_i, A_x)$ as the input signal for the line array beamformer to get the output $B(f_i, A_x, B_y, R)$. The decomposition of the planar array beamformer into these two line array beamforming steps leads to an efficient implementation based on the following two factors [24]: First, the number of the involved transceivers for each of these line array beamformers is much smaller than the total number of transceivers, M × N, in the planar array. This kind of decomposition process for the 3D beamformer reduces both memory and CPU requirements. Second, all line array beamformers can be executed in parallel, resulting in a high degree of coarse-grain parallelism.

Finally, we should note that the number of sensors used in a transducer array is an important parameter for an ultrasound system. Detection of an acoustic signal in a noise field is characterized by the array gain (AG) parameter that is usually defined as:

$$AG = 10 \log\,(M \times N \times BIN)^{2}$$

where M × N is the number of sensors and BIN is the number of frequency bins used in the beamforming (or the FFT size as explained later). The more sensors used in a sensor array, the higher the array gain. The array gain indicates the strength of a beamformer in detecting reflected ultrasound signals. When an object is viewed as a collection of reflective point sources, a beamformer with higher array gain can produce sharper images for the individual point sources.

4. EXPERIMENTAL PLATFORM
Our final ultrasound system follows the overall structure shown in Figure 1. The computing architecture we will use is a modern cluster of high-end PCs.
Each node will be equipped with a PCI data acquisition card that will connect the node to a subset of the sensor array. The data acquisition cards will deliver the probing data from the sensor array to the corresponding node's memory. The beamforming algorithm will then reconstruct the image of the scanned object, redistributing data as appropriate. The purpose of this work is to examine the processing component of the system, after the sampled data have been placed in the main memory of each node.

The experimental system we use for evaluation is a cluster of 16 2-way Pentium III nodes interconnected with a Myrinet network [3]. The exact system configuration is summarized in Table 1. Myrinet is a low-latency, high-bandwidth, point-to-point system area network (SAN), used widely for clusters of workstations and PCs. By allowing users to directly access the network, without operating system intervention, Myrinet and other SANs dramatically reduce latencies compared to traditional TCP/IP based local area networks. Moreover, to further reduce latencies in SANs, direct memory operations are usually supported; reads and writes to remote memory are performed without remote processor intervention. Each network interface in our system has a 133 MHz programmable processor (LANai9) and connects the node to the network with two unidirectional links of 160 MByte/s peak bandwidth each. Actual node-to-network bandwidth is usually constrained by the 133 MBytes/s I/O bus on which the NIC sits. All system nodes are connected with a 16-port full crossbar Myrinet switch.

Processors               2 x Intel Pentium III, 800 MHz
Cache                    32K (L1), 512K (L2)
Memory                   512MB SDRAM
OS                       RedHat Linux, SMP kernel
PCI buses                32 bits, 33 MHz
NIC                      Myricom M3M-PCI64B
Communication library    MPICH/SCore 4.0

Table 1: Cluster node configuration.

The communication layer we use is the Message Passing Interface (MPI) on top of the SCore system [14]. SCore is a high-performance parallel programming environment for workstation and PC clusters. SCore relies on the PMv2 [26] low-level communication layer.
The MPI implementation we use is a port of the MPICH library [19] for the SCore system. Figure 3 shows the bandwidth and latency of the basic, uncontended MPI_Send and MPI_Recv operations. We obtain these point-to-point numbers by running the SKaMPI (Special Karlsruher MPI-Benchmark) benchmark on our system [23]. In all our experiments we use the gcc compiler, version [...], with the -O2 optimization level.

Figure 3: Ping-pong bandwidth in MB/s (left) and one-way latency in microseconds (right), as a function of message size in bytes, for a pair of nodes using (MPI_Send, MPI_Recv) to send and receive data.

5. ALGORITHM IMPLEMENTATION
Our implementations of the algorithm outlined in Section 3 assume that sampled data has already been placed in the main memory of each node by the acquisition units. Next, we present our in-house sequential and parallel implementations of the 3D beamforming algorithm.

    void main() {
      create_filter(bf_filter);
      create_steeringvector(stv);
      // for each tile (pr, pc)
      for (int pr = 0; pr < ROW; pr++) {
        for (int pc = 0; pc < COL; pc++) {
          read_data(buffer_in[chc][chr][num_freq]);
          while (zone < NZONES) {
            FFT(buffer_in, fft_out);
            Filter(fft_out, bf_filter);
            while (fft samples >= zones samples) {
              for (xb = 0; xb < xbeams; xb++) {
                C_Steer(fft_out, STV, az_out);
                for (yb = 0; yb < ybeams; yb++) {
                  R_Steer(az_out, STV, bout);
                  IFFT(bout);
                  Write_to(buffer_out, bout);
                }
              }
            }
          }
        }
      }
      display(buffer_out);
    }

Algorithm 1: Pseudo-code for the sequential implementation.

5.1 Sequential Implementation
Our sequential algorithm for performing the computation outlined in Section 3 consists of the following phases: read input samples, compute FFTs, filter results, perform column steering, reorganize data in memory, perform row steering, perform inverse FFTs, and, finally, output to display. Algorithm 1 shows the pseudo-code for this implementation.

In addition to dividing the scanned volume into multiple focal zones and using multiple beams to scan it, beamforming algorithms divide the 2D surface to be scanned into multiple tiles. If we view the 2D surface as an array of points, we can divide the rows and columns into ROW, COL groups forming ROW × COL tiles. Each tile is scanned by the full planar sensor array. Thus, CHC and CHR represent the number of sensors in each dimension of the planar array. Every full snapshot (that generates a single full 3D frame of the scanned volume) requires scanning all tiles and focal zones. For real-time processing we would require at least 10 frames (full snapshots) per second and ideally 20 to 30.

To re-create the depth information the algorithm processes the data based on a number of focal zones (NZONES). Each focal zone is of constant width (depth), which depends on the depth of the volume to be scanned and the number of focal zones. The volume depth is usually constant and defined by the type of objects the ultrasound will scan. For instance, different human organs require different scan depths. In this work we use a fixed maximum focal depth of 16 cm and we vary only the number of focal zones.

For each scanned point of the input volume the program reads the time series data from the corresponding sensor (that may scan multiple points) and stores it in a buffer buffer_in[CHC][CHR][NUM_FREQ] in host memory.
Each sample is 32 bits and is represented as a single precision floating point number. The number of samples that need to be processed is dictated by the depth of each focal zone. The probing signal used to scan the object reaches different depths of the scanned volume with different delays. The sampling rate used to digitize the received signal dictates the minimum number of samples (and the minimum FFT size) required to reconstruct depth information. For example, assuming a sampling frequency of $F_s$ = 30 MHz, the number of samples needed to reconstruct 20 cm of depth can be computed by

$$N = \frac{2 d F_s}{c} = \frac{2 \times 0.2 \times 30 \times 10^{6}}{1540} \approx 7792,$$

where N is the number of samples read in, c is the speed of sound in meters per second, and d is the depth of the reconstructed area in meters. The factor of two is needed to account for the round trip time. Thus, in this example the ultrasound system (acquisition unit) needs to provide the beamformer with 7792 samples for each of the sensor time series of the ultrasound probe. Using more than the minimum number of samples can improve the array gain and result in better quality imaging. However, reading in more samples results not only in more processing but also in longer acquisition times and higher storage requirements. In our experiments we set the number of samples to 8K and instead we vary the size of the FFT operations.

After the time samples are read and converted to the frequency domain, a filtering phase is used to reduce the amount of information passed to later stages. The information embedded in the received signals that is necessary for reconstructing a particular depth region is localized in a certain bandwidth. Using only the relevant frequency components further reduces computational time. Thus, the FFT output samples are filtered (with a Finite Impulse Response (FIR) filter [22]) to exclude unnecessary information and the related processing. The bandwidth depends on the focal depth and the center frequency of the ultrasound pulses.
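The sample-count arithmetic and the effect of band selection on the number of frequency bins can be sketched as follows. This is an illustration under assumptions, not the paper's filter: the speed of sound is taken as 1540 m/s, the result is rounded down (which reproduces the figure of 7792 samples for 20 cm at 30 MHz), and the FIR filter is replaced by an idealized brick-wall selection of FFT bins. The function names are hypothetical.

```python
import math


def min_samples(depth_m, fs_hz, c=1540.0):
    """Minimum samples to cover `depth_m` of round-trip travel at rate fs_hz."""
    return math.floor(2.0 * depth_m * fs_hz / c)


def next_pow2(n):
    """Smallest power of two that is >= n (a convenient buffer/FFT size)."""
    p = 1
    while p < n:
        p *= 2
    return p


def passband_bins(fft_size, fs_hz, center_hz, bandwidth_hz):
    """Inclusive range (lo, hi) of FFT bins inside the filter passband.

    Bin k corresponds to frequency k * fs_hz / fft_size. An ideal
    brick-wall band is assumed here instead of a real FIR response.
    """
    df = fs_hz / fft_size
    lo = math.ceil((center_hz - bandwidth_hz / 2.0) / df)
    hi = math.floor((center_hz + bandwidth_hz / 2.0) / df)
    return lo, hi


n = min_samples(0.20, 30e6)
print("minimum samples:", n, "-> buffer of", next_pow2(n))

lo, hi = passband_bins(512, 30e6, 2e6, 2e6)
print("bins kept for a 512-point FFT:", lo, "to", hi, "=", hi - lo + 1, "bins")
```

Under these assumptions only a few dozen of the bins of a 512-point FFT fall inside a 2 MHz band around a 2 MHz center frequency, which is why filtering before steering removes most of the downstream work.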
Lower frequency signals usually have better penetration into deeper regions, whereas higher frequency signals produce sharper beam resolutions. However, center frequencies are usually fixed for each depth. Thus, in this work we use 2 MHz as the center frequency (for an input volume with maximum depth of 16 cm). In the applications we are interested in, most objects (human organs) would fall within this range. Given this center frequency, the bandwidth of the filter can vary in the range 0.5-4.0 MHz.

After filtering, the beamformer performs the steering operations and finally samples are converted back to the time domain for displaying. Based on equation (6), the xb and yb loops process the steering of beams on the X and Y axes separately, using the pre-calculated steering vector STV to align the time delay of the signals arriving at different sensors, and IFFT transforms the signal from the frequency domain to the time domain.

    void main(int argc, char** argv) {
      create_filter(bf_filter);
      create_steering_vector(stv);
      for (each frame tile)
        read_data(buffer_in);
        for (each fft-size samples)
          FFT(buffer_in, fft_out);
          Filter(fft_out, bf_filter);
          for (each focal zone)
            for (each processor proc < NPROCS)
              for (each x-axis beam < xbeams/nprocs)
                C_Steer(fft_out, STV, az_out_send);
              // redistribute data among nodes
              if (proc != My_rank) {
                MPI_Irecv(az_out_buf[proc], sendsize, MPI_FLOAT,
                          MPI_ANY_SOURCE, Tag, MPI_COMM_WORLD,
                          &recv_req[comm_count]);
                MPI_Send(az_out_send, sendsize, MPI_FLOAT, proc,
                         Tag, MPI_COMM_WORLD);
              } else {
                memcpy(az_out_buf[proc], az_out_send,
                       sendsize*sizeof(float));
              }
            MPI_Waitall(NPROCS-1, recv_req, recv_stat);
            datareordertransformation(az_out_buf, az_out);
            for (each x-axis beam < xbeams/nprocs)
              for (each y-axis beam < ybeams)
                R_Steer(az_out, STV, bout);
                IFFT(bout);
                write_time_serial_data(buf_out, bout);
      create_display_data(buf_out);
    }

Algorithm 2: Pseudo-code for the parallel implementation.

5.2 Optimizations
To gain confidence that we start from an efficient sequential implementation, before proceeding with parallelization, we perform a number of measurements to fine tune several aspects of our sequential implementation.

First, we explore various FFT implementations, both our own and publicly available. We find that, for the processors we use, the most efficient implementation is FFTW [8], a C library for computing discrete Fourier transforms. Since the input time series data are real numbers we use the real one-dimensional FFT function rfftw(). This also minimizes space requirements since the output of this function is a half-complex array that consists of only half the DFT amplitudes; the negative-frequency amplitudes for real data are the complex conjugates of the positive-frequency amplitudes. The side effect of this is that we need to reorganize the output to a common, full-complex array format after the FFT and revert to the half-complex array before the IFFT. Also, FFTW computes an un-normalized transform for the input signal (IFFT(FFT(x)) = N·x) for size-N transforms. Thus, a division by N is needed for each element of the array after the final IFFT. The plan argument to rfftw() is constant across invocations and can be precomputed.

Second, we experiment with multiple ways of performing the steering and the related dot product operations.
We notice that the inner summation in equation (6) is actually the summation of M and N complex numbers, which are the results of the complex multiplications of $I_{m,n}(f_i)$ and $S_m(f_i, A_x, R)$, respectively. We find that the best results are obtained by using the cblas_cdotu_sub() function from the Intel Math Kernel Library (MKL) [12] to compute the necessary dot products and to perform the steering. MKL is optimized for the Pentium family of processors and makes effective use of the Matrix Manipulation Extensions (MMX) [21], SSE (Streaming SIMD Extensions) [13], and similar instructions.

Third, we tune loop ordering and the layout of multidimensional array data structures to improve memory access behavior and to reduce cache misses.

The overall effect of these optimization steps is a reduction of the overall execution time of the sequential implementation by a factor of about 10. It is a surprising result that hand-tuning can be so effective with all compiler optimizations turned on. However, since in this work we are more interested in the behavior of the parallel version, we omit these results due to space limitations.

5.3 Parallel Implementation
The parallel version of the beamforming algorithm follows closely the structure of the sequential implementation. We see that the data read from each sensor is processed independently until steering. Then, during the steering phase, the beams across the X and Y directions are independent. Therefore, we choose to divide the computation in two phases. The first phase includes all processing until after the column-steering phase. The second phase includes the rest of the processing, starting at the row-steering phase. Between the two phases, we need to reorganize the data in memory by performing a matrix transpose, which results in an all-to-all communication pattern.

The first phase of the computation for each frame is decomposed in tasks based on the data generated by each sensor.
Thus, there are as many tasks as sensors (e.g., 32 × 32 in the base configuration), which is sufficient for systems with large numbers of processors. The tasks for the second phase are determined by the processing associated with each beam. We use the processing related to a single beam as the basic task and we decompose the second phase into xbeams × ybeams tasks. For instance, with 8 beams in each direction, there are 64 coarse grain tasks. We expect that for all practical applications, at least 8 × 8 beams will be necessary and thus we do not consider cases where the number of processors is larger than the total number of beams.

We experiment with two implementations of the parallel algorithm. The first implementation uses dedicated nodes for each phase. However, we find that balancing the number of nodes between the two phases of the computation depends on a large number of parameters. Thus, we provide a second, symmetric, implementation as well, where all nodes in the system perform the same type of processing. Although the first, dataflow approach has certain advantages in reducing task management costs, we find that the second, SPMD approach is more flexible and results in better load balancing. Thus, for the rest of this work we only use our symmetric implementation, as shown in Algorithm 2.

6. EVALUATION METHODOLOGY
The goal of our work is twofold. We are interested in evaluating the absolute performance of this family of algorithms on clusters of generic, commodity components. In addition, we aim to understand the computational characteristics of this emerging class of applications.

Since the data acquisition unit of our system is not available yet and there are no publicly available data from actual systems (due to privacy and other constraints), we use the Field II ultrasound simulator [15] to generate the input samples for our experiments. The input to the Field II simulator is a point-model of a shell object. For our experiments we use a shell of 50,652 points. The exact simulator parameters for generating the input time-domain signals are shown in Table 2.

Transmit
  Center Frequency         2.0 MHz
  Bandwidth                2.0 MHz
  Array Size               [...]
  Detector Size            0.35 mm
  Detector Spacing         0.4 mm
  Transmit Focal Depth     70 mm
Receive
  Array Size               [...]
  Detector Size            0.35 mm
  Detector Spacing         0.4 mm
  Receive Focal Depth      Infinite ($10^{22}$ m)
  Sampling Frequency       33 MHz
Shell
  Inner Shell Radius       10 mm
  Outer Shell Radius       14 mm
  Shell Center             (5 mm, -5 mm, 70 mm)
  Shell Thickness          4 mm
  Points Defining Shell    50,652
  Scatter Density          6.93 pts/mm^3

Table 2: Input parameters for the Field II simulator.

To investigate how each system parameter affects the execution time of the algorithm in practice, we examine the most important system parameters for beamforming-based ultrasound systems. Table 3 summarizes these parameters, their allowable value ranges, and the values used in our experiments. Each parameter is attributed either to the ultrasound system itself (physical) or the beamforming algorithm (algorithmic). First, we verify that parameters are (for all practical purposes) independent of each other by performing guided experiments (which we do not present here due to space limitations). Thus, we vary each parameter individually while keeping all other parameters constant. The base value and the range we use for each parameter are shown in Table 3. To make the effect of varying each parameter as visible as possible we use as the base case values that result in relatively low amounts of computation.

We denote each configuration with the notation a{sensors}-b{beams}-f{FFT}-sp{focal}-bw{bandwidth}, where sensors is the number of sensors in each dimension of the planar array, beams is the number of beams in each tile of a snapshot, FFT is the FFT size, focal represents the focal zone size in millimeters, and bandwidth is the bandwidth of the filter in MHz.
For example, the base configuration, denoted as a32-b8-f512-sp10-bw2.0, specifies a configuration of a 32 × 32 sensor array, 8 × 8 beams per tile, 512 FFT size, 10 mm depth for each focal zone, and 2.0 MHz filter bandwidth.

It is important to note that changing each parameter impacts not only execution time, but image quality as well. Thus, it is important to be able to quantify image quality and to also take it into account when evaluating various configurations. One traditional method of quantifying image quality is to correlate each generated image with the prototype that is being scanned, and to use the correlation number for ranking output images. In our case, however, since the input time series is generated with the Field II simulator, there is no actual input object or image. For this reason, we use as the prototype image the best possible image that the algorithm can generate (a32-b16-f4096-sp5-bw4.0). We correlate each pair of images by using the same coefficient that is used in statistics to express the degree of dependence between two variables [2]. In our case, each variable corresponds to the pixel value of each image. Although it is somewhat simplistic to compare two clinical images just by using the correlation coefficient (without expert opinion from medical personnel), we still get a good indication of the relative quality of various images. We perform various consistency checks to verify that the correlation coefficients correspond, to the extent possible, to the perceived quality of each image, and we feel reasonably confident that our ranking methodology is valid.

In our measurements we exclude the initialization time and we present measurements only for the parallel section. Moreover, as mentioned earlier, we assume that input data is delivered to memory by the data acquisition cards. Although these transfers may interfere with other communication in the system, this is not an issue since overall traffic is low (as we will see in Section 7).
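The correlation-based image ranking described above can be sketched as follows. This is an illustration only: it assumes the coefficient in question is the standard Pearson correlation coefficient computed over pixel values, the function name `correlation_coefficient` is hypothetical, and the tiny images are made-up placeholders rather than data from the experiments.

```python
import math


def correlation_coefficient(image_a, image_b):
    """Pearson correlation between the pixel values of two equally sized images.

    Each image is a list of rows; each row is a list of numeric pixel values.
    """
    xs = [float(p) for row in image_a for p in row]
    ys = [float(p) for row in image_b for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("images must be non-empty and have the same size")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant image")
    return covariance / math.sqrt(var_x * var_y)


# Placeholder data: a "prototype" image and two candidate outputs.
prototype = [[0, 10], [20, 30]]
candidates = {
    "close": [[1, 11], [19, 31]],
    "poor": [[30, 0], [10, 20]],
}

# Rank candidates by how well they correlate with the prototype.
ranking = sorted(
    candidates,
    key=lambda name: correlation_coefficient(prototype, candidates[name]),
    reverse=True,
)
```

Because the coefficient is invariant to linear scaling of pixel values, it measures agreement in image structure rather than in absolute brightness, which is one reason it can serve as a rough relative quality indicator.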
It is important to note that, although we do not evaluate this aspect of our system, one of the advantages of using a cluster to process the input data is that the I/O path bandwidth scales linearly as we increase the number of nodes in the cluster. Finally, in our measurements we exclude the time needed to send the processed data from each node to a separate node that displays the images. However, the amount of communication required is very small and occurs over a separate 1 MBit Ethernet network.

For the parallel section of the algorithm, we present both overall execution times and execution time breakdowns. To reveal which parts of the algorithm incur high overheads we break execution time into the following components:

FFT is the total time spent performing FFTs on the input samples.
Filter is the time spent filtering out frequencies that are outside a pre-specified range.
Csteer is the time spent steering the samples corresponding to the column sensors.
Communication is the time spent redistributing the data among the system nodes. For the uniprocessor case, communication time represents the time to transpose the array locally.
Rsteer is the time spent steering the data that correspond to each sensor row.
IFFT is the time spent performing inverse FFTs.
Other represents the time spent in the rest of the parallel section of the algorithm.

7. RESULTS

In this section we first present our overall performance results and then we examine the effect of each individual parameter separately.

7.1 Overall Execution Time

Figure 4 shows the total execution time of the parallel section of each configuration as the number of processors changes. We see that execution time reduces linearly with the number of processors (the x-axis uses a log scale). This is mainly due to the fact that the partitioning of the tasks is well balanced and the fact that the amount of communication between the two phases of the parallel algorithm is relatively small.
The message size depends on the number of processors, the center frequency of the ultrasound signal, and the bandwidth of the filter. In our experiments message sizes vary between 112 bytes and 127K bytes. On average, the total amount of data exchanged between the two phases for each frame is about 1 MByte. This imposes fairly small bandwidth requirements on the interconnect and is well within the capabilities of modern system area networks.

Next, we note that the processing rate for our base case, a32-b8-f512-sp10-bw1.0, is about 2 frames/s. The configuration with the least amount of processing, a8-b8-f512-sp10-bw2.0, can generate about 5.5 frames/s with acceptable quality. Although this is still less than what is needed for real-time performance (ideally, for real-time performance we would require a rate of 20-30 frames/s), using faster processors that are already available would offer 2-3 times better performance today and real-time performance within a few months. Preliminary runs on an 8-node cluster with 2.0 GHz Pentium4 processors show that the speedup compared to our 800 MHz processors varies between 1.6 and 3.9, for an average of about 2.3 across all configurations. For our base configuration, a32-b8-f512-sp10-bw2.0, there is a speedup of about 2.1. The speedup on Csteer is about 4.0, whereas the speedup on FFT and IFFT is between 1.5 and 1.7. Finally, our fastest configuration, a8-b8-f512-sp10-bw2.0, exhibits a speedup of about 3.2 over our 800 MHz cluster and results in a final speed of about 9 frames/s. Given the linear speedups we observe on our 16-node cluster, we expect that using sixteen 2.0 GHz nodes will result in real-time performance for configurations with acceptable or even high image quality.

Parameter                       Characteristics   Range                        Values used
Number of sensors               Physical          m × m (m = 32, 24, 16, 8)    32 × 32*, 24 × 24, 16 × 16, 8 × 8
FFT size (samples)              Algorithmic       # of samples                 512*, 1024, 2048, 4096
Filter bandwidth (MHz)          Algorithmic       [0.5, 4.0]                   0.5, 1.0, 1.5, 2.0*, 2.6, 3.0, 4.0
Focal zone size (cm)            Algorithmic       1.0                          0.5, 1.0*
Number of beams per 10° × 10°   Algorithmic       n × n (n ≤ 16)               16 × 16, 8 × 8*, 4 × 4

Table 3: Algorithm parameters, valid ranges for each parameter, and the values we examine. Values marked with * indicate the base value for each parameter.

Figure 4: Speedups for different parameter sets by the number of processors. (x-axis: number of processors; y-axis: execution time in seconds; one curve per configuration: a32-b8-f4096-sp10-bw2.0, a32-b8-f2048-sp10-bw2.0, a32-b16-f512-sp10-bw2.0, a32-b8-f1024-sp10-bw2.0, a32-b8-f512-sp10-bw4.0, a32-b8-f512-sp5-bw2.0, a32-b8-f512-sp10-bw3.0, a32-b8-f512-sp10-bw2.6, a32-b8-f512-sp10-bw2.0, a32-b8-f512-sp10-bw1.5, a32-b8-f512-sp10-bw1.0, a32-b4-f512-sp10-bw2.0, a32-b8-f512-sp10-bw0.5, a24-b8-f512-sp10-bw2.0, a16-b8-f512-sp10-bw2.0, a8-b8-f512-sp10-bw2.0.)

7.2 Effects of system parameters

We now describe the effects of different parameters on execution time. Since it is important to also consider the effect on image quality, we also present the correlation rankings for each final (output) image.

Figure 5: Execution time breakdowns for different numbers of sensors. (x-axis: number of sensors, 32x32 to 8x8, for P1, P2, P4, and P8; y-axis: execution time in seconds; stacked components: FFT, Filter, Csteer, Comm., Rsteer, IFFT, Other.)

Number of Sensors

Figure 5 shows the execution time breakdown as we vary the number of sensors. All other parameters are set to their base values, b8-f512-sp10-bw2.0. We observe that the overall execution time is almost linear with the total number of sensors and the number of processors. Next, we observe that the time spent in each section of the algorithm reduces linearly with the number of sensors, except for IFFT, which is independent of the number of sensors and remains constant. Figure 6 shows the correlation coefficient for each output image as the number of sensors is reduced (the number of processors does not affect image quality). We see that the image quality degrades significantly as the number of sensors drops; the best and the worst cases differ by more than 15%.
However, we should note that whether this reduction in image quality is acceptable depends on the specific application. For instance, if the objects to be scanned are fairly simple, then the drop in quality may be acceptable, whereas for objects that have more complex contours this may not be the case.

Figure 6: Image correlation coefficient with different parameter sets. The table shows the x-axis values for each curve. (y-axis: correlation coefficient; curves: number of sensors, number of beams, FFT size, focal size, bandwidth.)

Number of beams

Varying the number of beams in the algorithm affects only the steering operations, the amount of communication, and the inverse FFTs. Figure 7 shows that both row-based steering and IFFT times are reduced super-linearly. This is due to the fact that below a certain number of beams the amount of information to be processed fits completely in the L2 cache. This suggests that larger L2 caches can be helpful for this class of applications, as can application knowledge that limits the number of necessary beams. The correlation coefficients (Figure 6) show that image quality degrades only if the number of beams is reduced to less than 8. In our experiments, each frame covers an area with an angle of 10° × 10°. For the 16 × 16 beam frames, each beam covers an angle of 0.625°. To cover the same area, the 4 × 4 beams snapshot has an angle of 2.5° for each beam, which is a lot coarser than using 16 × 16 beams. Our results suggest that for objects of similar complexity to our input, image quality degrades significantly only if each beam scans more than [...].

FFT size

Figure 8 shows the execution time breakdowns for different FFT sizes and numbers of processors. We notice that the overall time spent in FFTs reduces slightly with the FFT size. Although smaller FFTs result in larger numbers of FFTs, and for the sizes we consider the L2 cache is always effective, smaller FFTs tend to be more efficient. We also note that FFT time reduces sub-linearly with the number of processors. Finally, we note that the size of the FFTs significantly affects column and row steering, communication, and inverse FFT times, which all reduce linearly with FFT size. Figure 6 shows that FFT size has practically no influence on image quality for the input we use.
The reason for this is that the time samples have a high gain even for small FFT sizes and the algorithm is able to reconstruct a sharp image of the input. We expect that this behavior will change when we use input objects with more complex contours in the actual system. However, these results indicate that understanding the application areas where the ultrasound system is used can help identify appropriate values for parameters such as the FFT size and optimize for system cost, performance, and image quality tradeoffs.

Figure 7: Execution time breakdowns for different numbers of beams. (x-axis: number of beams for P1, P2, P4, P8, and P16; y-axis: execution time in seconds; stacked components: FFT, Filter, Csteer, Comm., Rsteer, IFFT, Other.)

Figure 8: Execution time breakdowns for different FFT sizes. (x-axis: FFT size for P1, P2, P4, P8, and P16; y-axis: execution time in seconds; stacked components: FFT, Filter, Csteer, Comm., Rsteer, IFFT, Other.)

Focal zone size

Similarly to FFT size, the focal zone size affects only the second phase of the algorithm. Decreasing the focal zone size from 10 to 5 mm doubles the number of focal zones (from 16 to 32) that are required to cover the same depth of volume (16 cm) and increases the time required for the second phase of the algorithm linearly (Figure 9). Finally, image quality is not affected significantly by the focal zone size (Figure 6), for reasons similar to those explained for the effects of the FFT size.

Filter bandwidth

Changing the filter bandwidth affects all aspects of the algorithm, except for the time spent in FFTs and IFFTs (Figure 10). The correlation coefficient for the output images (Figure 6) shows that image quality degrades only if the filter bandwidth drops below 1.0 MHz. This is due to the physical characteristics of our simulated transducer array. The acoustic signal we use has a bandwidth of 2.0 MHz, which results in useful information being contained in a 2.0 MHz bandwidth in the frequency domain after the Fourier transform of the input samples. Using smaller filter bandwidths excludes some of this information, and bandwidths less than 1.0 MHz result in significant degradation of image quality.

Summary

Overall, we find that our parallel implementation scales linearly with the number of processors and that we can achieve almost real-time performance with state-of-the-art clusters. Furthermore, we find that the amount of time spent in FFT and steering operations dominates. In all our experiments, the parallel implementation spends 85-98% of the time in FFT and beam steering functions, and most of the runs spend 92-95% of the time in these functions. Finally, the values of different parameters have a significant impact on the computational requirements of the algorithm. Thus, application knowledge that can help in selecting appropriate values for these parameters may be important in optimizing future ultrasound systems for cost and performance.

8. RELATED WORK

To the best of our knowledge, there is very little work on understanding the computational characteristics of ultrasound imaging beamforming algorithms on modern clusters. Numerous solutions for the acquisition problem and a large number of algorithms for processing the sensor time series have been proposed recently [25, 11, 2, 15, 18, 24]. This work has examined, among others, issues related to transducers and their relation to beamforming techniques for ultrasound systems. Our work is orthogonal to this and relies on high-quality transducer arrays. Also, previous work has examined the usage of beamforming algorithms in ultrasound and other medical applications [17, 7]. Finally, there has been a large body of work on parallel beamforming algorithms and implementations on both high-end parallel systems and distributed workstations [4, 5, 9, 16, 1]. However, all this work has examined applications from other domains, and in particular sonar systems.

9. CONCLUSIONS

In this paper we examine a family of algorithms that are used in high-resolution 3D medical imaging systems. We present the necessary background, describe the fundamental algorithmic aspects, and study the computational behavior on modern architectures. Our work indicates that for many applications specialized architectures are not necessary and that generic clusters may be used. In particular, we see that our implementation of a state-of-the-art beamforming algorithm, by carefully decomposing the original problem, results in linear speedups in systems up to 16 processors.
On a 16-processor system we can achieve almost-real-time medical imaging with acceptable or high image quality. Given that we use older-generation processors, we expect that today's systems (or systems available within a few months) will be able to provide real-time performance, resulting in significant flexibility and cost benefits compared to traditional custom solutions. Preliminary results with 2.0 GHz Pentium4 processors show that there is an average speedup of about 2.3 across configurations compared to our 800 MHz cluster. This ability to take advantage of the latest system components as they become available, with no additional costs for re-designing the system architecture, is one of the fundamental benefits of our approach in addressing issues in this area. Thus, given our results, we expect that modern clusters will be used with success in a wider range of medical applications.

Figure 9: Execution time breakdowns for different focal zone sizes. (x-axis: focal zone size in mm for P1, P2, P4, P8, and P16; y-axis: execution time in seconds; stacked components: FFT, Filter, Csteer, Comm., Rsteer, IFFT, Other.)

We find that FFT and steering costs are the most significant overheads and that communication requirements are very low. In most of our experiments, the application spends 92-95% of its time in these sections. Furthermore, we study and reveal how each section of the parallel implementation depends on system parameters. We find that most dependences are linear, with small super- or sub-linear effects. In terms of the induced communication, each processor exchanges a small number of messages with all other processors in the communication phase. We use correlation coefficients to quantify the impact on image quality, and we find that the effects of different parameters on image quality are very diverse, which indicates that application knowledge is important in optimizing future ultrasound systems for cost-performance. Furthermore, our work indicates that if specialized solutions are necessary, for instance for portable ultrasound systems, system designers can focus on optimizing certain sections of the algorithm and ignore the rest.
Finally, we expect that, given their advantages over more traditional solutions, modern clusters with low-latency and high-bandwidth networks will be capable of handling a wide range of medical applications and that they will be instrumental in improving the cost and precision of medical infrastructure.

10. ACKNOWLEDGMENTS

We are thankful to Andreas Moshovos for helping with various uniprocessor optimizations of the sequential algorithm. We thankfully acknowledge the support of Natu-

Pipelined Multipliers for Reconfigurable Hardware

Pipelined Multipliers for Reconfigurable Hardware Pipelined Multipliers for Reonfigurable Hardware Mithell J. Myjak and José G. Delgado-Frias Shool of Eletrial Engineering and Computer Siene, Washington State University Pullman, WA 99164-2752 USA {mmyjak,

More information

Multi-Channel Wireless Networks: Capacity and Protocols

Multi-Channel Wireless Networks: Capacity and Protocols Multi-Channel Wireless Networks: Capaity and Protools Tehnial Report April 2005 Pradeep Kyasanur Dept. of Computer Siene, and Coordinated Siene Laboratory, University of Illinois at Urbana-Champaign Email:

More information

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study What are Cyle-Stealing Systems Good For? A Detailed Performane Model Case Study Wayne Kelly and Jiro Sumitomo Queensland University of Tehnology, Australia {w.kelly, j.sumitomo}@qut.edu.au Abstrat The

More information

A Novel Validity Index for Determination of the Optimal Number of Clusters

A Novel Validity Index for Determination of the Optimal Number of Clusters IEICE TRANS. INF. & SYST., VOL.E84 D, NO.2 FEBRUARY 2001 281 LETTER A Novel Validity Index for Determination of the Optimal Number of Clusters Do-Jong KIM, Yong-Woon PARK, and Dong-Jo PARK, Nonmembers

More information

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION Ken Sauer and Charles A. Bouman Department of Eletrial Engineering, University of Notre Dame Notre Dame, IN 46556, (219) 631-6999 Shool of

More information

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425)

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425) Automati Physial Design Tuning: Workload as a Sequene Sanjay Agrawal Mirosoft Researh One Mirosoft Way Redmond, WA, USA +1-(425) 75-357 sagrawal@mirosoft.om Eri Chu * Computer Sienes Department University

More information

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2 On - Line Path Delay Fault Testing of Omega MINs M. Bellos, E. Kalligeros, D. Nikolos,2 & H. T. Vergos,2 Dept. of Computer Engineering and Informatis 2 Computer Tehnology Institute University of Patras,

More information

Outline: Software Design

Outline: Software Design Outline: Software Design. Goals History of software design ideas Design priniples Design methods Life belt or leg iron? (Budgen) Copyright Nany Leveson, Sept. 1999 A Little History... At first, struggling

More information

Acoustic Links. Maximizing Channel Utilization for Underwater

Acoustic Links. Maximizing Channel Utilization for Underwater Maximizing Channel Utilization for Underwater Aousti Links Albert F Hairris III Davide G. B. Meneghetti Adihele Zorzi Department of Information Engineering University of Padova, Italy Email: {harris,davide.meneghetti,zorzi}@dei.unipd.it

More information

Analysis of input and output configurations for use in four-valued CCD programmable logic arrays

Analysis of input and output configurations for use in four-valued CCD programmable logic arrays nalysis of input and output onfigurations for use in four-valued D programmable logi arrays J.T. utler H.G. Kerkhoff ndexing terms: Logi, iruit theory and design, harge-oupled devies bstrat: s in binary,

More information

Gray Codes for Reflectable Languages

Gray Codes for Reflectable Languages Gray Codes for Refletable Languages Yue Li Joe Sawada Marh 8, 2008 Abstrat We lassify a type of language alled a refletable language. We then develop a generi algorithm that an be used to list all strings

More information

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1.

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1. Fuzzy Weighted Rank Ordered Mean (FWROM) Filters for Mixed Noise Suppression from Images S. Meher, G. Panda, B. Majhi 3, M.R. Meher 4,,4 Department of Eletronis and I.E., National Institute of Tehnology,

More information

A {k, n}-secret Sharing Scheme for Color Images

A {k, n}-secret Sharing Scheme for Color Images A {k, n}-seret Sharing Sheme for Color Images Rastislav Luka, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos The Edward S. Rogers Sr. Dept. of Eletrial and Computer Engineering, University

More information

Accommodations of QoS DiffServ Over IP and MPLS Networks

Accommodations of QoS DiffServ Over IP and MPLS Networks Aommodations of QoS DiffServ Over IP and MPLS Networks Abdullah AlWehaibi, Anjali Agarwal, Mihael Kadoh and Ahmed ElHakeem Department of Eletrial and Computer Department de Genie Eletrique Engineering

More information

Extracting Partition Statistics from Semistructured Data

Extracting Partition Statistics from Semistructured Data Extrating Partition Statistis from Semistrutured Data John N. Wilson Rihard Gourlay Robert Japp Mathias Neumüller Department of Computer and Information Sienes University of Strathlyde, Glasgow, UK {jnw,rsg,rpj,mathias}@is.strath.a.uk

More information

The Implementation of RRTs for a Remote-Controlled Mobile Robot

The Implementation of RRTs for a Remote-Controlled Mobile Robot ICCAS5 June -5, KINEX, Gyeonggi-Do, Korea he Implementation of RRs for a Remote-Controlled Mobile Robot Chi-Won Roh*, Woo-Sub Lee **, Sung-Chul Kang *** and Kwang-Won Lee **** * Intelligent Robotis Researh

More information

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications System-Level Parallelism and hroughput Optimization in Designing Reonfigurable Computing Appliations Esam El-Araby 1, Mohamed aher 1, Kris Gaj 2, arek El-Ghazawi 1, David Caliga 3, and Nikitas Alexandridis

More information

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY Dileep P, Bhondarkor Texas Instruments Inorporated Dallas, Texas ABSTRACT Charge oupled devies (CCD's) hove been mentioned as potential fast auxiliary

More information

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Arne Hamann, Razvan Rau, Rolf Ernst Institute of Computer and Communiation Network Engineering Tehnial University of Braunshweig,

More information

The Mathematics of Simple Ultrasonic 2-Dimensional Sensing

The Mathematics of Simple Ultrasonic 2-Dimensional Sensing The Mathematis of Simple Ultrasoni -Dimensional Sensing President, Bitstream Tehnology The Mathematis of Simple Ultrasoni -Dimensional Sensing Introdution Our ompany, Bitstream Tehnology, has been developing

More information

Simulation of Crystallographic Texture and Anisotropie of Polycrystals during Metal Forming with Respect to Scaling Aspects

Simulation of Crystallographic Texture and Anisotropie of Polycrystals during Metal Forming with Respect to Scaling Aspects Raabe, Roters, Wang Simulation of Crystallographi Texture and Anisotropie of Polyrystals during Metal Forming with Respet to Saling Aspets D. Raabe, F. Roters, Y. Wang Max-Plank-Institut für Eisenforshung,

More information

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract CS 9 Projet Final Report: Learning Convention Propagation in BeerAdvoate Reviews from a etwork Perspetive Abstrat We look at the way onventions propagate between reviews on the BeerAdvoate dataset, and

More information

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System Algorithms, Mehanisms and Proedures for the Computer-aided Projet Generation System Anton O. Butko 1*, Aleksandr P. Briukhovetskii 2, Dmitry E. Grigoriev 2# and Konstantin S. Kalashnikov 3 1 Department

More information

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar Plot-to-trak orrelation in A-SMGCS using the target images from a Surfae Movement Radar G. Golino Radar & ehnology Division AMS, Italy ggolino@amsjv.it Abstrat he main topi of this paper is the formulation

More information

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections SVC-DASH-M: Salable Video Coding Dynami Adaptive Streaming Over HTTP Using Multiple Connetions Samar Ibrahim, Ahmed H. Zahran and Mahmoud H. Ismail Department of Eletronis and Eletrial Communiations, Faulty

More information

An Event Display for ATLAS H8 Pixel Test Beam Data

An Event Display for ATLAS H8 Pixel Test Beam Data An Event Display for ATLAS H8 Pixel Test Beam Data George Gollin Centre de Physique des Partiules de Marseille and University of Illinois April 17, 1999 g-gollin@uiu.edu An event display program is now

More information

Introduction to Seismology Spring 2008

Introduction to Seismology Spring 2008 MIT OpenCourseWare http://ow.mit.edu 1.510 Introdution to Seismology Spring 008 For information about iting these materials or our Terms of Use, visit: http://ow.mit.edu/terms. 1.510 Leture Notes 3.3.007

More information

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines The Minimum Redundany Maximum Relevane Approah to Building Sparse Support Vetor Mahines Xiaoxing Yang, Ke Tang, and Xin Yao, Nature Inspired Computation and Appliations Laboratory (NICAL), Shool of Computer

More information

The AMDREL Project in Retrospective

The AMDREL Project in Retrospective The AMDREL Projet in Retrospetive K. Siozios 1, G. Koutroumpezis 1, K. Tatas 1, N. Vassiliadis 2, V. Kalenteridis 2, H. Pournara 2, I. Pappas 2, D. Soudris 1, S. Nikolaidis 2, S. Siskos 2, and A. Thanailakis

More information

Detection and Recognition of Non-Occluded Objects using Signature Map

Detection and Recognition of Non-Occluded Objects using Signature Map 6th WSEAS International Conferene on CIRCUITS, SYSTEMS, ELECTRONICS,CONTROL & SIGNAL PROCESSING, Cairo, Egypt, De 9-31, 007 65 Detetion and Reognition of Non-Oluded Objets using Signature Map Sangbum Park,

More information

Approximate logic synthesis for error tolerant applications

Approximate logic synthesis for error tolerant applications Approximate logi synthesis for error tolerant appliations Doohul Shin and Sandeep K. Gupta Eletrial Engineering Department, University of Southern California, Los Angeles, CA 989 {doohuls, sandeep}@us.edu

More information

An Approach to Physics Based Surrogate Model Development for Application with IDPSA

An Approach to Physics Based Surrogate Model Development for Application with IDPSA An Approah to Physis Based Surrogate Model Development for Appliation with IDPSA Ignas Mikus a*, Kaspar Kööp a, Marti Jeltsov a, Yuri Vorobyev b, Walter Villanueva a, and Pavel Kudinov a a Royal Institute

More information

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq Volume 4 Issue 6 June 014 ISSN: 77 18X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om Medial Image Compression using

More information

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks International Journal of Advanes in Computer Networks and Its Seurity IJCNS A Load-Balaned Clustering Protool for Hierarhial Wireless Sensor Networks Mehdi Tarhani, Yousef S. Kavian, Saman Siavoshi, Ali

More information

the data. Structured Principal Component Analysis (SPCA)

the data. Structured Principal Component Analysis (SPCA) Strutured Prinipal Component Analysis Kristin M. Branson and Sameer Agarwal Department of Computer Siene and Engineering University of California, San Diego La Jolla, CA 9193-114 Abstrat Many tasks involving

More information

HEXA: Compact Data Structures for Faster Packet Processing
