Reevaluating the overhead of data preparation for asymmetric multicore system on graphics processing

Size: px
Start display at page:

Download "Reevaluating the overhead of data preparation for asymmetric multicore system on graphics processing"

Transcription

1 KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 10, NO. 7, Jul Copyright 2016 KSII Reevaluating the overhead of data preparation for asymmetri multiore system on graphis proessing Songwen Pei 1,2, Junge Zhang 1, Linhua Jiang 1, Myoung-Seo Kim 2 and Jean-Lu Gaudiot 2 1 Shanghai Key Lab of Modern Optial Systems, University of Shanghai for Siene and Tehnology, Shanghai , China [ {swpei, lhjiang}@usst.edu.n] 2 Parallel Systems and Computer Arhiteture Lab,University of California Irvine, CA 92697, USA [ {myoungseo.kim, gaudiot}@ui.edu] *Corresponding author: Songwen Pei, Myoung-Seo Kim Reeived May 10, 2016; revised May 24, 2016; aepted June 5, 2016; published July 31, 2016 Abstrat As proessor design has been transiting from homogeneous multiore proessor to heterogeneous multiore proessor, traditional Amdahl s law annot meet the new hallenges for asymmetri multiore system. In order to further investigate the impat fators related to the Overhead of Data Preparation (ODP) for Asymmetri multiore systems, we evaluate an asymmetri multiore system built with CPU-GPU by measuring the overheads of memory transfer, omputing kernel, ahe missing and synhronization. This paper demonstrates that dereasing the overhead of data preparation is a promising approah to improve the whole performane of heterogeneous system. Keywords: Overhead of data preparation; heterogeneous multiore; Amdahl s law; power onsumption A preliminary version of this paper appeared in The 1 st International Conferene on Eletronis, Eletrial Engineering, Computer Siene: Innovation and Convergene (EEECS 2016), January 20-22, Phuket, Thailand. This version inludes a onrete analysis and supporting implementation results on asymmetri mulitore system on graphis proessing. This researh was supported by a researh grant from the Shanghai Muniipal Natural Siene Foundation (15ZR ), Shanghai Pujiang Program(16PJ ), the program for Professor of Speial Appointment (Eastern Sholar) at Shanghai Institutions of Higher Learning, USST inubation projet(15hjpy-ms02), and the National Siene Foundation (XPS ). ISSN :

2 3232 Pei et al.: Reevaluating the Overhead of Data Preparation 1. Introdution As we enter the multiore era, asymmetri multiore system[1] is beoming the mainstream in the field of future proessor design. Reent examples inlude Intel s MIC[2], AMD s Kabini[3], and NVIDIA s Projet Denver[4], et. However, the memory wall and ommuniation issues will ontinue inreasing the gap between the performane of an ideal proessor and that of a pratial proessor, beause the overhead of data preparation beomes an unavoidable key problem. Amdahl s Law was a state-of-the-art analytial model that guided software developers to evaluate the atual speedup that ould be ahieved by using parallel programs, or hardware designers to draw muh more elaborate miroarhiteture and omponents. Gene Amdahl[5] defined his law for the speial ase of using n proessors (ores) in parallel when he argued for the single-proessor approah s validity for ahieving large-sale omputing apabilities. He used a limit argument to assume that a fration f of a program s exeution time was infinitely parallelizable with no sheduling overhead and other overheads to generate ready data to be used by omputing units, while the remaining fration, 1 f, was totally sequential. He noted that the speedup on n proessors is governed by equation (1): 1 Speedup(f, n) = f (1- f)+ (1) n This equation Amdahl s law on basis of omputing-entri system whih never takes into aount the potential ost of data preparation. It is only orret as long as three key assumptions are verified: (1) the programs to be exeuted are of fixed-datasize and the fration of the programs that is parallelizable remains onstant as well; (2) there exists an infinite memory spae without extra overhead of swithing memory bloks and disk bloks, et. (3) the overhead of preparing the data to be used by omputing units, whih inludes memory aess, ommuniation on-hips or off-hips and synhronization among ores, an be negleted. Currently, some CPU-GPU memory arhitetures are using separate memory address spaes for the CPU and the GPU, while the programmer is responsible for ommuniation between them by using a relatively slow ommuniation hannel,suh as PCIe bus. This ommuniation bottlenek limits the peak throughput that an be obtained from these systems. Otherwise, the power onsumption has beome an important issue that impats GPU appliations. Therefore, traditional Amdahl s Law is not fit for heterogeneous omputer system, we will take onsideration of the overhead of data preparation in asymmetri multiore system. Despite some ritiisms, Amdahl s Law is still relevant as we enter a heterogeneous multi-ore omputing era. However, the future relevane of the law requires its extension by the inlusion of onstraints and arhitetural trends demanded by modern multiproessor hips. There are a lot of ahievable researhes in theory to extend Amdahl s Law. Woo and Lee[6] extended Amdahl s law for energy-effiient omputing of many-ore, who lassified

3 KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 10, NO. 7, July many-ore design styles as three type: symmetri supersalar proessor tagged with P*, symmetri smaller power-effiient ore tagged with *, and asymmetri many-ore proessor with supersalar proessor and many smaller ores tagged with P + *. The researh results show that heterogeneous arhiteture is better than symmetri system to save power. Similarly, Marowka[7] extended Amdahl s law for heterogeneous omputing, and investigated how energy effiieny and salability are affeted by the power onstraints for three kind of heterogeneous omputer system, i.e., symmetri, asymmetri and simultaneous asymmetri. The analysis shows learly that greater parallelism is the most important fator affeting power onsumption. Karanikolaou, et al evaluated experimental energy onsumption based on both of performane/power and performane/energy ratio metris[8]. Kim et al foused on the energy effiieny of the sequential part aeleration, and how to determine the optimal frequeny boosting ratio whih maximize energy effiieny[9]. Londono et al modeled the potential dynami energy improvement, optimal frequeny and voltage alloation of a multiore system in terms of extending Amdahl s law[10]. EyerMan et al [11] introdued a model that aounts for ritial setions of te parallelizable fration of a program. Ge et al proposed a power-aware speedup model to predit the saled exeution time of power-aware lusters by isolating the performane effets of hanging proessor frequenies and the number of nodes[12]. In this paper we analyse the extended Amdahl s law metioned in the papers[16-19] and demonstrate that data preparation is another promising approah to improve performane of heterogeneous system. The ontributions in this paper are: (1) Rethinking and analysing the general model of Amdahl s law; (2) Proposing the overhead of data preparation in the extended Amdahl s law; (3) Reevaluating an asymmetri multiore system built with CPU-GPU by measuring the overheads of ommuniation, omputing kernel, ahe missing and synhronization. This paper is organized as follows. Setion II revisits the extended Amdahl s law of asymmetri multiore system by onsidering the overhead of data preparation[13]. Setion III analyses the affeting fators of data preparation. Setion IV proves the affeting fators of data preparation should not be negleted through Rodinia benhmarks. Setion VI onludes our work and future missions. 2. Revisiting the extended Amdahl s law by onsidering the overhead of data preparation The graphis proessing unit (GPU) has made signifiant strides as an aelerator in parallel omputing. However, GPU has resided out on PCIe as a disrete devie, the performane of GPU appliations an be bottleneked by data transfers between the CPU and GPU over PCIe. So the overhead of preparing the data and aessing the memory, of transmitting data on-hip and off-hip, of transferring data between CPU memories and GPU memories for heterogeneous system, of synhronizing proesses, et, beome signifiant that it annot be ignored any longer. However, traditional Amdahl s law only onsiders the ost of instrution exeution. We will thus now assume that whole ost of exeuting a program an be split into two independent parts: ( p ) and (1 p ), as shown in the Fig. 1, one (1 p ) is preparing the data exeution and the other one ( p ) is running instrution when the required data are ready.

4 3234 Pei et al.: Reevaluating the Overhead of Data Preparation Therefore, the Overhead of Data Preparation (ODP) inludes the whole ost of preparing data for exeution. As proposed by [13], ODP an be onsidered to produe a new speedup equation will all it the Extended Amdahl s law as expressed in equation (2): S ( f,, p ) EA 1 f ((1- f ) ) p (1- p ) where p denotes the omputation portion, 1 p denotes the Data Preparation portion normalized to the omputation portion: sine the lok frequenies and the ISAs of CPUs, GPUs, off-hip bus and memory arbitrator would likely be different, we should normalize the performane of Data Preparation instrutions to that of omputing instrutions. The f is the parallelizable omputation portion, 1 f is the sequential omputation portion. (2) 1 ore P 1-P 1-f f 1-fh (1-α) fh α fh ores 1-f f/ 1-fh k(1-α) fh (1-k)(1-α) fh α fh Fig. 1. Illustration of Extended Amdahl's law As shown in Fig. 1, however, equation (2) does not make allowanes for the introdution of tehniques that would derease or eliminate the overhead of data preparation. We further divide the portion of data preparation into three sub-parts: 1 f, f and (1 ) f h h h. f h denotes the portion of the program that an be overlapped with omputing instrutions in theory, where 0< f <1. But, it annot be ompletely overlapped in pratial due to the h unpredited run-time negative fators from the limits of hardware resoures. In other words, the exeution of this portion is masked behind the exeution of the omputation instrutions. In general, memory aess instrutions an be exeuted in parallel with independent omputing instrutions. However, it will not be possible to exeute all of the data preparation instrutions simultaneously with omputing instrutions. For example, if a load instrution is theoretially independent of the next omputing instrutions, it an be theoretially issued with the next omputing instrutions. However, it would not be issued beause the queue of issuing load instrutions is full. Therefore, we introdue the parameter to denote the perentage of data preparation instrutions whih are atually exeuted simultaneously with omputing instrutions before using advaned tehnologies, where 0< <1. Thus, f denotes the portion of real parallelized instrutions for data preparation and (1 ) f h denotes the portion of data preparation instrutions whih annot be overlapped with omputing instrutions before adopting advaned tehnologies. Furthermore, with the help of advaned tehnologies suh as data prefething, speulative exeution, universal memory, no-opy data transfer, 3D NoC, et., the fration of data preparation instrutions whih annot be overlapped on one proessor will sharply derease as h

5 Transfer Time(ms) KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 10, NO. 7, July the number of ores inreases. In other words, (1 ) f h ould be dereased signifiantly. Furthermore, we introdue a variable k to model how muh perentage of data preparation that annot be overlapped on a ores system even adopting advaned tehnologies, where 0< k <1. After normalization to the omputation, the fration of data preparation instrutions whih annot be overlapped on a ore system beomes fud (1 fh) k (1 ) f. On the h opposite, the fration of data preparation instrutions whih an be overlapped on a ore system beomes f f (1 k ) (1 ) f, where f f 1. While, the pd h h ud pd 1 f denotes the fration of data preparation whih is losely dependent on omputation. h Therefore, we an extend Amdahl's law to be a new equation alled Enhaned Amdahl's law. The performane speedup is governed by: ' 1 S EA( f,, p, fud ) (3) f ((1 f ) ) p fud (1 p ) 3. Analysis the impat fators of data preparation Some multiore performane models tend to ignore the impat of data synhronization, CPU-GPU ommuniation, ahe missing rate and so on. In this setion, we take aount of some possible impat fators suh as kernel overhead, memory overhead, data transfer overhead and ahe missing rate whih ontribute to the overhead of data preparation.amdal s Law on basis never takes aount of those overhead. 3.1 Memory Transfer Overhead In CPU-GPU heterogeneous omputing, one overhead is aused by the disjoint address spaes of the CPU and GPU and the need to expliitly transfer data between the two memory spaes. Before exeuting a kernel on the GPU, all of the data used by kernel needs to be transferred from the CPU memory to the GPU memory. After exeution, the data produed by the kernel mostly needs to be transferred bak to the CPU memory. The funtion to aomplish this memory transferring is usually udamempy to devie pinned to devie from devie pinned from devie E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 1.E+07 1.E+08 1.E+09 Bytes per transfer Fig. 2. Time per memory transfer for different size of transferred data

6 Time per all(us) 3236 Pei et al.: Reevaluating the Overhead of Data Preparation Fig. 2 shows the average transfer time by using udamempy strides a wide range as the size of transferring data inrease. Four different onfigurations are measured, and their results are all different whether the data are transferred to or from the GPU memory, and whether we use pinned or non-pinned memory. Pinned memory is the memory whih is alloated by using the udamallohost funtion, whih prevents the memory from being swapped out and provides improved transfer speeds. However, the non-pinned memory is the memory whih is alloated by using the mallo funtion. For the small size of transfer data, there is a onstant overhead of transferring data whih is around 10 µs. As the size of transferring data inreasing about 8 KB, the transfer time starts to inrease linearly. Besides, the overhead of transferring data in pinned memory is smaller than that in non-pinned memory. 3.2 Kernel Overhead Kernel overhead inludes the overhead of synhronization and launhing a kernel. Synhronization overhead an be a barrier to ahieving good performane for appliations utilizing fine-grained synhronization. The overhead of launhing a kernel an severely impat the performane of a CUDA appliation if the kernel is invoked many times. To evaluate the overhead of kernel exeution, we measured the average time required to launh an empty kernel over a large number of kernel invoations. Fig. 3 shows the time of alling eah kernel on four different GPUs. The asynhronization bar represents the ase where the kernel is invoked repeatedly without synhronization between alls. The synhronization bar represents the ase where the udathreadsynhronize funtion is alled after alling eah kernel, and fores the CPU to wait for the exeution of urrent kernel to be finished before alling the next kernel. The y-axis is the time for alling a kernel, whih mostly results from the overhead of aessing the GPU driver API and Runtime API. As shown in the Fig. 3, the average time of alling a kernel with synhronization is about 3.5 times than that of asynhronous mode. Therefore, the synhronization overhead is muh greater than that of asynhronous Synhronization Asynhronization GTX 280 GTX GTS 8800 GTX Fig. 3. Time of alling an empty kernel 3.3 Cahe Missing Rate Another overhead of data preparation is aused by the ahe missing. The appliation with

7 KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 10, NO. 7, July large problem sizes and irregular aessing memory would ause serious Cahe missing in the run-time beause of unonventional memory aess patterns are poorly handled by pre-fething. CPUs are designed to redue the effetive memory aess lateny through extensive multi ahe level and prefething tehnologies. Thus, a slightly irregular memory aess pattern an be suessfully aptured by the CPU ahes. However, the same aess pattern may be irregular enough to prevent effiient utilization of the GPU memory bandwidth, due to the restritions on aess patterns that must be met in order to ahieve good memory performane. The restritions of aess pattern an be partially relaxed by taking advantage of the GPU speial-purpose address spaes. Both onstant and texture memory provide small on-hip ahes that allow threads to take advantage of fine-grained spatial and temporal loality. In addition, texture memory relaxes the alignment requirements that must be met in order for multiple memory aesses from within the same warp to be oalesed. Zero opy improves performane by eliminating these redundant data opies. Another effetive approah is to use the software-ontrolled shared memory as an expliitly managed ahe, whih an signifiantly improve performane when data elements are frequently reused. 4. Experimental Evaluation In this setion, to evaluate the overhead of the data preparation, we performed different appliations from Rodinia benhmark. In order to implement GPU programs, the Rodinia suite uses CUDA [14, 20], an extension to C for GPUs. CUDA is a parallel omputing platform and programming model invented by NVIDIA. It enables dramati inreases in omputing performane by harnessing the power of the graphis proessing unit (GPU). Rodinia is a benhmark suite for heterogeneous omputing. To help arhitets study emerging platforms suh as GPUs (Graphis Proessing Units), Rodinia inludes appliations and kernels whih target on multi-ore CPU and GPU platforms. 4.1 Platform and Appliations The benhmarks used for experiment are mentioned in Table 1. The Rodinia[15] benhmark suite is designed to provide parallel programs for the study of heterogeneous systems. We use seven benhmarks from Rodinia, whih are k-means (KM), bak propagation (BP), breadth-first searh(bfs), hotspot(hs), Needleman-Wunsh(NW), spekle reduing anisotropi diffusion and stream luster(srad). Table 1. Seleted appliations from Rodinia Benhmark Appliation Desription Input dataset BFS Breadth First Searh Graph1MW_6.txt NW Needleman-Wunsh (Dynami Programming) 2048*2048 HS Hot spot (Strutured Grid) 500*500 BP Bak propagation KM K-means Clustering SC Stream luster SRAD Spekle reduing anisotropi diffusion 2048*2048

8 perentage(%) 3238 Pei et al.: Reevaluating the Overhead of Data Preparation The KM is a lustering algorithm used extensively in the field of data mining. This identifies related points by assoiating eah data point with its nearest luster, omputing new luster entroids, and iterating until onvergene. BP is a mahine-learning algorithm for layered neural network. This appliation is omprised of two phases: the forward phase and bakward phase. HS is a thermal simulation tool used for estimating proessor temperature based on an arhitetural floor plan and simulated power measurements. NW is an optimization method for DNA sequene alignments, and BFS traverses all the onneted omponents in a graph. SRAD is a diffusion algorithm based on partial differential equations and used for removing the spekles in an image without sarifiing important image features. SRAD is widely used in ultrasoni and radar imaging appliations. The inputs to the program are ultrasound images and the value of eah point in the omputation domain depends on its four neighbors. Those seven benhmarks an be divided into two ategories: ompute-intensive and memory-intensive appliations. SRAD and HotSpot are relatively ompute-intensive, while Needleman-Wunsh, Breadth-First Searh, Kmeans, and Stream Cluster are memory-intensive appliation. 4.2 Experimental Results The GPU in our experimental platform for heterogeneous system is NVIDIA GTX 750. It has 512 streaming multi-proessors (SMs), and 2 GB devie memory. We investigate the overhead of data preparation by omparing against the total exeution time of the appliations from Rodinia benhmark. As mentioned in setion III, We mainly fous on overhead of memory transfer, synhronization and ahe missing rate. The omputing performane of appliations is mostly affeted by those overheads, whih we will disuss in the following setions. As shown in Fig. 4, the overhead of different fators are analyzed, suh as kernel exeution, CPU-GPU ommuniation, CPU exeution and synhronization. And to different appliations, the influene of those fators is not the same. For example, the CPU-GPU ommuation overhead of SRAD and BP are relatively high beause of they have a large dataset. In addition, as for NW, HS and BP, synhronization overhead makes up a high proportion of data preparation for the data dependent. 60 Kernel Exeution CPU Exeution CPU-GPU Communiation Synhronization KM NW HS BP SRAD BFS SC Fig. 4. Perentage of total exeution time

9 perentage(%) KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 10, NO. 7, July Communiation Overhead The overhead of ommuniation between GPUs and CPUs often beomes a major fator to omputing performane. For large dataset, the overhead of data preparation is mainly determined by the bandwidth of PCIe bus. Slowing data transfer between CPU and GPU exaerbates the overhead of data preparation. As shown in the Fig. 5, SRAD and BP require signifiant CPU-GPU ommuniation time beause they need a lot of data to be inputted. The average of ommuniation time for SRAD and BP is almost one-third of the whole runtime. As for NW and HS, all of the omputation is done on the GPU and the results are expliitly transferred bak only after all GPU work has been ompleted. In these appliations, the memory transfer overhead is hard to be further redued. In onlusion, ommuniation overhead should be onsidered in the Extended Amdahl s Law. Understanding the appliation s memory aess patterns on the GPU is ruial to ahieve good performane. If the neighboring threads aess neighboring rows in an array, alloating the array in olumn-major order will allow threads within the same warp to aess ontiguous elements. So warp sheduling an hide the memory lateny and derease the ommuniation overhead to some extent. 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% CPU-GPU Communiation Other Overhead KM NW HS BP SRAD BFS SC Synhronization Overhead Fig. 5. CPU-GPU ommuniation overhead As shown in the Fig. 6, the overhead of synhronization is a barrier to ahieve high performane of appliations by utilizing fine-grained synhronization. SRAD and SC show relatively low synhronization overhead beause the majority of their omputations are independent. However, NW, HS, BP and BFS have great overhead of synhronization. The synhronization perentage of those appliations is between 30% and 40%, beause the majority of their omputations are interdependent. As show in equation (3), if parallelism is low, ignore the effets of data synhronization overhead is aeptable. But as parallelism growing, this effet beomes more predominant. Che et al. [18] also showed the importane of using the ahed, read-only onstant memory and texture memory spaes for frequently reused

10 perentage(%) perentage(%) 3240 Pei et al.: Reevaluating the Overhead of Data Preparation data within a thread blok. These methods are espeially helpful in reduing the overhead of GPU s data synhronization. 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Synhronization Other Overhead KM NW HS BP SRAD BFS SC Fig. 6. Synhronization overhead At all, the overhead of CPU-GPU ommuniation and synhronization has great effet on data preparation. As show in Fig. 7, we an alulate the total perentage of ommuniation and synhronization during the running time. The rate of NW and BP are already above the half of the total time (NW is 52.4% and BP is 55.1%).The average rate of those benhmarks is 37.83%.Aording to Fig. 7, we find that data synhronization and ommuniation affet Amdahl s Law in the hetergeneous arhiteture. It is apparent that the overhead of data preparation (CPU-GPU ommuniation and data synronization) takes up a great proportion of the total exeution time. 60 CPU-GPU Communiation Synhronization KM NW HS BP SRAD BFS SC Average Fig. 7. Overhead of Communiation and Synhronization

11 L2 ahe missing rate(%) KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 10, NO. 7, July The Overhead of Cahe Missing Cahe missing rate is another fator whih ontributes to the overhead of data preparation. As shown in the Fig. 8, irregular data aess is poorly handled by pre-fething. NW exhibits a high L2 ahe missing rate of 41.0% beause of its irregular memory aess patterns (diagonal strips). BFS exhibits L2 ahe missing rate of 20.8% and KM exhibits 27.1%, whih exhibit regular behavior, present ahe miss rates that are lower but still high enough to be of interest. The L2 ahe miss rates of other appliations range from 2.0% to 12.0%. In terms of those experimental data, we onlude that the appliations with memory-intensive and irregular data aessing would inur high ahe missing rate KM NW HS BP SRAD BFS SC Fig. 8. L2 ahe missing rate 5. Conlusion Aording to the experiment results, alling a kernel would inurs high overhead of performane and programming to deal with the CPU-GPU ommuniation (as in BP and SRAD). High synhronization overheads are aused by omputations interdependent suh as HS and BP. Therefore, the overhead of data preparation is a major fator whih we should not be negleted in heterogeneous system. Traditional Amdahl s Law ignores those overheads, so for tasks of high synhronization, parallel exeution may result in signifiantly lower speedup than as predited by traditional Amdahl s Law. When we take into aount the overhead of data preparation, the speedup will not grow linearly as the number of ores inreases. Therefore, the speedup obtained is always less than the ideal in terms of traditional Amdahl's law. Therefore, we onsider the overhead of data preparation is an important fator by extending traditional Amdahl s Law. The effets of ommuniation, synhronization and ahe missing rate an be overome by proper arhitetural and software design. If a multiore arhiteture has no private memory, then there would be no need in synhronization and ommuniation: the only one memory is shared by all of the ores and data aess at the same spae. Therefore suh memory arhiteture would be unaffeted by those overheads of data preparation. But this may inur long memory lateny. In this paper, we have presented an Extended Amdahl s Law taking into aount the effets

12 3242 Pei et al.: Reevaluating the Overhead of Data Preparation of data preparation. By experiment, the heterogeneous multiore performane is fundamentally limited by data synhronization, CPU-GPU ommuniation and ahe missing rate. The maximum performane speedup ahieved by heterogeneous multiore an be lower than predited by Amdahl s Law. In the future work, we will keep on investigating the performane gain within limited energy (or power) budget while onsidering the overhead of data preparation. Referenes [1] P.Rogers, Heterogeneous System Arhiteture Overview, in Pro. of Hot Chips Sym., Aug Artile (CrossRef Link) [2] A.Duran and M.Klemm, The Intel Many Integrated Core Arhiteture, in Pro. of the International Conferene on High Performane Computing and Simulation(HPCS), pp , July Artile (CrossRef Link) [3] D.Bouvier, B.Cohen, W.Fry, S.Godey and M.Mantor, Kabini: An AMD Aelerated Proessing Unit System on A Chip, MICRO, vol.34, no.2, pp.22-33, Marh Artile (CrossRef Link) [4] Nvidia Corporation. Nvidia Projet Denver, (available in Feb. 2016). [5] Amdahl G M., Validity of the single proessor approah to ahieving large sale omputing apabilities [J], in Pro. of the Amerian Federation of Information Proessing Soieties (AFIPS), , April Artile (CrossRef Link) [6] D.H.Woo and H.S.Lee, Extending Amdahl s Law for Energy-Effiient Computing in the Many-Core Era, Computer, vol.41, no.12, pp.24-31, De Artile (CrossRef Link) [7] A. Marowka, Extending Amdahl s Law for Heterogeneous Computing, in Pro. of the 2012 International Symposium on Parallel and Distributed Proessing with Appliations(ISPDPA), pp , July Artile (CrossRef Link) [8] E.M.Karanikolaou, E.I.Milovanovi, I.Z ˇ.Milovanovi,M.P.Bekakos, Peformane Salability and Energy Consumpution on Distributed and Many-ore Platforms, Journal of Superomputing, vol.70, no.1, pp ,ot Artile (CrossRef Link) [9] S.H.Kim, D.Kim, C.Lee, W.S.Jeong, W.W.Ro, and J.L.Gaudiot, A Performane-Energy Model to EvaluateSingle Thread Exeution Aeleration, Computer Arhiteture Letters, vol.14, no.2, pp , Nov Artile (CrossRef Link) [10] S.M.London o, and J.P.Gyvez, Extending Amdahl s Law for Energy-Effiieny, in Pro. of International Conferene on Energy Aware Computing(ICEAC), pp.1-4, De Artile (CrossRef Link) [11] Eyerman S, Eekhout L., Modeling ritial setions in Amdahl's law and its impliations for multiore design [J]., in Pro. of the 37 th Annual International Symposium on Computer Arhiteture, ACM, New York, US, 38(3): , Artile (CrossRef Link) [12] R.Ge and K.W.Cameron, Power-Aware Speedup, in Pro. of International Parallel and Distributed Proessing Symposium (IPDPS), Marh 2007, pp Artile (CrossRef Link) [13] S.W.Pei, M.S.Kim, J.L.Gaudiot, Extending Amdahl s Law for Heterogeneous Multiore Proessor with Consideration of the Overhead of Data Preparation, IEEE Embedded Systems Letters, vol. 8, no.1, pp.26-29, Marh Artile (CrossRef Link) [14] J. Nikolls, I. Buk, M. Garland, and K. Skadron, Salable parallel programming with CUDA, ACM Queue, vol.6, no.2, pp.40 53, Artile (CrossRef Link) [15] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer,S.-H. Lee, and K. Skadron, Rodinia: A benhmark suite for heterogeneous omputing, in Pro. of IEEE International Symposium on Workload Charaterization, pp Artile (CrossRef Link) [16] Hill, Mark D, Marty, Mihael R., Amdahl's Law in the Multiore Era [J], Computer, 2008, 41(7): Artile (CrossRef Link) [17] Erlin Yao, Yungang Bao, Guangming Tan, et al., Extending Amdahl s Law in the Multiore Era[J]., ACM SIGMETRICS Performane Evaluation Review, 37(2):24-26, Artile (CrossRef Link)

13 KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 10, NO. 7, July [18] Xian-He Sun, Yong Chen, Reevaluating Amdahl s law in the multiore era [J], Journal of Parallel and distributed Computing, 70(2): , Artile (CrossRef Link) [19] Andreou A G. Beyond Amdahl's Law: An Objetive Funtion That Links Multi-proessor Performane Gains to Delay and Energy [J]. IEEE Transations on Computers, 61(8): , Artile (CrossRef Link) [20] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron, A performane study of general purpose appliations on graphis proessors using CUDA, Journal of Parallel and Distributed Computing, 68(10): , Artile (CrossRef Link) Songwen Pei reeived the B.S. in the Shool of Computer from National University of Defene and Tehnology, Changsha, China in 2003, the M.S. in the Shool of Information Engineering from Guizhou University, Guiyang, China in 2006, and the Ph.D. in the Shool of Computer Siene and Tehnology from Fudan University, Shanghai, China in He is urrently an Assoiate Professor of the Computer Siene and Engineering Department at the University of Shanghai for Siene and Tehnology, and he urrently works as a Guest Researher at the Institute of Computing Tehnology, Chinese Aademy of Sienes (2011-), and a Researh Sientist at the Eletrial Engineering and Computer Siene, University of California( ). His researh interests inlude heterogeneous multi-ore proessors, parallel and distributed omputing, loud omputing, big data, and fault-tolerant omputation, et. He is a member of the IEEE, ACM and CCF in China, and he is also a board member of CCF-TCCET, CCF-TCARCH and CCF-YOCSEF Shanghai respetively. He is a talent award-winner of Shanghai Pujiang Program Junge Zhang reeived the B.S. in the omputer siene from Luoyang Normal University in She is urrently a graduate student in omputer arhiteture at the University of Shanghai for Siene and Tehnology. Her urrent researh interests inlude omputer arhiteture, Multi-ore heterogeneous systems, data prefething and GPU power model. Linhua Jiang reeived M.S. degree and Ph.D. degree from Catholi University of Leuven and Leiden University respetively. He is urrently a Professor in the Shool of Optial-Eletrial and Computer Engineering, University of Shanghai for Siene and Tehnology, and is the diretor of the Optial-Computer Siene Researh Group. He was a visiting Faulty of Stanford University. His urrent researh fields inlude Optial-Information and Image Proessing, Computer Network and Cyber-Physial System, Parallel Computing, et. Prof. JIANG published more than one hundred sientifi papers and onferene artiles.

14 3244 Pei et al.: Reevaluating the Overhead of Data Preparation Myoung-Seo Kim reeived both the B.S. degree in Computer Siene and the B.S. degree in Eletrial and Eletronis Engineering and the M.S. degree in Computer Siene from Yonsei University, Seoul, Korea in 2003 and 2005, respetively. He has worked for 5 years performing design and verifiation of portable multimedia system-on-a-hip (SoC) at Samsung Eletronis, Yongin, Korea ( ) and at Apple In., Cupertino, California ( ). He reeived the Ph.D. degree in Computer Siene from University of California, Irvine in He is urrently a prinipal researher in the Parallel Systems and Computer Arhiteture Lab at University of California, Irvine. He was the reipient of the 2011 Yonsei International Foundation Sholarship and the 2012 SK Hynix Study Abroad Sholarship. In addition, he has published more than 15 papers in journals and international onferenes. His researh interests inlude multi/many-ore system-on-a-hip, omputer arhiteture and design, low-power arhiteture, embedded systems, VLSI iruit design and analysis, and design automation. He is a member of the IEEE and the ACM. Jean-Lu Gaudiot reeived the Diplôme d Ingénieur from ESIEE, Paris, Frane in 1976 and the M.S. and Ph.D. degrees in Computer Siene from the University of California, Los Angeles in 1977 and 1982, respetively. He is urrently a Professor and Chair of the Eletrial and Computer Engineering Department at the University of California, Irvine. Prior to joining UCI in January 2002, he was a Professor of Eletrial Engineering at the University of Southern California sine 1982, where he served as and Diretor of the Computer Engineering Division for three years. He has also done miroproessor systems design at Teledyne Controls, Santa Monia, California ( ) and researh in innovative arhitetures at the TRW Tehnology Researh Center, El Segundo, California ( ). He onsults for a number of ompanies involved in the design of high-performane omputer arhitetures. His researh interests inlude multithreaded arhitetures, fault-tolerant multi-proessors, and implementation of reonfigurable arhitetures. He has published over 200 journal and onferene papers. His researh has been sponsored by NSF, DoE, and DARPA, as well as a number of industrial organizations. In January 2006, he beame the first Editor-in-Chief of IEEE Computer Arhiteture Letters, a new publiation of the IEEE Computer Soiety, whih he helped found to the end of failitating short, fast turnaround of fundamental ideas in the Computer Arhiteture domain. From 1999 to 2002, he was the Editor-in-Chief of the IEEE Transations on Computers. He has served on the IEEE Computer Soiety Board of Governors for two terms ( ), was VP of Eduational Ativities (2013), VP of Publiations ( ) and is the 2017 IEEE Computer Soiety President. In 1999, he beame a Fellow of the IEEE. He was elevated to the rank of AAAS Fellow in 2007.

Pipelined Multipliers for Reconfigurable Hardware

Pipelined Multipliers for Reconfigurable Hardware Pipelined Multipliers for Reonfigurable Hardware Mithell J. Myjak and José G. Delgado-Frias Shool of Eletrial Engineering and Computer Siene, Washington State University Pullman, WA 99164-2752 USA {mmyjak,

More information

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2 On - Line Path Delay Fault Testing of Omega MINs M. Bellos, E. Kalligeros, D. Nikolos,2 & H. T. Vergos,2 Dept. of Computer Engineering and Informatis 2 Computer Tehnology Institute University of Patras,

More information

Announcements. Lecture Caching Issues for Multi-core Processors. Shared Vs. Private Caches for Small-scale Multi-core

Announcements. Lecture Caching Issues for Multi-core Processors. Shared Vs. Private Caches for Small-scale Multi-core Announements Your fous should be on the lass projet now Leture 17: Cahing Issues for Multi-ore Proessors This week: status update and meeting A short presentation on: projet desription (problem, importane,

More information

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY Dileep P, Bhondarkor Texas Instruments Inorporated Dallas, Texas ABSTRACT Charge oupled devies (CCD's) hove been mentioned as potential fast auxiliary

More information

Particle Swarm Optimization for the Design of High Diffraction Efficient Holographic Grating

Particle Swarm Optimization for the Design of High Diffraction Efficient Holographic Grating Original Artile Partile Swarm Optimization for the Design of High Diffration Effiient Holographi Grating A.K. Tripathy 1, S.K. Das, M. Sundaray 3 and S.K. Tripathy* 4 1, Department of Computer Siene, Berhampur

More information

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study What are Cyle-Stealing Systems Good For? A Detailed Performane Model Case Study Wayne Kelly and Jiro Sumitomo Queensland University of Tehnology, Australia {w.kelly, j.sumitomo}@qut.edu.au Abstrat The

More information

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications System-Level Parallelism and hroughput Optimization in Designing Reonfigurable Computing Appliations Esam El-Araby 1, Mohamed aher 1, Kris Gaj 2, arek El-Ghazawi 1, David Caliga 3, and Nikitas Alexandridis

More information

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract CS 9 Projet Final Report: Learning Convention Propagation in BeerAdvoate Reviews from a etwork Perspetive Abstrat We look at the way onventions propagate between reviews on the BeerAdvoate dataset, and

More information

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer Communiations and Networ, 2013, 5, 69-73 http://dx.doi.org/10.4236/n.2013.53b2014 Published Online September 2013 (http://www.sirp.org/journal/n) Cross-layer Resoure Alloation on Broadband Power Line Based

More information

Accommodations of QoS DiffServ Over IP and MPLS Networks

Accommodations of QoS DiffServ Over IP and MPLS Networks Aommodations of QoS DiffServ Over IP and MPLS Networks Abdullah AlWehaibi, Anjali Agarwal, Mihael Kadoh and Ahmed ElHakeem Department of Eletrial and Computer Department de Genie Eletrique Engineering

More information

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Malaysian Journal of Computer Siene, Vol 10 No 1, June 1997, pp 36-41 A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Md Rafiqul Islam, Harihodin Selamat and Mohd Noor Md Sap Faulty of Computer Siene and

More information

A Novel Validity Index for Determination of the Optimal Number of Clusters

A Novel Validity Index for Determination of the Optimal Number of Clusters IEICE TRANS. INF. & SYST., VOL.E84 D, NO.2 FEBRUARY 2001 281 LETTER A Novel Validity Index for Determination of the Optimal Number of Clusters Do-Jong KIM, Yong-Woon PARK, and Dong-Jo PARK, Nonmembers

More information

Facility Location: Distributed Approximation

Facility Location: Distributed Approximation Faility Loation: Distributed Approximation Thomas Mosibroda Roger Wattenhofer Distributed Computing Group PODC 2005 Where to plae ahes in the Internet? A distributed appliation that has to dynamially plae

More information

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Arne Hamann, Razvan Rau, Rolf Ernst Institute of Computer and Communiation Network Engineering Tehnial University of Braunshweig,

More information

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks International Journal of Advanes in Computer Networks and Its Seurity IJCNS A Load-Balaned Clustering Protool for Hierarhial Wireless Sensor Networks Mehdi Tarhani, Yousef S. Kavian, Saman Siavoshi, Ali

More information

Approximate logic synthesis for error tolerant applications

Approximate logic synthesis for error tolerant applications Approximate logi synthesis for error tolerant appliations Doohul Shin and Sandeep K. Gupta Eletrial Engineering Department, University of Southern California, Los Angeles, CA 989 {doohuls, sandeep}@us.edu

More information

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints Smooth Trajetory Planning Along Bezier Curve for Mobile Robots with Veloity Constraints Gil Jin Yang and Byoung Wook Choi Department of Eletrial and Information Engineering Seoul National University of

More information

Parallel Block-Layered Nonbinary QC-LDPC Decoding on GPU

Parallel Block-Layered Nonbinary QC-LDPC Decoding on GPU Parallel Blok-Layered Nonbinary QC-LDPC Deoding on GPU Huyen Thi Pham, Sabooh Ajaz and Hanho Lee Department of Information and Communiation Engineering, Inha University, Inheon, 42-751, Korea Abstrat This

More information

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 9, September 2013 ISSN: 2277 128X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om A New-Fangled Algorithm

More information

Acoustic Links. Maximizing Channel Utilization for Underwater

Acoustic Links. Maximizing Channel Utilization for Underwater Maximizing Channel Utilization for Underwater Aousti Links Albert F Hairris III Davide G. B. Meneghetti Adihele Zorzi Department of Information Engineering University of Padova, Italy Email: {harris,davide.meneghetti,zorzi}@dei.unipd.it

More information

Design of High Speed Mac Unit

Design of High Speed Mac Unit Design of High Speed Ma Unit 1 Harish Babu N, 2 Rajeev Pankaj N 1 PG Student, 2 Assistant professor Shools of Eletronis Engineering, VIT University, Vellore -632014, TamilNadu, India. 1 harishharsha72@gmail.om,

More information

Multi-hop Fast Conflict Resolution Algorithm for Ad Hoc Networks

Multi-hop Fast Conflict Resolution Algorithm for Ad Hoc Networks Multi-hop Fast Conflit Resolution Algorithm for Ad Ho Networks Shengwei Wang 1, Jun Liu 2,*, Wei Cai 2, Minghao Yin 2, Lingyun Zhou 2, and Hui Hao 3 1 Power Emergeny Center, Sihuan Eletri Power Corporation,

More information

Outline: Software Design

Outline: Software Design Outline: Software Design. Goals History of software design ideas Design priniples Design methods Life belt or leg iron? (Budgen) Copyright Nany Leveson, Sept. 1999 A Little History... At first, struggling

More information

Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System

Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System Arhiteture and Performane of the Hitahi SR221 Massively Parallel Proessor System Hiroaki Fujii, Yoshiko Yasuda, Hideya Akashi, Yasuhiro Inagami, Makoto Koga*, Osamu Ishihara*, Masamori Kashiyama*, Hideo

More information

Multi-Channel Wireless Networks: Capacity and Protocols

Multi-Channel Wireless Networks: Capacity and Protocols Multi-Channel Wireless Networks: Capaity and Protools Tehnial Report April 2005 Pradeep Kyasanur Dept. of Computer Siene, and Coordinated Siene Laboratory, University of Illinois at Urbana-Champaign Email:

More information

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines The Minimum Redundany Maximum Relevane Approah to Building Sparse Support Vetor Mahines Xiaoxing Yang, Ke Tang, and Xin Yao, Nature Inspired Computation and Appliations Laboratory (NICAL), Shool of Computer

More information

The AMDREL Project in Retrospective

The AMDREL Project in Retrospective The AMDREL Projet in Retrospetive K. Siozios 1, G. Koutroumpezis 1, K. Tatas 1, N. Vassiliadis 2, V. Kalenteridis 2, H. Pournara 2, I. Pappas 2, D. Soudris 1, S. Nikolaidis 2, S. Siskos 2, and A. Thanailakis

More information

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup Parallelizing Frequent Web Aess Pattern Mining with Partial Enumeration for High Peiyi Tang Markus P. Turkia Department of Computer Siene Department of Computer Siene University of Arkansas at Little Rok

More information

Multiple-Criteria Decision Analysis: A Novel Rank Aggregation Method

Multiple-Criteria Decision Analysis: A Novel Rank Aggregation Method 3537 Multiple-Criteria Deision Analysis: A Novel Rank Aggregation Method Derya Yiltas-Kaplan Department of Computer Engineering, Istanbul University, 34320, Avilar, Istanbul, Turkey Email: dyiltas@ istanbul.edu.tr

More information

Accelerating Multiprocessor Simulation with a Memory Timestamp Record

Accelerating Multiprocessor Simulation with a Memory Timestamp Record Aelerating Multiproessor Simulation with a Memory Timestamp Reord Kenneth Barr Heidi Pan Mihael Zhang Krste Asanovi Marh, 5 Massahusetts Institute of Tehnology Intelligent sampling gives est speed-auray

More information

Performance Improvement of TCP on Wireless Cellular Networks by Adaptive FEC Combined with Explicit Loss Notification

Performance Improvement of TCP on Wireless Cellular Networks by Adaptive FEC Combined with Explicit Loss Notification erformane Improvement of TC on Wireless Cellular Networks by Adaptive Combined with Expliit Loss tifiation Masahiro Miyoshi, Masashi Sugano, Masayuki Murata Department of Infomatis and Mathematial Siene,

More information

RAC 2 E: Novel Rendezvous Protocol for Asynchronous Cognitive Radios in Cooperative Environments

RAC 2 E: Novel Rendezvous Protocol for Asynchronous Cognitive Radios in Cooperative Environments 21st Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communiations 1 RAC 2 E: Novel Rendezvous Protool for Asynhronous Cognitive Radios in Cooperative Environments Valentina Pavlovska,

More information

Dynamic Backlight Adaptation for Low Power Handheld Devices 1

Dynamic Backlight Adaptation for Low Power Handheld Devices 1 Dynami Baklight Adaptation for ow Power Handheld Devies 1 Sudeep Pasriha, Manev uthra, Shivajit Mohapatra, Nikil Dutt and Nalini Venkatasubramanian 444, Computer Siene Building, Shool of Information &

More information

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425)

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425) Automati Physial Design Tuning: Workload as a Sequene Sanjay Agrawal Mirosoft Researh One Mirosoft Way Redmond, WA, USA +1-(425) 75-357 sagrawal@mirosoft.om Eri Chu * Computer Sienes Department University

More information

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application World Aademy of Siene, Engineering and Tehnology 8 009 Performane of Histogram-Based Skin Colour Segmentation for Arms Detetion in Human Motion Analysis Appliation Rosalyn R. Porle, Ali Chekima, Farrah

More information

Partial Character Decoding for Improved Regular Expression Matching in FPGAs

Partial Character Decoding for Improved Regular Expression Matching in FPGAs Partial Charater Deoding for Improved Regular Expression Mathing in FPGAs Peter Sutton Shool of Information Tehnology and Eletrial Engineering The University of Queensland Brisbane, Queensland, 4072, Australia

More information

Measurement of the stereoscopic rangefinder beam angular velocity using the digital image processing method

Measurement of the stereoscopic rangefinder beam angular velocity using the digital image processing method Measurement of the stereosopi rangefinder beam angular veloity using the digital image proessing method ROMAN VÍTEK Department of weapons and ammunition University of defense Kouniova 65, 62 Brno CZECH

More information

Naïve Bayesian Rough Sets Under Fuzziness

Naïve Bayesian Rough Sets Under Fuzziness IJMSA: Vol. 6, No. 1-2, January-June 2012, pp. 19 25 Serials Publiations ISSN: 0973-6786 Naïve ayesian Rough Sets Under Fuzziness G. GANSAN 1,. KRISHNAVNI 2 T. HYMAVATHI 3 1,2,3 Department of Mathematis,

More information

Gray Codes for Reflectable Languages

Gray Codes for Reflectable Languages Gray Codes for Refletable Languages Yue Li Joe Sawada Marh 8, 2008 Abstrat We lassify a type of language alled a refletable language. We then develop a generi algorithm that an be used to list all strings

More information

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections SVC-DASH-M: Salable Video Coding Dynami Adaptive Streaming Over HTTP Using Multiple Connetions Samar Ibrahim, Ahmed H. Zahran and Mahmoud H. Ismail Department of Eletronis and Eletrial Communiations, Faulty

More information

PROJECT PERIODIC REPORT

PROJECT PERIODIC REPORT FP7-ICT-2007-1 Contrat no.: 215040 www.ative-projet.eu PROJECT PERIODIC REPORT Publishable Summary Grant Agreement number: ICT-215040 Projet aronym: Projet title: Enabling the Knowledge Powered Enterprise

More information

Extracting Partition Statistics from Semistructured Data

Extracting Partition Statistics from Semistructured Data Extrating Partition Statistis from Semistrutured Data John N. Wilson Rihard Gourlay Robert Japp Mathias Neumüller Department of Computer and Information Sienes University of Strathlyde, Glasgow, UK {jnw,rsg,rpj,mathias}@is.strath.a.uk

More information

The Tofu Interconnect D

The Tofu Interconnect D 2018 IEEE International Conferene on Cluster Computing The Tofu Interonnet D Yuihiro Ajima, Takahiro Kawashima, Takayuki Okamoto, Naoyuki Shida, Kouihi Hirai, Toshiyuki Shimizu Next Generation Tehnial

More information

mahines. HBSP enhanes the appliability of the BSP model by inorporating parameters that reet the relative speeds of the heterogeneous omputing omponen

mahines. HBSP enhanes the appliability of the BSP model by inorporating parameters that reet the relative speeds of the heterogeneous omputing omponen The Heterogeneous Bulk Synhronous Parallel Model Tiani L. Williams and Rebea J. Parsons Shool of Computer Siene University of Central Florida Orlando, FL 32816-2362 fwilliams,rebeag@s.uf.edu Abstrat. Trends

More information

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1.

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1. Fuzzy Weighted Rank Ordered Mean (FWROM) Filters for Mixed Noise Suppression from Images S. Meher, G. Panda, B. Majhi 3, M.R. Meher 4,,4 Department of Eletronis and I.E., National Institute of Tehnology,

More information

HEXA: Compact Data Structures for Faster Packet Processing

HEXA: Compact Data Structures for Faster Packet Processing Washington University in St. Louis Washington University Open Sholarship All Computer Siene and Engineering Researh Computer Siene and Engineering Report Number: 27-26 27 HEXA: Compat Data Strutures for

More information

SSD Based First Layer File System for the Next Generation Super-computer

SSD Based First Layer File System for the Next Generation Super-computer SSD Based First Layer File System for the Next Generation Super-omputer Shinji Sumimoto, Ph.D. Next Generation Tehnial Computing Unit FUJITSU LIMITED Sept. 24 th, 2018 0 Outline of This Talk A64FX: High

More information

Uplink Channel Allocation Scheme and QoS Management Mechanism for Cognitive Cellular- Femtocell Networks

Uplink Channel Allocation Scheme and QoS Management Mechanism for Cognitive Cellular- Femtocell Networks 62 Uplink Channel Alloation Sheme and QoS Management Mehanism for Cognitive Cellular- Femtoell Networks Kien Du Nguyen 1, Hoang Nam Nguyen 1, Hiroaki Morino 2 and Iwao Sasase 3 1 University of Engineering

More information

Alleviating DFT cost using testability driven HLS

Alleviating DFT cost using testability driven HLS Alleviating DFT ost using testability driven HLS M.L.Flottes, R.Pires, B.Rouzeyre Laboratoire d Informatique, de Robotique et de Miroéletronique de Montpellier, U.M. CNRS 5506 6 rue Ada, 34392 Montpellier

More information

A Dictionary based Efficient Text Compression Technique using Replacement Strategy

A Dictionary based Efficient Text Compression Technique using Replacement Strategy A based Effiient Text Compression Tehnique using Replaement Strategy Debashis Chakraborty Assistant Professor, Department of CSE, St. Thomas College of Engineering and Tehnology, Kolkata, 700023, India

More information

Computing Pool: a Simplified and Practical Computational Grid Model

Computing Pool: a Simplified and Practical Computational Grid Model Computing Pool: a Simplified and Pratial Computational Grid Model Peng Liu, Yao Shi, San-li Li Institute of High Performane Computing, Department of Computer Siene and Tehnology, Tsinghua University, Beijing,

More information

The Implementation of RRTs for a Remote-Controlled Mobile Robot

The Implementation of RRTs for a Remote-Controlled Mobile Robot ICCAS5 June -5, KINEX, Gyeonggi-Do, Korea he Implementation of RRs for a Remote-Controlled Mobile Robot Chi-Won Roh*, Woo-Sub Lee **, Sung-Chul Kang *** and Kwang-Won Lee **** * Intelligent Robotis Researh

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. Improvement of low illumination image enhancement algorithm based on physical mode

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. Improvement of low illumination image enhancement algorithm based on physical mode [Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 22 BioTehnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(22), 2014 [13995-14001] Improvement of low illumination image enhanement

More information

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq Volume 4 Issue 6 June 014 ISSN: 77 18X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om Medial Image Compression using

More information

COSSIM An Integrated Solution to Address the Simulator Gap for Parallel Heterogeneous Systems

COSSIM An Integrated Solution to Address the Simulator Gap for Parallel Heterogeneous Systems COSSIM An Integrated Solution to Address the Simulator Gap for Parallel Heterogeneous Systems Andreas Brokalakis Synelixis Solutions Ltd, Greee brokalakis@synelixis.om Nikolaos Tampouratzis Teleommuniation

More information

We don t need no generation - a practical approach to sliding window RLNC

We don t need no generation - a practical approach to sliding window RLNC We don t need no generation - a pratial approah to sliding window RLNC Simon Wunderlih, Frank Gabriel, Sreekrishna Pandi, Frank H.P. Fitzek Deutshe Telekom Chair of Communiation Networks, TU Dresden, Dresden,

More information

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks Abouberine Ould Cheikhna Department of Computer Siene University of Piardie Jules Verne 80039 Amiens Frane Ould.heikhna.abouberine @u-piardie.fr

More information

Post-K Supercomputer with Fujitsu's Original CPU, A64FX Powered by Arm ISA

Post-K Supercomputer with Fujitsu's Original CPU, A64FX Powered by Arm ISA Post-K Superomputer with Fujitsu's Original CPU, A64FX Powered by Arm ISA Toshiyuki Shimizu Nov. 15th, 2018 Post-K is under development, information in these slides is subjet to hange without notie 0 Agenda

More information

Trajectory Tracking Control for A Wheeled Mobile Robot Using Fuzzy Logic Controller

Trajectory Tracking Control for A Wheeled Mobile Robot Using Fuzzy Logic Controller Trajetory Traking Control for A Wheeled Mobile Robot Using Fuzzy Logi Controller K N FARESS 1 M T EL HAGRY 1 A A EL KOSY 2 1 Eletronis researh institute, Cairo, Egypt 2 Faulty of Engineering, Cairo University,

More information

Make your process world

Make your process world Automation platforms Modion Quantum Safety System Make your proess world a safer plae You are faing omplex hallenges... Safety is at the heart of your proess In order to maintain and inrease your ompetitiveness,

More information

timestamp, if silhouette(x, y) 0 0 if silhouette(x, y) = 0, mhi(x, y) = and mhi(x, y) < timestamp - duration mhi(x, y), else

timestamp, if silhouette(x, y) 0 0 if silhouette(x, y) = 0, mhi(x, y) = and mhi(x, y) < timestamp - duration mhi(x, y), else 3rd International Conferene on Multimedia Tehnolog(ICMT 013) An Effiient Moving Target Traking Strateg Based on OpenCV and CAMShift Theor Dongu Li 1 Abstrat Image movement involved bakground movement and

More information

Direct-Mapped Caches

Direct-Mapped Caches A Case for Diret-Mapped Cahes Mark D. Hill University of Wisonsin ahe is a small, fast buffer in whih a system keeps those parts, of the ontents of a larger, slower memory that are likely to be used soon.

More information

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION Ken Sauer and Charles A. Bouman Department of Eletrial Engineering, University of Notre Dame Notre Dame, IN 46556, (219) 631-6999 Shool of

More information

On Optimal Total Cost and Optimal Order Quantity for Fuzzy Inventory Model without Shortage

On Optimal Total Cost and Optimal Order Quantity for Fuzzy Inventory Model without Shortage International Journal of Fuzzy Mathemat and Systems. ISSN 48-9940 Volume 4, Numer (014, pp. 193-01 Researh India Puliations http://www.ripuliation.om On Optimal Total Cost and Optimal Order Quantity for

More information

Cluster-Based Cumulative Ensembles

Cluster-Based Cumulative Ensembles Cluster-Based Cumulative Ensembles Hanan G. Ayad and Mohamed S. Kamel Pattern Analysis and Mahine Intelligene Lab, Eletrial and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1,

More information

Real-Time Control for a Turbojet Engine

Real-Time Control for a Turbojet Engine A Multiproessor mplementation of Real-Time Control for a Turbojet Engine Phillip L. Shaffer ABSTRACT: A real-time ontrol program for a turbojet engine has been implemented on a four-proessor omputer, ahieving

More information

Visualization of patent analysis for emerging technology

Visualization of patent analysis for emerging technology Available online at www.sienediret.om Expert Systems with Appliations Expert Systems with Appliations 34 (28) 84 82 www.elsevier.om/loate/eswa Visualization of patent analysis for emerging tehnology Young

More information

Sequential Incremental-Value Auctions

Sequential Incremental-Value Auctions Sequential Inremental-Value Autions Xiaoming Zheng and Sven Koenig Department of Computer Siene University of Southern California Los Angeles, CA 90089-0781 {xiaominz,skoenig}@us.edu Abstrat We study the

More information

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering A Novel Bit Level Time Series Representation with Impliation of Similarity Searh and lustering hotirat Ratanamahatana, Eamonn Keogh, Anthony J. Bagnall 2, and Stefano Lonardi Dept. of omputer Siene & Engineering,

More information

DECODING OF ARRAY LDPC CODES USING ON-THE FLY COMPUTATION Kiran Gunnam, Weihuang Wang, Euncheol Kim, Gwan Choi, Mark Yeary *

DECODING OF ARRAY LDPC CODES USING ON-THE FLY COMPUTATION Kiran Gunnam, Weihuang Wang, Euncheol Kim, Gwan Choi, Mark Yeary * DECODING OF ARRAY LDPC CODES USING ON-THE FLY COMPUTATION Kiran Gunnam, Weihuang Wang, Eunheol Kim, Gwan Choi, Mark Yeary * Dept. of Eletrial Engineering, Texas A&M University, College Station, TX-77840

More information

High-level synthesis under I/O Timing and Memory constraints

High-level synthesis under I/O Timing and Memory constraints Highlevel synthesis under I/O Timing and Memory onstraints Philippe Coussy, Gwenolé Corre, Pierre Bomel, Eri Senn, Eri Martin To ite this version: Philippe Coussy, Gwenolé Corre, Pierre Bomel, Eri Senn,

More information

A RAY TRACING SIMULATION OF SOUND DIFFRACTION BASED ON ANALYTIC SECONDARY SOURCE MODEL

A RAY TRACING SIMULATION OF SOUND DIFFRACTION BASED ON ANALYTIC SECONDARY SOURCE MODEL 19th European Signal Proessing Conferene (EUSIPCO 211) Barelona, Spain, August 29 - September 2, 211 A RAY TRACING SIMULATION OF SOUND DIFFRACTION BASED ON ANALYTIC SECONDARY SOURCE MODEL Masashi Okada,

More information

Space- and Time-Efficient BDD Construction via Working Set Control

Space- and Time-Efficient BDD Construction via Working Set Control Spae- and Time-Effiient BDD Constrution via Working Set Control Bwolen Yang Yirng-An Chen Randal E. Bryant David R. O Hallaron Computer Siene Department Carnegie Mellon University Pittsburgh, PA 15213.

More information

An Approach to Physics Based Surrogate Model Development for Application with IDPSA

An Approach to Physics Based Surrogate Model Development for Application with IDPSA An Approah to Physis Based Surrogate Model Development for Appliation with IDPSA Ignas Mikus a*, Kaspar Kööp a, Marti Jeltsov a, Yuri Vorobyev b, Walter Villanueva a, and Pavel Kudinov a a Royal Institute

More information

Using Augmented Measurements to Improve the Convergence of ICP

Using Augmented Measurements to Improve the Convergence of ICP Using Augmented Measurements to Improve the onvergene of IP Jaopo Serafin, Giorgio Grisetti Dept. of omputer, ontrol and Management Engineering, Sapienza University of Rome, Via Ariosto 25, I-0085, Rome,

More information

CA Agile Requirements Designer 2.x Implementation Proven Professional Exam (CAT-720) Study Guide Version 1.0

CA Agile Requirements Designer 2.x Implementation Proven Professional Exam (CAT-720) Study Guide Version 1.0 Exam (CAT-720) Study Guide Version 1.0 PROPRIETARY AND CONFIDENTIAL INFORMATION 2017 CA. All rights reserved. CA onfidential & proprietary information. For CA, CA Partner and CA Customer use only. No unauthorized

More information

Fast Elliptic Curve Algorithm of Embedded Mobile Equipment

Fast Elliptic Curve Algorithm of Embedded Mobile Equipment Send Orders for Reprints to reprints@benthamsiene.net 8 The Open Eletrial & Eletroni Engineering Journal, 0, 7, 8-4 Fast Ellipti Curve Algorithm of Embedded Mobile Equipment Open Aess Lihong Zhang *, Shuqian

More information

Australian Journal of Basic and Applied Sciences. A new Divide and Shuffle Based algorithm of Encryption for Text Message

Australian Journal of Basic and Applied Sciences. A new Divide and Shuffle Based algorithm of Encryption for Text Message ISSN:1991-8178 Australian Journal of Basi and Applied Sienes Journal home page: www.ajbasweb.om A new Divide and Shuffle Based algorithm of Enryption for Text Message Dr. S. Muthusundari R.M.D. Engineering

More information

User-level Fairness Delivered: Network Resource Allocation for Adaptive Video Streaming

User-level Fairness Delivered: Network Resource Allocation for Adaptive Video Streaming User-level Fairness Delivered: Network Resoure Alloation for Adaptive Video Streaming Mu Mu, Steven Simpson, Arsham Farshad, Qiang Ni, Niholas Rae Shool of Computing and Communiations, Lanaster University

More information

Zippy - A coarse-grained reconfigurable array with support for hardware virtualization

Zippy - A coarse-grained reconfigurable array with support for hardware virtualization Zippy - A oarse-grained reonfigurable array with support for hardware virtualization Christian Plessl Computer Engineering and Networks Lab ETH Zürih, Switzerland plessl@tik.ee.ethz.h Maro Platzner Department

More information

THROUGHPUT EVALUATION OF AN ASYMMETRICAL FDDI TOKEN RING NETWORK WITH MULTIPLE CLASSES OF TRAFFIC

THROUGHPUT EVALUATION OF AN ASYMMETRICAL FDDI TOKEN RING NETWORK WITH MULTIPLE CLASSES OF TRAFFIC THROUGHPUT EVALUATION OF AN ASYMMETRICAL FDDI TOKEN RING NETWORK WITH MULTIPLE CLASSES OF TRAFFIC Priya N. Werahera and Anura P. Jayasumana Department of Eletrial Engineering Colorado State University

More information

ASSESSMENT OF TWO CHEAP CLOSE-RANGE FEATURE EXTRACTION SYSTEMS

ASSESSMENT OF TWO CHEAP CLOSE-RANGE FEATURE EXTRACTION SYSTEMS ASSESSMENT OF TWO CHEAP CLOSE-RANGE FEATURE EXTRACTION SYSTEMS Ahmed Elaksher a, Mohammed Elghazali b, Ashraf Sayed b, and Yasser Elmanadilli b a Shool of Civil Engineering, Purdue University, West Lafayette,

More information

arxiv: v1 [cs.db] 13 Sep 2017

arxiv: v1 [cs.db] 13 Sep 2017 An effiient lustering algorithm from the measure of loal Gaussian distribution Yuan-Yen Tai (Dated: May 27, 2018) In this paper, I will introdue a fast and novel lustering algorithm based on Gaussian distribution

More information

The Mathematics of Simple Ultrasonic 2-Dimensional Sensing

The Mathematics of Simple Ultrasonic 2-Dimensional Sensing The Mathematis of Simple Ultrasoni -Dimensional Sensing President, Bitstream Tehnology The Mathematis of Simple Ultrasoni -Dimensional Sensing Introdution Our ompany, Bitstream Tehnology, has been developing

More information

Improved flooding of broadcast messages using extended multipoint relaying

Improved flooding of broadcast messages using extended multipoint relaying Improved flooding of broadast messages using extended multipoint relaying Pere Montolio Aranda a, Joaquin Garia-Alfaro a,b, David Megías a a Universitat Oberta de Catalunya, Estudis d Informàtia, Mulimèdia

More information

High Speed Area Efficient VLSI Architecture for DCT using Proposed CORDIC Algorithm

High Speed Area Efficient VLSI Architecture for DCT using Proposed CORDIC Algorithm International Journal of Innovative Researh in Siene, Engineering and Tehnology Website: www.ijirset.om High Speed Area Effiient VLSI Arhiteture for DCT using Proposed CORDIC Algorithm Deepnarayan Sinha

More information

Performance Benchmarks for an Interactive Video-on-Demand System

Performance Benchmarks for an Interactive Video-on-Demand System Performane Benhmarks for an Interative Video-on-Demand System. Guo,P.G.Taylor,E.W.M.Wong,S.Chan,M.Zukerman andk.s.tang ARC Speial Researh Centre for Ultra-Broadband Information Networks (CUBIN) Department

More information

INTERPOLATED AND WARPED 2-D DIGITAL WAVEGUIDE MESH ALGORITHMS

INTERPOLATED AND WARPED 2-D DIGITAL WAVEGUIDE MESH ALGORITHMS Proeedings of the COST G-6 Conferene on Digital Audio Effets (DAFX-), Verona, Italy, Deember 7-9, INTERPOLATED AND WARPED -D DIGITAL WAVEGUIDE MESH ALGORITHMS Vesa Välimäki Lab. of Aoustis and Audio Signal

More information

IN structured P2P overlay networks, each node and file key

IN structured P2P overlay networks, each node and file key 242 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2010 Elasti Routing Table with Provable Performane for Congestion Control in DHT Networks Haiying Shen, Member, IEEE,

More information

A Multi-Head Clustering Algorithm in Vehicular Ad Hoc Networks

A Multi-Head Clustering Algorithm in Vehicular Ad Hoc Networks International Journal of Computer Theory and Engineering, Vol. 5, No. 2, April 213 A Multi-Head Clustering Algorithm in Vehiular Ad Ho Networks Shou-Chih Lo, Yi-Jen Lin, and Jhih-Siao Gao Abstrat Clustering

More information

A PROTOTYPE OF INTELLIGENT VIDEO SURVEILLANCE CAMERAS

A PROTOTYPE OF INTELLIGENT VIDEO SURVEILLANCE CAMERAS INTERNATIONAL JOURNAL OF INFORMATION AND SYSTEMS SCIENCES Volume 1, 3, Number 1, 3, Pages 1-22 365-382 2007 Institute for Sientifi Computing and Information A PROTOTYPE OF INTELLIGENT VIDEO SURVEILLANCE

More information

Flow Demands Oriented Node Placement in Multi-Hop Wireless Networks

Flow Demands Oriented Node Placement in Multi-Hop Wireless Networks Flow Demands Oriented Node Plaement in Multi-Hop Wireless Networks Zimu Yuan Institute of Computing Tehnology, CAS, China {zimu.yuan}@gmail.om arxiv:153.8396v1 [s.ni] 29 Mar 215 Abstrat In multi-hop wireless

More information

Cluster Centric Fuzzy Modeling

Cluster Centric Fuzzy Modeling 10.1109/TFUZZ.014.300134, IEEE Transations on Fuzzy Systems TFS-013-0379.R1 1 Cluster Centri Fuzzy Modeling Witold Pedryz, Fellow, IEEE, and Hesam Izakian, Student Member, IEEE Abstrat In this study, we

More information

Folding. Hardware Mapped vs. Time multiplexed. Folding by N (N=folding factor) Node A. Unfolding by J A 1 A J-1. Time multiplexed/microcoded

Folding. Hardware Mapped vs. Time multiplexed. Folding by N (N=folding factor) Node A. Unfolding by J A 1 A J-1. Time multiplexed/microcoded Folding is verse of Unfolding Node A A Folding by N (N=folding fator) Folding A Unfolding by J A A J- Hardware Mapped vs. Time multiplexed l Hardware Mapped vs. Time multiplexed/mirooded FI : y x(n) h

More information

Reduced-Complexity Column-Layered Decoding and. Implementation for LDPC Codes

Reduced-Complexity Column-Layered Decoding and. Implementation for LDPC Codes Redued-Complexity Column-Layered Deoding and Implementation for LDPC Codes Zhiqiang Cui 1, Zhongfeng Wang 2, Senior Member, IEEE, and Xinmiao Zhang 3 1 Qualomm In., San Diego, CA 92121, USA 2 Broadom Corp.,

More information

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks A Dual-Hamiltonian-Path-Based Multiasting Strategy for Wormhole-Routed Star Graph Interonnetion Networks Nen-Chung Wang Department of Information and Communiation Engineering Chaoyang University of Tehnology,

More information

IMPROVED FUZZY CLUSTERING METHOD BASED ON INTUITIONISTIC FUZZY PARTICLE SWARM OPTIMIZATION

IMPROVED FUZZY CLUSTERING METHOD BASED ON INTUITIONISTIC FUZZY PARTICLE SWARM OPTIMIZATION Journal of Theoretial and Applied Information Tehnology IMPROVED FUZZY CLUSTERING METHOD BASED ON INTUITIONISTIC FUZZY PARTICLE SWARM OPTIMIZATION V.KUMUTHA, 2 S. PALANIAMMAL D.J. Aademy For Managerial

More information

Chemical, Biological and Radiological Hazard Assessment: A New Model of a Plume in a Complex Urban Environment

Chemical, Biological and Radiological Hazard Assessment: A New Model of a Plume in a Complex Urban Environment hemial, Biologial and Radiologial Haard Assessment: A New Model of a Plume in a omplex Urban Environment Skvortsov, A.T., P.D. Dawson, M.D. Roberts and R.M. Gailis HPP Division, Defene Siene and Tehnology

More information

Tackling IPv6 Address Scalability from the Root

Tackling IPv6 Address Scalability from the Root Takling IPv6 Address Salability from the Root Mei Wang Ashish Goel Balaji Prabhakar Stanford University {wmei, ashishg, balaji}@stanford.edu ABSTRACT Internet address alloation shemes have a huge impat

More information

Cluster-based Cooperative Communication with Network Coding in Wireless Networks

Cluster-based Cooperative Communication with Network Coding in Wireless Networks Cluster-based Cooperative Communiation with Network Coding in Wireless Networks Zygmunt J. Haas Shool of Eletrial and Computer Engineering Cornell University Ithaa, NY 4850, U.S.A. Email: haas@ee.ornell.edu

More information