Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA

Size: px
Start display at page:

Download "Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA"

Transcription

1 Implementation an Evaluation of AS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA Kazuya Matsumoto 1, orihisa Fujita 2, Toshihiro Hanawa 3, an Taisuke Boku 1,2 1 Center for Computational Sciences, University of Tsukuba 2 Grauate School of Systems an Information Engineering, University of Tsukuba 3 Information Technology Center, The University of Tokyo Abstract. We have been eveloping a proprietary interconnect technology calle Tightly Couple Accelerators (TCA) architecture to improve communication latency an banwith between accelerators (GPUs) over ifferent noes. This paper presents a Conjugate Graient (CG) benchmark implementation using the TCA an results of performance evaluation on the HA-PACS/TCA system, which is a proof-of-concept GPU cluster base on the TCA concept. The implementation is base on the CG benchmark in AS Parallel Benchmarks, an its parallelization is achieve by a two-imensional ecomposition of matrix ata. The TCA utilization improves the communication performance compare with the implementation with MPI/InfiniBan utilization for small size benchmark classes. This stuy also shows that the CG implementation with the two-imensional ecomposition is more suitable for the TCA utilization than a CG implementation with a one-imensional ecomposition to make use of the interconnect. 1 Introuction Currently, GPU clusters are wiely use as high performance computing systems. A problem of GPU clusters is that the communication spee between multiple compute noes is not fast enough compare to its high computation spee. In orer to aress this problem, we have been researching the Tightly Couple Accelerators (TCA) architecture [2]. The TCA is a technology on a proprietary interconnect network to enable irect communication between accelerators over ifferent noes. We have conucte the basic performance evaluation of the TCA [4, 1]. In [4], a Conjugate Graient (CG) metho has been implemente by utilizing allgather an allreuce collective communications with TCA s communication functions. The parallelization of the CG implementation is accomplishe by a one-imensional ecomposition of matrix ata. Results of the performance evaluation shows that the CG metho implementation using TCA outperforms the implementation using MPI/InfiniBan for sparse matrices whose matrix (problem) size is nine thousan or smaller; however, the communication performance

2 of TCA tens to become lower when either or both of the target matrix sizes an the number of utilizing processes are larger. In the present stuy, we evaluate a ifferent CG metho implementation with the TCA utilization. The CG implementation is base on the CG benchmark in AS Parallel Benchmarks. We apply a two-imensional ecomposition of matrix as the enhance implementation from [4], an present results of performance evaluation on the HA-PACS/TCA GPU cluster. Aitionally, this stuy presents communication performance ifferences between the CG implementation with the two-imensional ecomposition an the one-imensional ecomposition. 2 Tightly Couple Accelerators Architecture This section briefly explains the Tightly Couple Accelerators (TCA) architecture (see [2, 1] for etaile information on the TCA). The TCA is a novel technology of proprietary interconnect for PC clusters. The PCI Express Aaptive Communication Hub ver. 2 (PEACH2) is a prototype implementation of the TCA architecture. We can construct a cluster system by connecting the PEACH2 boars with each other. The communication using the PEACH2 is conucte only with the PCIe protocol, an the overhea time for protocol conversion, which is require in the InfiniBan, is eliminate. Consequently, the PEACH2 enables ata communication with extremely low latency. The PEACH2 provies two types of ata communication functions: PIO an DMA. In the PIO communication, the ata is transferre to a remote noe by the CPU s remote write operation. The latency of PIO is very small, an, as a result, the PIO is useful to transfer short messages. The DMA function is achieve by the DMA controller, which has four DMA channels. While the latency of DMA is larger than that of PIO, the DMA emonstrates higher maximum banwith performance. The HA-PACS (Highly Accelerate Parallel Avance system for Computational Sciences) is a GPU cluster system at the Center for Computational Sciences, University of Tsukuba. The HA-PACS/TCA is a proof-of-concept system of TCA architecture concept an a performance evaluation test-be of PEACH2 boar. The HA-PACS/TCA contains the PEACH2 as an interconnect aapter in aition to a commoity InfiniBan interconnect. Table 1 shows the specification of HA-PACS/TCA. The HA-PACS/TCA consists of four sub-clusters. Each sub-cluster is compose of 16 compute noes. The 16 noes are connecte by the PEACH2 an configure a 2 8 torus network. ote that the 64 noes of HA-PACS/TCA are connecte also by two ports of InfiniBan QDR in a fat-tree configuration with full bisection banwith. 3 Implementation The CG benchmark in AS Parallel Benchmarks (PB) is originally written in Fortran. Though several CG benchmark stuies on GPUs have been conucte [3, 8], there is no available implementations written in MPI an CUDA to our knowlege. Because of the fact, in the present stuy, we moify the MPI version

3 Table 1. HA-PACS/TCA system configuration oe configuration Motherboar SuperMicro X9DRG-QF CPU Intel Xeon E v2 2.8 GHz 2 (IvyBrige 10 cores / CPU) Memory DDR MHz 4 ch, 128 GB (=8 16 GB) Peak performance 224 Gflops / CPU GPU VIDIA Tesla K20X 732 MHz 4 (Kepler GK cores / GPU) Memory GDDR5 6 GB / GPU Peak performance 1.31 Tflops / GPU Interconnect InfiniBan: Mellanox Connect-X3 Dual-port QDR TCA: PEACH2 boar (Altera Stratix-IV GX 530 FPGA) System configuration umber of noes 64 Interconnect InfiniBan QDR 108 ports switch 2 ch Peak performance 364 Tflops /2 /2 /4 /4 /4 /2 /2 P = 2 P = 4 P = 8 P = 16 Fig. 1. Process istribution by two-imensional ecomposition of matrix ata. The number in each rectangular represents its process rank i. of CG benchmark in AS such that the main computation part (conj gra function) is written in C language an CUDA, an its inter-noe ata communications are conucte with the TCA/PEACH2. Let us enote a linear equation system as Ax = b, where A is an symmetric positive efinite matrix, an both x an b are a vector with elements in the following. The moifie implementation uses the ientical ata istribution an communication pattern to the original MPI version. The parallelization of CG benchmark is achieve by a two-imensional ecomposition of the matrix A an conformable istribution of vectors. When we efine the number of processes as P, the matrix A is two-imensionally ecompose by P = P r P c processes. Base on the ecomposition, each process contains (/P r ) (/P c ) sub-matrix of A an vectors with /P c elements. Figure 1 shows the process istribution by two-imensional ecomposition of matrix A ata for P = 2, 4, 8, 16. Almost all of computations in the implementation are carrie out by GPUs. While we implement the sparse-matrix vector multiplication (SpMV) by ourselves, our CG implementation utilizes the VIDIA s CUBLAS library for vec-

4 tor ot prouct (DOT) an vector aition (AXPY) operations. ote that the computation part is not tune so eeply since this stuy focuses on performance evaluation of ata communication. Three kins of ata communication are require in the implementation. The first require communication is to sen vector ata for obtaining the prouct of SpMV after the local SpMV computation on a GPU in each noe. The communication is conucte among P c processes in the same process row in a binary tree fashion, an thus P c communication steps are neee (each step nees to sen 8/P c Bytes of vector ata in ouble precision). The secon communication is to sen a scalar (8 Bytes) value to compute the sum of local prouct by the DOT computation. This communication is also mae among the P c processes an P c steps are require. The thir communication is to sen 8/P c Bytes of a vector for ata exchange. In the following, let us name the first, secon an thir communication as COMM SpMV, COMM DOT an COMM EXCH, respectively. The COMM DOT is scalar ata communications between CPU memories of ifferent processes an the latency for issuing communication operations occupies almost all of its communication time. The COMM SpMV an COMM EXCH are block ata communications of 8/P c Bytes between ifferent GPU memories an a communication banwith is important for high performance communication as well as the issue latency. Consiering the communication characteristics, we implement the COMM SpMV an COMM EXCH with the DMA communication function of TCA/PEACH2 an implement the COMM DOT with the PIO function. ote that, as shown in Figure 2, the DMA communication is faster than the PIO communication when message sizes are larger than 128 Bytes, which are smaller than sizes for the require block communications in any problem classes of CG benchmark. The TCA/PEACH2 configures a 2 8 torus network on a sub-cluster of HA-PACS/TCA; thus, a way of process (noe) mapping also affects the communication performance. As can be seen from Fig. 1, the CG implementation with two-imensional ecomposition requires communication among P c processes (4 processes at most when we utilize up to 16 noes). We use a noe mapping shown in Figure 3. This mapping oes not cause message ata contentions an collisions within the TCA/PEACH2 s communication network on a sub-cluster even when P = 16 cases. 4 Performance Evaluation We conuct performance measurements on a sub-cluster of the HA-PACS/TCA. A single GPU an a single CPU socket are utilize per noe 4. For comparison with the implementation using TCA/PEACH2, this section also presents the 4 This is because using two or more sub-clusters entails a hybri utilization of the TCA/PEACH2 an MPI/IB, an because two or more GPUs usage requires aitional consierations to use the TCA/PEACH2 effectively. Both of the hybri utilization an the multi GPU usage are among our future work.

5 WhWh W,W/K WhWh W,D WhWhDW// 'Wh'Wh W,D e e e eee D 'Wh'Wh DW// Fig. 2. Ping-pong communication performance between two neighboring noes on the HA-PACS/TCA. Fig. 3. oe mapping optimize for CG benchmark implementation on a sub-cluster of HA- PACS/TCA. Circles represent compute noes, lines between circles represent ata links, an the number correspons to its process rank i. performance of an implementation using MPI/InfiniBan (MPI/IB) for internoe communications. We use the MVAPICH2 GDR 2.1a (MV2GDR) MPI library implementation [6]. As with the TCA/PEACH2, the MV2GDR utilizes the GPUDirect for RDMA (GDR) technology [5] for irect communication between GPUs. ote that the theoretical peak banwith of TCA/PEACH2 (PCIe Gen2 x8) is twice lower than that of MPI/IB (ual-rail InfiniBan QDR 5 ); therefore, the implementation using TCA/PEACH2 is outperforme when message sizes become large. On the conition where the program is compile by Intel C compiler with MV2GDR 2.1a an CUDA 6.5 usage, the TCA/PEACH2 is faster for message sizes up to 64 KB than the MPI/IB in terms of the ping-pong communication performance as shown in Fig. 2. In the CG benchmark, the problem sizes (CLASS) an the number of processes (P ) can be esignate. This section presents results of performance evaluations for CLASS=S, W, A, B an P = 2, 4, 8, 16. The problem (matrix/vector) size is 1,400 for CLASS=S, 7,000 for CLASS=W, 14,000 for CLASS=A, an 75,000 for CLASS=B. We measure the consuming time for each computation/communication operation in the conj gra program function. Figure 4 shows the measure time breakown on average time of ten times calls to the function. ote that this is the breakown of process rank 0 an the communication time is the communication wait time. A single call of the conj gra function inclues 26 P c of COMM SpMV, 52 P c of COMM DOT, 26 of COMM EXCH, 26 of SpMV, 52 of DOT, an P c of AXPY operations. In Fig. 4, the performance is shown for the implementation in Fortran (original PB-MPI 5 The theoretical peak banwith of the ual-rail InfiniBan QDR is 8 GB/s, which is equivalent to that of PCIe Gen3 x8.

6 e e, / / W, / / & D W, / / W & D W, / / W & D W W & D W KDD^Ds KDDK KDDy, ywz K ^Ds D K e e, / / W, / / & D W, / / W & D W, / / W & D W W & D W Wl Wl Wle Wle Wl Wl Wle Wle KDD^Ds KDDK KDDy, ywz e e, / / W, / / & D W, / / W & D W, / / W & D W W & D W K ^Ds D K, / / W, / / & D W, / / W & D W, / / W & D W W & D W Wl Wl Wle Wle Wl Wl Wle Wle Fig. 4. Time breakown on a single conj gra function call of CG benchmark. In each figure, the performance (time in microsecon) is shown for the implementation in Fortran (original PB-MPI coe), C language, CUDA with TCA/PEACH2 communication, an CUDA with MPI/IB communication. The upper plot results for several results in Fortran an C, such as CLASS=B & P=2 case, are cut for simplicity. Fig. 5. Time breakown of CG implementation with one-imensional ecomposition coe), C language, CUDA with TCA/PEACH2 communication, an CUDA with MPI/IB communication. The Fortran implementation is faster than the C implementation (similar performance results were reporte for the PB-LU benchmark by Pennycook et al. [7]). Compare with the original Fortran implementation, the GPU/CUDA usage for PB implementation eteriorates the performance in several cases

7 (particularly in cases of smaller size class an larger number of processes utilization). This is mainly ue to that the GPU usage brings bigger overheas for the CUDA kernel invocations 6 an for the inter-noe communication call between GPUs. The TCA/PEACH2 utilization contributes the performance improvement for the three small size classes (CLASS=S, CLASS=W, an CLASS=A) compare with the MPI/IB utilization. Especially, the communication performance is 3.00 times higher in the case CLASS=S with P = 16, an the overall performance incluing the computation time an communication time is 1.24 times higher. The large performance improvement by TCA/PEACH2 is erive from improvements of COMM SpMV an COMM DOT. The communication time is 2.42 times shorter for COMM SpMV an 4.72 times shorter for COMM DOT in this case. In the case CLASS=A with P = 16, the communication performance using the TCA/PEACH2 is 1.44 times higher an the overall performance is 1.12 times higher. The performance of COMM DOT with TCA/PEACH2 is higher in all four size cases since the communication latency mostly ecies the performance for the scalar ata communication. However, the TCA/PEACH2 is not always effective. The performance for CLASS=B is almost same. In the CLASS=B case, the COMM EXCH is the main performance bottleneck. To see how the performance is ifferent between the CG benchmark implementation with two-imensional ecomposition (2D-CG) in the present stuy an the CG implementation with one-imensional ecomposition (1D-CG) in our previous stuy [4], we aitionally measure the performance of 1D-CG implementation on the equivalent conition an cases (1D-CG is implemente by ourselves an not in the AS Parallel Benchmarks). Figure 5 shows the measure time breakown of the 1D-CG for matrices corresponing CLASS=S an CLASS=B. The 1D-CG utilizes allgather an allreuce collective communications (see [4] for implementation etails). All P processes are involve for both the collective communications in 1D-CG, whereas P c processes are involve at most in 2D-CG. Since the network topology of TCA/PEACH2 is 2 8 torus network, collisions within the communication network cannot be avoie for P = 16 cases in 1D-CG. As shown in Fig. 5, the communication time in 1D-CG becomes larger when P increases. In aition, the largest message size of 1D-CG is larger than that of 2D-CG for P = 8, 16 cases (the size is 8/2 Bytes in 1D-CG an 8/P c in 2D-CG). This fact is isavantage for large matrix sizes (CLASS=B) an relative performance ifferences between TCA/PEACH2 an MPI/IB is large compare with 2D-CG implementation. In general, the message size of each communication in 2D-CG is shorter than or equal to that in 1D-CG for corresponing communication. The performance egraation with shorter message size is serious in MPI/IB while TCA/PEACH2 provies a goo performance thanks to its very small latency. Thus, the combination of such short messages an avoiance of message collision on the torus network of TCA/PEACH2 leas this performance improvement on 2D-CG benchmark. 6 A CUDA kernel invocation at least takes 9.7 µsec, incluing the CUDA stream synchronization time, in our measurement.

8 5 Conclusion The present stuy has utilize the TCA/PEACH2 for an implementation of AS Parallel CG benchmark an conucte its performance evaluation on the HA-PACS/TCA GPU cluster. Results of the performance evaluation show that the CG implementation with the parallelization by a two-imensional ecomposition of matrix ata oes not cause message ata collisions within the communication network of TCA/PEACH when processes are properly mappe to noes; an the present CG implementation is consiere to be better suite for the TCA/PEACH2 utilization than the previous CG metho implementation with a one-imensional ecomposition [4]. The performance improvement over MPI/IB utilization is ue to the very small latency of TCA/PEACH2. We will continue researches on the TCA with the view that reucing the latency between accelerators by irect communication is important for strong-scaling computing. Acknowlegements. The present stuy was supporte by the Japan Science an Technology Agency s CREST program entitle Research an Development of Unifie Environment on Accelerate Computing an Interconnection for Post- Petascale Era. The authors woul like to thank the Center for Computational Sciences, University of Tsukuba for allowing us to use the HA-PACS/TCA system as part of the interisciplinary Collaborative Research Program. References 1. Hanawa, T., Fujii, H., Fujita,., Oajima, T., Matsumoto, K., Boku, T.: Evaluation of FFT for GPU Cluster Using Tightly Couple Accelerators Architecture. In: Proc. Cluster pp IEEE (2015) 2. Hanawa, T., Koama, Y., Boku, T., Sato, M.: Tightly Couple Accelerators Architecture for Minimizing Communication Latency among Accelerators. In: Proc. IPDPSW pp IEEE (2013) 3. Lee, S., Vetter, J.S.: Early evaluation of irective-base GPU programming moels for prouctive exascale computing. In: Proc. SC 12 (2012) 4. Matsumoto, K., Hanawa, T., Koama, Y., Fujii, H., Boku, T.: Implementation of CG Metho on GPU Cluster with Proprietary Interconnect TCA for GPU Direct Communication. In: Proc. IPDPSW pp IEEE (2015) 5. VIDIA: VIDIA GPUDirect (Accesse April 25, 2016), 6. Pana, D.K.: MVAPICH2-GDR (MVAPICH2 with GPUDirect RDMA) (Accesse April 25, 2016), 7. Pennycook, S.J., Hammon, S.D., Jarvis, S.A., Mualige, G.R.: Performance Analysis of a Hybri MPI/CUDA Implementation of the AS- LU Benchmark. SIGMETRICS Perform. Eval. Rev. 38(4), (2011), 8. Xu, R., Tian, X., Charasekaran, S., Yan, Y., Chapman, B.: AS Parallel Benchmarks for GPGPUs Using a Directive-Base Programming Moel. In: Languages an Compilers for Parallel Computing. LCS, vol. 8967, pp Springer (2015)

Interconnection Network for Tightly Coupled Accelerators Architecture

Interconnection Network for Tightly Coupled Accelerators Architecture Interconnection Network for Tightly Coupled Accelerators Architecture Toshihiro Hanawa, Yuetsu Kodama, Taisuke Boku, Mitsuhisa Sato Center for Computational Sciences University of Tsukuba, Japan 1 What

More information

HA-PACS/TCA: Tightly Coupled Accelerators for Low-Latency Communication between GPUs

HA-PACS/TCA: Tightly Coupled Accelerators for Low-Latency Communication between GPUs HA-PACS/TCA: Tightly Coupled Accelerators for Low-Latency Communication between GPUs Yuetsu Kodama Division of High Performance Computing Systems Center for Computational Sciences University of Tsukuba,

More information

Tightly Coupled Accelerators Architecture

Tightly Coupled Accelerators Architecture Tightly Coupled Accelerators Architecture Yuetsu Kodama Division of High Performance Computing Systems Center for Computational Sciences University of Tsukuba, Japan 1 What is Tightly Coupled Accelerators

More information

Tightly Coupled Accelerators with Proprietary Interconnect and Its Programming and Applications

Tightly Coupled Accelerators with Proprietary Interconnect and Its Programming and Applications 1 Tightly Coupled Accelerators with Proprietary Interconnect and Its Programming and Applications Toshihiro Hanawa Information Technology Center, The University of Tokyo Taisuke Boku Center for Computational

More information

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control Almost Disjunct Coes in Large Scale Multihop Wireless Network Meia Access Control D. Charles Engelhart Anan Sivasubramaniam Penn. State University University Park PA 682 engelhar,anan @cse.psu.eu Abstract

More information

6.823 Computer System Architecture. Problem Set #3 Spring 2002

6.823 Computer System Architecture. Problem Set #3 Spring 2002 6.823 Computer System Architecture Problem Set #3 Spring 2002 Stuents are strongly encourage to collaborate in groups of up to three people. A group shoul han in only one copy of the solution to the problem

More information

Loop Scheduling and Partitions for Hiding Memory Latencies

Loop Scheduling and Partitions for Hiding Memory Latencies Loop Scheuling an Partitions for Hiing Memory Latencies Fei Chen Ewin Hsing-Mean Sha Dept. of Computer Science an Engineering University of Notre Dame Notre Dame, IN 46556 Email: fchen,esha @cse.n.eu Tel:

More information

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama an Hayato Ohwaa Faculty of Sci. an Tech. Tokyo University of Science, 2641 Yamazaki, Noa-shi, CHIBA, 278-8510, Japan hiroyuki@rs.noa.tus.ac.jp,

More information

Table-based division by small integer constants

Table-based division by small integer constants Table-base ivision by small integer constants Florent e Dinechin, Laurent-Stéphane Diier LIP, Université e Lyon (ENS-Lyon/CNRS/INRIA/UCBL) 46, allée Italie, 69364 Lyon Ceex 07 Florent.e.Dinechin@ens-lyon.fr

More information

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Queueing Moel an Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Marc Aoun, Antonios Argyriou, Philips Research, Einhoven, 66AE, The Netherlans Department of Computer an Communication

More information

T2K & HA-PACS Projects Supercomputers at CCS

T2K & HA-PACS Projects Supercomputers at CCS T2K & HA-PACS Projects Supercomputers at CCS Taisuke Boku Deputy Director, HPC Division Center for Computational Sciences University of Tsukuba Two Streams of Supercomputers at CCS Service oriented general

More information

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2 This paper appears in J. of Parallel an Distribute Computing 10 (1990), pp. 167 181. Intensive Hypercube Communication: Prearrange Communication in Link-Boun Machines 1 2 Quentin F. Stout an Bruce Wagar

More information

Coupling the User Interfaces of a Multiuser Program

Coupling the User Interfaces of a Multiuser Program Coupling the User Interfaces of a Multiuser Program PRASUN DEWAN University of North Carolina at Chapel Hill RAJIV CHOUDHARY Intel Corporation We have evelope a new moel for coupling the user-interfaces

More information

Towards a Low-Power Accelerator of Many FPGAs for Stencil Computations

Towards a Low-Power Accelerator of Many FPGAs for Stencil Computations 2012 Thir International Conference on Networking an Computing Towars a Low-Power Accelerator of Many FPGAs for Stencil Computations Ryohei Kobayashi Tokyo Institute of Technology, Japan E-mail: kobayashi@arch.cs.titech.ac.jp

More information

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation DEIM Forum 2018 I4-4 Abstract Ranom Clustering for Multiple Sampling Units to Spee Up Run-time Sample Generation uzuru OKAJIMA an Koichi MARUAMA NEC Solution Innovators, Lt. 1-18-7 Shinkiba, Koto-ku, Tokyo,

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Svärm, Linus; Stranmark, Petter Unpublishe: 2010-01-01 Link to publication Citation for publishe version (APA): Svärm, L., & Stranmark, P. (2010). Shift-map Image Registration.

More information

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 3 Sofia 017 Print ISSN: 1311-970; Online ISSN: 1314-4081 DOI: 10.1515/cait-017-0030 Particle Swarm Optimization Base

More information

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method Southern Cross University epublications@scu 23r Australasian Conference on the Mechanics of Structures an Materials 214 Transient analysis of wave propagation in 3D soil by using the scale bounary finite

More information

Study of Network Optimization Method Based on ACL

Study of Network Optimization Method Based on ACL Available online at www.scienceirect.com Proceia Engineering 5 (20) 3959 3963 Avance in Control Engineering an Information Science Stuy of Network Optimization Metho Base on ACL Liu Zhian * Department

More information

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH Galen H Sasaki Dept Elec Engg, U Hawaii 2540 Dole Street Honolul HI 96822 USA Ching-Fong Su Fuitsu Laboratories of America 595 Lawrence Expressway

More information

Performance Modelling of Necklace Hypercubes

Performance Modelling of Necklace Hypercubes erformance Moelling of ecklace ypercubes. Meraji,,. arbazi-aza,, A. atooghy, IM chool of Computer cience & harif University of Technology, Tehran, Iran {meraji, patooghy}@ce.sharif.eu, aza@ipm.ir a Abstract

More information

Fast Fractal Image Compression using PSO Based Optimization Techniques

Fast Fractal Image Compression using PSO Based Optimization Techniques Fast Fractal Compression using PSO Base Optimization Techniques A.Krishnamoorthy Visiting faculty Department Of ECE University College of Engineering panruti rishpci89@gmail.com S.Buvaneswari Visiting

More information

Interconnect Your Future

Interconnect Your Future Interconnect Your Future Gilad Shainer 2nd Annual MVAPICH User Group (MUG) Meeting, August 2014 Complete High-Performance Scalable Interconnect Infrastructure Comprehensive End-to-End Software Accelerators

More information

arxiv: v1 [physics.comp-ph] 4 Nov 2013

arxiv: v1 [physics.comp-ph] 4 Nov 2013 arxiv:1311.0590v1 [physics.comp-ph] 4 Nov 2013 Performance of Kepler GTX Titan GPUs and Xeon Phi System, Weonjong Lee, and Jeonghwan Pak Lattice Gauge Theory Research Center, CTP, and FPRD, Department

More information

GPU-centric communication for improved efficiency

GPU-centric communication for improved efficiency GPU-centric communication for improved efficiency Benjamin Klenk *, Lena Oden, Holger Fröning * * Heidelberg University, Germany Fraunhofer Institute for Industrial Mathematics, Germany GPCDP Workshop

More information

Supporting Fully Adaptive Routing in InfiniBand Networks

Supporting Fully Adaptive Routing in InfiniBand Networks XIV JORNADAS DE PARALELISMO - LEGANES, SEPTIEMBRE 200 1 Supporting Fully Aaptive Routing in InfiniBan Networks J.C. Martínez, J. Flich, A. Robles, P. López an J. Duato Resumen InfiniBan is a new stanar

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of Technology Sparse Matrix Generated by FEM, being as the graph data Often require solving sparse linear equation

More information

5th International Conference on Advanced Design and Manufacturing Engineering (ICADME 2015)

5th International Conference on Advanced Design and Manufacturing Engineering (ICADME 2015) 5th International Conference on Avance Design an Manufacturing Engineering (ICADME 25) Research on motion characteristics an application of multi egree of freeom mechanism base on R-W metho Xiao-guang

More information

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources An Algorithm for Builing an Enterprise Network Topology Using Wiesprea Data Sources Anton Anreev, Iurii Bogoiavlenskii Petrozavosk State University Petrozavosk, Russia {anreev, ybgv}@cs.petrsu.ru Abstract

More information

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization 1 Offloaing Cellular Traffic through Opportunistic Communications: Analysis an Optimization Vincenzo Sciancalepore, Domenico Giustiniano, Albert Banchs, Anreea Picu arxiv:1405.3548v1 [cs.ni] 14 May 24

More information

Lab work #8. Congestion control

Lab work #8. Congestion control TEORÍA DE REDES DE TELECOMUNICACIONES Grao en Ingeniería Telemática Grao en Ingeniería en Sistemas e Telecomunicación Curso 2015-2016 Lab work #8. Congestion control (1 session) Author: Pablo Pavón Mariño

More information

MILC Performance Benchmark and Profiling. April 2013

MILC Performance Benchmark and Profiling. April 2013 MILC Performance Benchmark and Profiling April 2013 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information on the supporting

More information

2008 International ANSYS Conference

2008 International ANSYS Conference 2008 International ANSYS Conference Maximizing Productivity With InfiniBand-Based Clusters Gilad Shainer Director of Technical Marketing Mellanox Technologies 2008 ANSYS, Inc. All rights reserved. 1 ANSYS,

More information

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters Available online at www.scienceirect.com Proceia Engineering 4 (011 ) 34 38 011 International Conference on Avances in Engineering Cluster Center Initialization Metho for K-means Algorithm Over Data Sets

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Linus Svärm Petter Stranmark Centre for Mathematical Sciences, Lun University {linus,petter}@maths.lth.se Abstract Shift-map image processing is a new framework base on energy

More information

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means Classifying Facial Expression with Raial Basis Function Networks, using Graient Descent an K-means Neil Allrin Department of Computer Science University of California, San Diego La Jolla, CA 9237 nallrin@cs.ucs.eu

More information

A Multi-class SVM Classifier Utilizing Binary Decision Tree

A Multi-class SVM Classifier Utilizing Binary Decision Tree Informatica 33 (009) 33-41 33 A Multi-class Classifier Utilizing Binary Decision Tree Gjorgji Mazarov, Dejan Gjorgjevikj an Ivan Chorbev Department of Computer Science an Engineering Faculty of Electrical

More information

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis Multilevel Linear Dimensionality Reuction using Hypergraphs for Data Analysis Haw-ren Fang Department of Computer Science an Engineering University of Minnesota; Minneapolis, MN 55455 hrfang@csumneu ABSTRACT

More information

Non-homogeneous Generalization in Privacy Preserving Data Publishing

Non-homogeneous Generalization in Privacy Preserving Data Publishing Non-homogeneous Generalization in Privacy Preserving Data Publishing W. K. Wong, Nios Mamoulis an Davi W. Cheung Department of Computer Science, The University of Hong Kong Pofulam Roa, Hong Kong {wwong2,nios,cheung}@cs.hu.h

More information

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm NASA/CR-1998-208733 ICASE Report No. 98-45 Parallel Directionally Split Solver Base on Reformulation of Pipeline Thomas Algorithm A. Povitsky ICASE, Hampton, Virginia Institute for Computer Applications

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu Institute of Information Science Acaemia Sinica Taipei, Taiwan Da-wei Wang Jan-Jan Wu Institute of Information Science

More information

Socially-optimal ISP-aware P2P Content Distribution via a Primal-Dual Approach

Socially-optimal ISP-aware P2P Content Distribution via a Primal-Dual Approach Socially-optimal ISP-aware P2P Content Distribution via a Primal-Dual Approach Jian Zhao, Chuan Wu The University of Hong Kong {jzhao,cwu}@cs.hku.hk Abstract Peer-to-peer (P2P) technology is popularly

More information

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems On the Role of Multiply Sectione Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canaa, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,

More information

Skyline Community Search in Multi-valued Networks

Skyline Community Search in Multi-valued Networks Syline Community Search in Multi-value Networs Rong-Hua Li Beijing Institute of Technology Beijing, China lironghuascut@gmail.com Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China yu@se.cuh.eu.h

More information

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER FFICINT ON-LIN TSTING MTHOD FOR A FLOATING-POINT ADDR A. Droz, M. Lobachev Department of Computer Systems, Oessa State Polytechnic University, Oessa, Ukraine Droz@ukr.net, Lobachev@ukr.net Abstract In

More information

Tight Wavelet Frame Decomposition and Its Application in Image Processing

Tight Wavelet Frame Decomposition and Its Application in Image Processing ITB J. Sci. Vol. 40 A, No., 008, 151-165 151 Tight Wavelet Frame Decomposition an Its Application in Image Processing Mahmu Yunus 1, & Henra Gunawan 1 1 Analysis an Geometry Group, FMIPA ITB, Banung Department

More information

Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand

Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand Presentation at GTC 2014 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda

More information

AnyTraffic Labeled Routing

AnyTraffic Labeled Routing AnyTraffic Labele Routing Dimitri Papaimitriou 1, Pero Peroso 2, Davie Careglio 2 1 Alcatel-Lucent Bell, Antwerp, Belgium Email: imitri.papaimitriou@alcatel-lucent.com 2 Universitat Politècnica e Catalunya,

More information

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks : a Movement-Base Routing Algorithm for Vehicle A Hoc Networks Fabrizio Granelli, Senior Member, Giulia Boato, Member, an Dzmitry Kliazovich, Stuent Member Abstract Recent interest in car-to-car communications

More information

The Journal of Systems and Software

The Journal of Systems and Software The Journal of Systems an Software 83 (010) 1864 187 Contents lists available at ScienceDirect The Journal of Systems an Software journal homepage: www.elsevier.com/locate/jss Embeing capacity raising

More information

Disjoint Multipath Routing in Dual Homing Networks using Colored Trees

Disjoint Multipath Routing in Dual Homing Networks using Colored Trees Disjoint Multipath Routing in Dual Homing Networks using Colore Trees Preetha Thulasiraman, Srinivasan Ramasubramanian, an Marwan Krunz Department of Electrical an Computer Engineering University of Arizona,

More information

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity Worl Applie Sciences Journal 16 (10): 1387-1392, 2012 ISSN 1818-4952 IDOSI Publications, 2012 A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Base on Gravity Aliasghar Rahmani Hosseinabai,

More information

Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR

Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR Presentation at Mellanox Theater () Dhabaleswar K. (DK) Panda - The Ohio State University panda@cse.ohio-state.edu Outline Communication

More information

New Geometric Interpretation and Analytic Solution for Quadrilateral Reconstruction

New Geometric Interpretation and Analytic Solution for Quadrilateral Reconstruction New Geometric Interpretation an Analytic Solution for uarilateral Reconstruction Joo-Haeng Lee Convergence Technology Research Lab ETRI Daejeon, 305 777, KOREA Abstract A new geometric framework, calle

More information

Effect of GPU Communication-Hiding for SpMV Using OpenACC

Effect of GPU Communication-Hiding for SpMV Using OpenACC ICCM2014 28-30 th July, Cambridge, England Effect of GPU Communication-Hiding for SpMV Using OpenACC *Olav Aanes Fagerlund¹, Takeshi Kitayama 2,3, Gaku Hashimoto 2 and Hiroshi Okuda 2 1 Department of Systems

More information

n N c CIni.o ewsrg.au

n N c CIni.o ewsrg.au @NCInews NCI and Raijin National Computational Infrastructure 2 Our Partners General purpose, highly parallel processors High FLOPs/watt and FLOPs/$ Unit of execution Kernel Separate memory subsystem GPGPU

More information

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing Inexing the Eges A simple an yet efficient approach to high-imensional inexing Beng Chin Ooi Kian-Lee Tan Cui Yu Stephane Bressan Department of Computer Science National University of Singapore 3 Science

More information

CUDA Accelerated Linpack on Clusters. E. Phillips, NVIDIA Corporation

CUDA Accelerated Linpack on Clusters. E. Phillips, NVIDIA Corporation CUDA Accelerated Linpack on Clusters E. Phillips, NVIDIA Corporation Outline Linpack benchmark CUDA Acceleration Strategy Fermi DGEMM Optimization / Performance Linpack Results Conclusions LINPACK Benchmark

More information

Image Segmentation using K-means clustering and Thresholding

Image Segmentation using K-means clustering and Thresholding Image Segmentation using Kmeans clustering an Thresholing Preeti Panwar 1, Girhar Gopal 2, Rakesh Kumar 3 1M.Tech Stuent, Department of Computer Science & Applications, Kurukshetra University, Kurukshetra,

More information

Reversible Image Watermarking using Bit Plane Coding and Lifting Wavelet Transform

Reversible Image Watermarking using Bit Plane Coding and Lifting Wavelet Transform 250 IJCSNS International Journal of Computer Science an Network Security, VOL.9 No.11, November 2009 Reversible Image Watermarking using Bit Plane Coing an Lifting Wavelet Transform S. Kurshi Jinna, Dr.

More information

6 Gradient Descent. 6.1 Functions

6 Gradient Descent. 6.1 Functions 6 Graient Descent In this topic we will iscuss optimizing over general functions f. Typically the function is efine f : R! R; that is its omain is multi-imensional (in this case -imensional) an output

More information

A Duality Based Approach for Realtime TV-L 1 Optical Flow

A Duality Based Approach for Realtime TV-L 1 Optical Flow A Duality Base Approach for Realtime TV-L 1 Optical Flow C. Zach 1, T. Pock 2, an H. Bischof 2 1 VRVis Research Center 2 Institute for Computer Graphics an Vision, TU Graz Abstract. Variational methos

More information

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Pak Lui, Gilad Shainer, Brian Klaff Mellanox Technologies Abstract From concept to

More information

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks 1 Backpressure-base Packet-by-Packet Aaptive Routing in Communication Networks Eleftheria Athanasopoulou, Loc Bui, Tianxiong Ji, R. Srikant, an Alexaner Stolyar Abstract Backpressure-base aaptive routing

More information

Learning Subproblem Complexities in Distributed Branch and Bound

Learning Subproblem Complexities in Distributed Branch and Bound Learning Subproblem Complexities in Distribute Branch an Boun Lars Otten Department of Computer Science University of California, Irvine lotten@ics.uci.eu Rina Dechter Department of Computer Science University

More information

Investigation into a new incremental forming process using an adjustable punch set for the manufacture of a doubly curved sheet metal

Investigation into a new incremental forming process using an adjustable punch set for the manufacture of a doubly curved sheet metal 991 Investigation into a new incremental forming process using an ajustable punch set for the manufacture of a oubly curve sheet metal S J Yoon an D Y Yang* Department of Mechanical Engineering, Korea

More information

An Extension of XcalableMP PGAS Lanaguage for Multi-node GPU Clusters

An Extension of XcalableMP PGAS Lanaguage for Multi-node GPU Clusters An Extension of XcalableMP PGAS Lanaguage for Multi-node Clusters Jinpil Lee, Minh Tuan Tran, Tetsuya Odajima, Taisuke Boku and Mitsuhisa Sato University of Tsukuba 1 Presentation Overview l Introduction

More information

Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth

Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth by D.K. Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda Outline Overview of MVAPICH2-GPU

More information

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions.

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions. Overview Operating Systems I Management Provie Services processes files Manage Devices processor memory isk Simple Management One process in memory, using it all each program nees I/O rivers until 96 I/O

More information

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation On Effectively Determining the Downlink-to-uplink Sub-frame With Ratio for Mobile WiMAX Networks Using Spline Extrapolation Panagiotis Sarigianniis, Member, IEEE, Member Malamati Louta, Member, IEEE, Member

More information

CPMD Performance Benchmark and Profiling. February 2014

CPMD Performance Benchmark and Profiling. February 2014 CPMD Performance Benchmark and Profiling February 2014 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information on the supporting

More information

A fast embedded selection approach for color texture classification using degraded LBP

A fast embedded selection approach for color texture classification using degraded LBP A fast embee selection approach for color texture classification using egrae A. Porebski, N. Vanenbroucke an D. Hama Laboratoire LISIC - EA 4491 - Université u Littoral Côte Opale - 50, rue Ferinan Buisson

More information

Adjacency Matrix Based Full-Text Indexing Models

Adjacency Matrix Based Full-Text Indexing Models 1000-9825/2002/13(10)1933-10 2002 Journal of Software Vol.13, No.10 Ajacency Matrix Base Full-Text Inexing Moels ZHOU Shui-geng 1, HU Yun-fa 2, GUAN Ji-hong 3 1 (Department of Computer Science an Engineering,

More information

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember 107 IEICE TRANS INF & SYST, VOLE88 D, NO5 MAY 005 LETTER An Improve Neighbor Selection Algorithm in Collaborative Filtering Taek-Hun KIM a), Stuent Member an Sung-Bong YANG b), Nonmember SUMMARY Nowaays,

More information

HA-PACS Project Challenge for Next Step of Accelerating Computing

HA-PACS Project Challenge for Next Step of Accelerating Computing HA-PAS Project hallenge for Next Step of Accelerating omputing Taisuke Boku enter for omputational Sciences University of Tsukuba taisuke@cs.tsukuba.ac.jp 1 Outline of talk Introduction of S, U. Tsukuba

More information

ETNA Kent State University

ETNA Kent State University Electronic Transactions on Numerical Analysis. Volume 6, pp. 246-254, December 997. Copyright 997,. ISSN 068-963. ETNA etna@mcs.kent.eu FAST SOLUTION OF MSC/NASTRAN SPARSE MATRIX PROBLEMS USING A MULTILEVEL

More information

Secure Network Coding for Distributed Secret Sharing with Low Communication Cost

Secure Network Coding for Distributed Secret Sharing with Low Communication Cost Secure Network Coing for Distribute Secret Sharing with Low Communication Cost Nihar B. Shah, K. V. Rashmi an Kannan Ramchanran, Fellow, IEEE Abstract Shamir s (n,k) threshol secret sharing is an important

More information

EDOVE: Energy and Depth Variance-Based Opportunistic Void Avoidance Scheme for Underwater Acoustic Sensor Networks

EDOVE: Energy and Depth Variance-Based Opportunistic Void Avoidance Scheme for Underwater Acoustic Sensor Networks sensors Article EDOVE: Energy an Depth Variance-Base Opportunistic Voi Avoiance Scheme for Unerwater Acoustic Sensor Networks Safar Hussain Bouk 1, *, Sye Hassan Ahme 2, Kyung-Joon Park 1 an Yongsoon Eun

More information

An Adaptive Routing Algorithm for Communication Networks using Back Pressure Technique

An Adaptive Routing Algorithm for Communication Networks using Back Pressure Technique International OPEN ACCESS Journal Of Moern Engineering Research (IJMER) An Aaptive Routing Algorithm for Communication Networks using Back Pressure Technique Khasimpeera Mohamme 1, K. Kalpana 2 1 M. Tech

More information

OPEN MPI WITH RDMA SUPPORT AND CUDA. Rolf vandevaart, NVIDIA

OPEN MPI WITH RDMA SUPPORT AND CUDA. Rolf vandevaart, NVIDIA OPEN MPI WITH RDMA SUPPORT AND CUDA Rolf vandevaart, NVIDIA OVERVIEW What is CUDA-aware History of CUDA-aware support in Open MPI GPU Direct RDMA support Tuning parameters Application example Future work

More information

Chapter 9 Memory Management

Chapter 9 Memory Management Contents 1. Introuction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threas 6. CPU Scheuling 7. Process Synchronization 8. Dealocks 9. Memory Management 10.Virtual Memory

More information

Handling missing values in kernel methods with application to microbiology data

Handling missing values in kernel methods with application to microbiology data an Machine Learning. Bruges (Belgium), 24-26 April 2013, i6oc.com publ., ISBN 978-2-87419-081-0. Available from http://www.i6oc.com/en/livre/?gcoi=28001100131010. Hanling missing values in kernel methos

More information

Lecture 1 September 4, 2013

Lecture 1 September 4, 2013 CS 84r: Incentives an Information in Networks Fall 013 Prof. Yaron Singer Lecture 1 September 4, 013 Scribe: Bo Waggoner 1 Overview In this course we will try to evelop a mathematical unerstaning for the

More information

Using Vector and Raster-Based Techniques in Categorical Map Generalization

Using Vector and Raster-Based Techniques in Categorical Map Generalization Thir ICA Workshop on Progress in Automate Map Generalization, Ottawa, 12-14 August 1999 1 Using Vector an Raster-Base Techniques in Categorical Map Generalization Beat Peter an Robert Weibel Department

More information

Message Transport With The User Datagram Protocol

Message Transport With The User Datagram Protocol Message Transport With The User Datagram Protocol User Datagram Protocol (UDP) Use During startup For VoIP an some vieo applications Accounts for less than 10% of Internet traffic Blocke by some ISPs Computer

More information

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks Robust PIM-SM Multicasting using Anycast RP in Wireless A Hoc Networks Jaewon Kang, John Sucec, Vikram Kaul, Sunil Samtani an Mariusz A. Fecko Applie Research, Telcoria Technologies One Telcoria Drive,

More information

On the Placement of Internet Taps in Wireless Neighborhood Networks

On the Placement of Internet Taps in Wireless Neighborhood Networks 1 On the Placement of Internet Taps in Wireless Neighborhoo Networks Lili Qiu, Ranveer Chanra, Kamal Jain, Mohamma Mahian Abstract Recently there has emerge a novel application of wireless technology that

More information

Learning convex bodies is hard

Learning convex bodies is hard Learning convex boies is har Navin Goyal Microsoft Research Inia navingo@microsoftcom Luis Raemacher Georgia Tech lraemac@ccgatecheu Abstract We show that learning a convex boy in R, given ranom samples

More information

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D. Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic

More information

Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2

Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2 Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2 H. Wang, S. Potluri, M. Luo, A. K. Singh, X. Ouyang, S. Sur, D. K. Panda Network-Based

More information

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES OLIVIER BERNARDI AND ÉRIC FUSY Abstract. We present bijections for planar maps with bounaries. In particular, we obtain bijections for triangulations an quarangulations

More information

Improving Spatial Reuse of IEEE Based Ad Hoc Networks

Improving Spatial Reuse of IEEE Based Ad Hoc Networks mproving Spatial Reuse of EEE 82.11 Base A Hoc Networks Fengji Ye, Su Yi an Biplab Sikar ECSE Department, Rensselaer Polytechnic nstitute Troy, NY 1218 Abstract n this paper, we evaluate an suggest methos

More information

Online Appendix to: Generalizing Database Forensics

Online Appendix to: Generalizing Database Forensics Online Appenix to: Generalizing Database Forensics KYRIACOS E. PAVLOU an RICHARD T. SNODGRASS, University of Arizona This appenix presents a step-by-step iscussion of the forensic analysis protocol that

More information

1 Surprises in high dimensions

1 Surprises in high dimensions 1 Surprises in high imensions Our intuition about space is base on two an three imensions an can often be misleaing in high imensions. It is instructive to analyze the shape an properties of some basic

More information

Advanced method of NC programming for 5-axis machining

Advanced method of NC programming for 5-axis machining Available online at www.scienceirect.com Proceia CIRP (0 ) 0 07 5 th CIRP Conference on High Performance Cutting 0 Avance metho of NC programming for 5-axis machining Sergej N. Grigoriev a, A.A. Kutin

More information

Overview : Computer Networking. IEEE MAC Protocol: CSMA/CA Internet mobility TCP over noisy links

Overview : Computer Networking. IEEE MAC Protocol: CSMA/CA Internet mobility TCP over noisy links Overview 15-441 15-441: Computer Networking 15-641 Lecture 24: Wireless Eric Anerson Fall 2014 www.cs.cmu.eu/~prs/15-441-f14 Internet mobility TCP over noisy links Link layer challenges an WiFi Cellular

More information

Enabling Efficient Use of UPC and OpenSHMEM PGAS models on GPU Clusters

Enabling Efficient Use of UPC and OpenSHMEM PGAS models on GPU Clusters Enabling Efficient Use of UPC and OpenSHMEM PGAS models on GPU Clusters Presentation at GTC 2014 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda

More information

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks 1 Backpressure-base Packet-by-Packet Aaptive Routing in Communication Networks Eleftheria Athanasopoulou, Loc Bui, Tianxiong Ji, R. Srikant, an Alexaner Stoylar arxiv:15.4984v1 [cs.ni] 27 May 21 Abstract

More information

PDE-BASED INTERPOLATION METHOD FOR OPTICALLY VISUALIZED SOUND FIELD. Kohei Yatabe and Yasuhiro Oikawa

PDE-BASED INTERPOLATION METHOD FOR OPTICALLY VISUALIZED SOUND FIELD. Kohei Yatabe and Yasuhiro Oikawa 4 IEEE International Conference on Acoustic, Speech an Signal Processing (ICASSP) PDE-BASED INTERPOLATION METHOD FOR OPTICALLY VISUALIZED SOUND FIELD Kohei Yatabe an Yasuhiro Oikawa Department of Intermeia

More information