A Simulated Annealing algorithm for GPU clusters
|
|
- Sophie Franklin
- 5 years ago
- Views:
Transcription
1 A Simulated Annealing algorithm for GPU clusters Institute of Computer Science Warsaw University of Technology Parallel Processing and Applied Mathematics 2011
2 1 Introduction 2 3 The lower level The upper level Results 4
3 Simulated Annealing (1) Metaheuristic for the global optimization problem Adaptation of Metropolis-Hastings Effective for computationally hard (NP-hard) problems
4 Simulated Annealing (2) Metropolis-Hastings algorithm On start: pick x min = x 0 R n, L < x 0 < U Find x i+1 based on x min If f(x i+1 ) < f(x min ) then x min = x i+1 Else x min = x i+1 with probability p
5 Simulated Annealing (2) Metropolis-Hastings algorithm On start: pick x min = x 0 R n, L < x 0 < U Find x i+1 based on x min If f(x i+1 ) < f(x min ) then x min = x i+1 Else x min = x i+1 with probability p Simulated Annealing The p value depends on the temperature The temperature is reduced whenever the suboptimal solution is accepted
6 Basic concepts of the processing platform (1) Two-level hierarchical architecture (master-slave)
7 Basic concepts of the processing platform (1) MASTER Upper level SLAVES Lower level CUDA Threads CUDA Threads CUDA Threads GPUs
8 Basic concepts of the processing platform (2) Assumption: Asynchronous communication on the upper level
9 Basic concepts of the processing platform (2) Assumption: Asynchronous communication on the upper level Implications: No direct cooperation between nodes (nor GPU threads)
10 Basic concepts of the processing platform (2) Assumption: Asynchronous communication on the upper level Implications: No direct cooperation between nodes (nor GPU threads) But: the system may be heterogeneous
11 Basic concepts of the processing platform (2) Asynchronous Synchronous or sequential Fast GPU Slow GPU
12 The lower level Introduction The lower level The upper level Results Initial solution computed on CPU Each thread perform m steps of sequential SA modifying only the d-th dimension (d = ID mod n) All solutions are adaptively composed into the new one
13 The upper level Introduction The lower level The upper level Results Whenever a result is obtained: The result is adaptively combined to the previous one If the new result is better, it is accepted
14 Test environment Introduction The lower level The upper level Results Comparison with traditional parallel SA (MHCS) MHCS: Sun Fire X4600M2 (8x AMD Opteron 8218) TLSA: 16x NVIDIA GeForce 130 GT n-dimensional test functions (n {10, 100})
15 The lower level The upper level Results Time results, n = 10 (mean of 5 runs) Function GPU cluster [s] Sun Fire [s] Ackley > 4min De Jong Giewangk > 4min Michalewicz Rastrigin > 4min Rosenbrock Schwefel
16 The lower level The upper level Results Time results, n = 100 (mean of 5 runs) Function GPU cluster [s] Sun Fire [s] Ackley De Jong Giewangk Michalewicz > 4h Rastrigin Rosenbrock Schwefel
17 Scalability (1) Introduction The lower level The upper level Results Ideal and obtained speedups Ideal Experiment 12 Speedup Number of machines
18 Scalability (2) Introduction The lower level The upper level Results Execution times for various cluster sizes 1 machine 2 machines 8 machines 16 machines Normalized time Number of thread blocks per machine
19 Scalability (3) Introduction The lower level The upper level Results Execution times for various GPUs 1x GeForce 130 GT 1x GeForce GTX Normalized time Number of thread blocks
20 Introduction Good efficiency for complex problems Higher versatility (heterogeneous environment) Almost ideal scalability w.r.t. number of machines (up to some point)
21 Thank you!
22 Time results, n = 10 Function GPU cluster Sun Fire m [s] σ [s] m [s] σ [s] Ackley > 4min 0 De Jong Giewangk > 4min 0 Michalewicz Rastrigin > 4min 0 Rosenbrock Schwefel
23 Time results, n = 100 Function GPU cluster Sun Fire m [s] σ [s] m [s] σ [s] Ackley De Jong Giewangk Michalewicz > 4h 0 Rastrigin Rosenbrock Schwefel
24 Results, n = 100 Function GPU Cluster Sun Fire name minimum m σ m σ Ackley De Jong Giewangk Michalewicz? Rastrigin Rosenbrock Schwefel
25 Results, n = 10 Function GPU Cluster Sun Fire name minimum m σ m σ Ackley De Jong Giewangk Michalewicz? Rastrigin Rosenbrock Schwefel
26 Observations Introduction More threads more SA steps in parallel Best when the number of dimensions number of threads What with other cases?
Approaches to Parallel Computing
Approaches to Parallel Computing K. Cooper 1 1 Department of Mathematics Washington State University 2019 Paradigms Concept Many hands make light work... Set several processors to work on separate aspects
More informationOpenCL implementation of PSO: a comparison between multi-core CPU and GPU performances
OpenCL implementation of PSO: a comparison between multi-core CPU and GPU performances Stefano Cagnoni 1, Alessandro Bacchini 1,2, Luca Mussi 1 1 Dept. of Information Engineering, University of Parma,
More informationNVIDIA s Compute Unified Device Architecture (CUDA)
NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability 1 History of GPU
More informationNVIDIA s Compute Unified Device Architecture (CUDA)
NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability History of GPU
More informationThe Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview
More informationHiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.
HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation
More informationA Computation Time Comparison of Self-Organising Migrating Algorithm in Java and C#
A Computation Time Comparison of Self-Organising Migrating Algorithm in Java and C# JAN KOLEK 1, PAVEL VAŘACHA 2, ROMAN JAŠEK 3 Tomas Bata University in Zlin Faculty of Applied Informatics nam. T.G..Masaryka
More informationUsing Genetic Algorithms to solve the Minimum Labeling Spanning Tree Problem
Using to solve the Minimum Labeling Spanning Tree Problem Final Presentation, oliverr@umd.edu Advisor: Dr Bruce L. Golden, bgolden@rhsmith.umd.edu R. H. Smith School of Business (UMD) May 3, 2012 1 / 42
More informationFlux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters
Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,
More informationMathematical computations with GPUs
Master Educational Program Information technology in applications Mathematical computations with GPUs GPU architecture Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University GPU Graphical Processing
More informationFacial Recognition Using Neural Networks over GPGPU
Facial Recognition Using Neural Networks over GPGPU V Latin American Symposium on High Performance Computing Juan Pablo Balarini, Martín Rodríguez and Sergio Nesmachnow Centro de Cálculo, Facultad de Ingeniería
More informationGPU Parallelization of Gibbs Sampling Abstractions, Results, and Lessons Learned Alireza S Mahani Scientific Computing Group Sentrana Inc.
GPU Parallelization of Gibbs Sampling Abstractions, Results, and Lessons Learned Alireza S Mahani Scientific Computing Group Sentrana Inc. May 16, 2012 Objectives of This Talk What This Talk Is About What
More informationFlux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters
Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,
More informationParallel Architectures
Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36
More informationHigh Performance Computing. Taichiro Suzuki Tokyo Institute of Technology Dept. of mathematical and computing sciences Matsuoka Lab.
High Performance Computing Taichiro Suzuki Tokyo Institute of Technology Dept. of mathematical and computing sciences Matsuoka Lab. 1 Review Paper Two-Level Checkpoint/Restart Modeling for GPGPU Supada
More informationHPC with GPU and its applications from Inspur. Haibo Xie, Ph.D
HPC with GPU and its applications from Inspur Haibo Xie, Ph.D xiehb@inspur.com 2 Agenda I. HPC with GPU II. YITIAN solution and application 3 New Moore s Law 4 HPC? HPC stands for High Heterogeneous Performance
More informationECE 8823: GPU Architectures. Objectives
ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading
More informationEffective Learning and Classification using Random Forest Algorithm CHAPTER 6
CHAPTER 6 Parallel Algorithm for Random Forest Classifier Random Forest classification algorithm can be easily parallelized due to its inherent parallel nature. Being an ensemble, the parallel implementation
More informationBenchmark Functions for the CEC 2008 Special Session and Competition on Large Scale Global Optimization
Benchmark Functions for the CEC 2008 Special Session and Competition on Large Scale Global Optimization K. Tang 1, X. Yao 1, 2, P. N. Suganthan 3, C. MacNish 4, Y. P. Chen 5, C. M. Chen 5, Z. Yang 1 1
More informationCUDA. GPU Computing. K. Cooper 1. 1 Department of Mathematics. Washington State University
GPU Computing K. Cooper 1 1 Department of Mathematics Washington State University 2014 Review of Parallel Paradigms MIMD Computing Multiple Instruction Multiple Data Several separate program streams, each
More informationScalability of Processing on GPUs
Scalability of Processing on GPUs Keith Kelley, CS6260 Final Project Report April 7, 2009 Research description: I wanted to figure out how useful General Purpose GPU computing (GPGPU) is for speeding up
More informationComparison of High-Speed Ray Casting on GPU
Comparison of High-Speed Ray Casting on GPU using CUDA and OpenGL November 8, 2008 NVIDIA 1,2, Andreas Weinlich 1, Holger Scherl 2, Markus Kowarschik 2 and Joachim Hornegger 1 1 Chair of Pattern Recognition
More informationHigh-Performance Data Loading and Augmentation for Deep Neural Network Training
High-Performance Data Loading and Augmentation for Deep Neural Network Training Trevor Gale tgale@ece.neu.edu Steven Eliuk steven.eliuk@gmail.com Cameron Upright c.upright@samsung.com Roadmap 1. The General-Purpose
More informationApplications of Berkeley s Dwarfs on Nvidia GPUs
Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse
More informationFaster Simulations of the National Airspace System
Faster Simulations of the National Airspace System PK Menon Monish Tandale Sandy Wiraatmadja Optimal Synthesis Inc. Joseph Rios NASA Ames Research Center NVIDIA GPU Technology Conference 2010, San Jose,
More informationFast Interactive Sand Simulation for Gesture Tracking systems Shrenik Lad
Fast Interactive Sand Simulation for Gesture Tracking systems Shrenik Lad Project Guide : Vivek Mehta, Anup Tapadia TouchMagix media labs TouchMagix www.touchmagix.com Interactive display solutions Interactive
More informationAn Oracle Technical White Paper October Sizing Guide for Single Click Configurations of Oracle s MySQL on Sun Fire x86 Servers
An Oracle Technical White Paper October 2011 Sizing Guide for Single Click Configurations of Oracle s MySQL on Sun Fire x86 Servers Introduction... 1 Foundation for an Enterprise Infrastructure... 2 Sun
More informationParallel Computing with MATLAB
Parallel Computing with MATLAB CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University
More informationCS6963: Parallel Programming for GPUs Midterm Exam March 25, 2009
1 CS6963: Parallel Programming for GPUs Midterm Exam March 25, 2009 Instructions: This is an in class, open note exam. Please use the paper provided to submit your responses. You can include additional
More informationIntroduction to CUDA
Introduction to CUDA Overview HW computational power Graphics API vs. CUDA CUDA glossary Memory model, HW implementation, execution Performance guidelines CUDA compiler C/C++ Language extensions Limitations
More informationPerformance potential for simulating spin models on GPU
Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational
More informationOptimizing Data Locality for Iterative Matrix Solvers on CUDA
Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,
More informationData Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions
Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing
More informationAddressing Heterogeneity in Manycore Applications
Addressing Heterogeneity in Manycore Applications RTM Simulation Use Case stephane.bihan@caps-entreprise.com Oil&Gas HPC Workshop Rice University, Houston, March 2008 www.caps-entreprise.com Introduction
More informationCUDA (Compute Unified Device Architecture)
CUDA (Compute Unified Device Architecture) Mike Bailey History of GPU Performance vs. CPU Performance GFLOPS Source: NVIDIA G80 = GeForce 8800 GTX G71 = GeForce 7900 GTX G70 = GeForce 7800 GTX NV40 = GeForce
More informationParallel Computing in Combinatorial Optimization
Parallel Computing in Combinatorial Optimization Bernard Gendron Université de Montréal gendron@iro.umontreal.ca Course Outline Objective: provide an overview of the current research on the design of parallel
More informationOverview of Project's Achievements
PalDMC Parallelised Data Mining Components Final Presentation ESRIN, 12/01/2012 Overview of Project's Achievements page 1 Project Outline Project's objectives design and implement performance optimised,
More informationHow GPUs can find your next hit: Accelerating virtual screening with OpenCL. Simon Krige
How GPUs can find your next hit: Accelerating virtual screening with OpenCL Simon Krige ACS 2013 Agenda > Background > About blazev10 > What is a GPU? > Heterogeneous computing > OpenCL: a framework for
More informationVideocard Benchmarks Over 800,000 Video Cards Benchmarked
Page 1 of 9 Hom e Software Hardware Benchm arks Services Store Support Forum s AboutUs Home» Video Card Benchmarks» Mid to High Range Video Cards CPU Benchmarks Video Card Benchmarks Hard Drive Benchmarks
More informationWhat is Parallel Computing?
What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing
More informationCafeGPI. Single-Sided Communication for Scalable Deep Learning
CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks
More informationGPU for HPC. October 2010
GPU for HPC Simone Melchionna Jonas Latt Francis Lapique October 2010 EPFL/ EDMX EPFL/EDMX EPFL/DIT simone.melchionna@epfl.ch jonas.latt@epfl.ch francis.lapique@epfl.ch 1 Moore s law: in the old days,
More informationOn Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators
On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC
More informationMaster Program (Laurea Magistrale) in Computer Science and Networking. High Performance Computing Systems and Enabling Platforms.
Master Program (Laurea Magistrale) in Computer Science and Networking High Performance Computing Systems and Enabling Platforms Marco Vanneschi Multithreading Contents Main features of explicit multithreading
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationParallel Neural Network Training with OpenCL
Parallel Neural Network Training with OpenCL Nenad Krpan, Domagoj Jakobović Faculty of Electrical Engineering and Computing Unska 3, Zagreb, Croatia Email: nenadkrpan@gmail.com, domagoj.jakobovic@fer.hr
More informationInterval arithmetic on graphics processing units
Interval arithmetic on graphics processing units Sylvain Collange*, Jorge Flórez** and David Defour* RNC'8 July 7 9, 2008 * ELIAUS, Université de Perpignan Via Domitia ** GILab, Universitat de Girona How
More informationPLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters
PLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters IEEE CLUSTER 2015 Chicago, IL, USA Luis Sant Ana 1, Daniel Cordeiro 2, Raphael Camargo 1 1 Federal University of ABC,
More informationMulticore Hardware and Parallelism
Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3
More informationHome Software Hardware Benchmarks Services Store Support Forums About Us
1 of 11 10/16/2018, 9:16 AM Home Software Hardware Benchmarks Services Store Support Forums About Us Home» Video Card Benchmarks» Mid to High Range Video Cards CPU Benchmarks Video Card Benchmarks Hard
More informationHigh performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli
High performance 2D Discrete Fourier Transform on Heterogeneous Platforms Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli Motivation Fourier Transform widely used in Physics, Astronomy, Engineering
More informationVariants of Mersenne Twister Suitable for Graphic Processors
Variants of Mersenne Twister Suitable for Graphic Processors Mutsuo Saito 1, Makoto Matsumoto 2 1 Hiroshima University, 2 University of Tokyo August 16, 2010 This study is granted in part by JSPS Grant-In-Aid
More informationAccelerating String Matching Using Multi-threaded Algorithm
Accelerating String Matching Using Multi-threaded Algorithm on GPU Cheng-Hung Lin*, Sheng-Yu Tsai**, Chen-Hsiung Liu**, Shih-Chieh Chang**, Jyuo-Min Shyu** *National Taiwan Normal University, Taiwan **National
More informationPerformance impact of dynamic parallelism on different clustering algorithms
Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu
More informationImplementation of the finite-difference method for solving Maxwell`s equations in MATLAB language on a GPU
Implementation of the finite-difference method for solving Maxwell`s equations in MATLAB language on a GPU 1 1 Samara National Research University, Moskovskoe Shosse 34, Samara, Russia, 443086 Abstract.
More informationAn Introduction to GPGPU Pro g ra m m ing - CUDA Arc hitec ture
An Introduction to GPGPU Pro g ra m m ing - CUDA Arc hitec ture Rafia Inam Mälardalen Real-Time Research Centre Mälardalen University, Västerås, Sweden http://www.mrtc.mdh.se rafia.inam@mdh.se CONTENTS
More informationCOMP 605: Introduction to Parallel Computing Lecture : GPU Architecture
COMP 605: Introduction to Parallel Computing Lecture : GPU Architecture Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University (SDSU) Posted:
More informationFirst Swedish Workshop on Multi-Core Computing MCC 2008 Ronneby: On Sorting and Load Balancing on Graphics Processors
First Swedish Workshop on Multi-Core Computing MCC 2008 Ronneby: On Sorting and Load Balancing on Graphics Processors Daniel Cederman and Philippas Tsigas Distributed Computing Systems Chalmers University
More informationGPUs and GPGPUs. Greg Blanton John T. Lubia
GPUs and GPGPUs Greg Blanton John T. Lubia PROCESSOR ARCHITECTURAL ROADMAP Design CPU Optimized for sequential performance ILP increasingly difficult to extract from instruction stream Control hardware
More informationTuning CUDA Applications for Fermi. Version 1.2
Tuning CUDA Applications for Fermi Version 1.2 7/21/2010 Next-Generation CUDA Compute Architecture Fermi is NVIDIA s next-generation CUDA compute architecture. The Fermi whitepaper [1] gives a detailed
More informationG P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G
Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty
More informationCUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav
CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture
More informationVector Quantization. A Many-Core Approach
Vector Quantization A Many-Core Approach Rita Silva, Telmo Marques, Jorge Désirat, Patrício Domingues Informatics Engineering Department School of Technology and Management, Polytechnic Institute of Leiria
More informationXLVI Pesquisa Operacional na Gestão da Segurança Pública
PARALLEL CONSTRUCTION FOR CONTINUOUS GRASP OPTIMIZATION ON GPUs Lisieux Marie Marinho dos Santos Andrade Centro de Informática Universidade Federal da Paraíba Campus I, Cidade Universitária 58059-900,
More informationAutoTVM & Device Fleet
AutoTVM & Device Fleet ` Learning to Optimize Tensor Programs Frameworks High-level data flow graph and optimizations Hardware Learning to Optimize Tensor Programs Frameworks High-level data flow graph
More informationSpeed Up Your Codes Using GPU
Speed Up Your Codes Using GPU Wu Di and Yeo Khoon Seng (Department of Mechanical Engineering) The use of Graphics Processing Units (GPU) for rendering is well known, but their power for general parallel
More informationGPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27
1 / 27 GPU Programming Lecture 1: Introduction Miaoqing Huang University of Arkansas 2 / 27 Outline Course Introduction GPUs as Parallel Computers Trend and Design Philosophies Programming and Execution
More informationUsing Interval Scanning to Improve the Performance of the Enhanced Unidimensional Search Method
Using Interval Scanning to Improve the Performance of the Enhanced Unidimensional Search Method Mahamed G. H. Omran Department of Computer Science Gulf university for Science & Technology, Kuwait Email:
More informationAcceleration of a Python-based Tsunami Modelling Application via CUDA and OpenHMPP
Acceleration of a Python-based Tsunami Modelling Application via CUDA and OpenHMPP Zhe Weng and Peter Strazdins*, Computer Systems Group, Research School of Computer Science, The Australian National University
More informationDistributed Hash Cracker: A Cross-Platform GPU-Accelerated Password Recovery System
Distributed Hash Cracker: A Cross-Platform GPU-Accelerated Password Recovery System Andrew Zonenberg Rensselaer Polytechnic Institute 110 8th Street Troy, New York U.S.A. 12180 zonena@cs.rpi.edu April
More informationWhat is a distributed system?
CS 378 Intro to Distributed Computing Lorenzo Alvisi Harish Rajamani What is a distributed system? A distributed system is one in which the failure of a computer you didn t even know existed can render
More informationParticle Swarm Optimization
Dario Schor, M.Sc., EIT schor@ieee.org Space Systems Department Magellan Aerospace Winnipeg Winnipeg, Manitoba 1 of 34 Optimization Techniques Motivation Optimization: Where, min x F(x), subject to g(x)
More informationFlexible Architecture Research Machine (FARM)
Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense
More informationRecent Advances in Heterogeneous Computing using Charm++
Recent Advances in Heterogeneous Computing using Charm++ Jaemin Choi, Michael Robson Parallel Programming Laboratory University of Illinois Urbana-Champaign April 12, 2018 1 / 24 Heterogeneous Computing
More informationPerformance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures
Performance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures Dirk Ribbrock, Markus Geveler, Dominik Göddeke, Stefan Turek Angewandte Mathematik, Technische Universität Dortmund
More informationOn the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters
1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationPARALLEL SYSTEMS PROJECT
PARALLEL SYSTEMS PROJECT CSC 548 HW6, under guidance of Dr. Frank Mueller Kaustubh Prabhu (ksprabhu) Narayanan Subramanian (nsubram) Ritesh Anand (ranand) Assessing the benefits of CUDA accelerator on
More informationCenter for Computational Science
Center for Computational Science Toward GPU-accelerated meshfree fluids simulation using the fast multipole method Lorena A Barba Boston University Department of Mechanical Engineering with: Felipe Cruz,
More informationThe Art of Parallel Processing
The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a
More informationCluster-based 3D Reconstruction of Aerial Video
Cluster-based 3D Reconstruction of Aerial Video Scott Sawyer (scott.sawyer@ll.mit.edu) MIT Lincoln Laboratory HPEC 12 12 September 2012 This work is sponsored by the Assistant Secretary of Defense for
More informationHigh Quality DXT Compression using OpenCL for CUDA. Ignacio Castaño
High Quality DXT Compression using OpenCL for CUDA Ignacio Castaño icastano@nvidia.com March 2009 Document Change History Version Date Responsible Reason for Change 0.1 02/01/2007 Ignacio Castaño First
More informationImproving performances of an embedded RDBMS with a hybrid CPU/GPU processing engine
Improving performances of an embedded RDBMS with a hybrid CPU/GPU processing engine Samuel Cremer 1,2, Michel Bagein 1, Saïd Mahmoudi 1, Pierre Manneback 1 1 UMONS, University of Mons Computer Science
More informationLarge scale Imaging on Current Many- Core Platforms
Large scale Imaging on Current Many- Core Platforms SIAM Conf. on Imaging Science 2012 May 20, 2012 Dr. Harald Köstler Chair for System Simulation Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen,
More informationA Parallel Simulated Annealing Algorithm for Weapon-Target Assignment Problem
A Parallel Simulated Annealing Algorithm for Weapon-Target Assignment Problem Emrullah SONUC Department of Computer Engineering Karabuk University Karabuk, TURKEY Baha SEN Department of Computer Engineering
More informationPortable GPU-Based Artificial Neural Networks For Data-Driven Modeling
City University of New York (CUNY) CUNY Academic Works International Conference on Hydroinformatics 8-1-2014 Portable GPU-Based Artificial Neural Networks For Data-Driven Modeling Zheng Yi Wu Follow this
More informationTesla Architecture, CUDA and Optimization Strategies
Tesla Architecture, CUDA and Optimization Strategies Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1 Outline Tesla Architecture & CUDA CUDA Programming Optimization
More informationAlgorithm Design (4) Metaheuristics
Algorithm Design (4) Metaheuristics Takashi Chikayama School of Engineering The University of Tokyo Formalization of Constraint Optimization Minimize (or maximize) the objective function f(x 0,, x n )
More informationFast BVH Construction on GPUs
Fast BVH Construction on GPUs Published in EUROGRAGHICS, (2009) C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, D. Manocha University of North Carolina at Chapel Hill NVIDIA University of California
More informationTesting Fine-Grained Parallelism for the ADMM on a Factor-Graph
Testing Fine-Grained Parallelism for the ADMM on a Factor-Graph Nate Derbinsky Wentworth Institute of Technology Ning Hao Oracle AmirReza Oghbaee Northeastern Mohammad Rostami UPenn José Bento Boston College
More informationParallel Computing Ideas
Parallel Computing Ideas K. 1 1 Department of Mathematics 2018 Why When to go for speed Historically: Production code Code takes a long time to run Code runs many times Code is not end in itself 2010:
More informationPresenting: Comparing the Power and Performance of Intel's SCC to State-of-the-Art CPUs and GPUs
Presenting: Comparing the Power and Performance of Intel's SCC to State-of-the-Art CPUs and GPUs A paper comparing modern architectures Joakim Skarding Christian Chavez Motivation Continue scaling of performance
More informationLecture 5. Performance programming for stencil methods Vectorization Computing with GPUs
Lecture 5 Performance programming for stencil methods Vectorization Computing with GPUs Announcements Forge accounts: set up ssh public key, tcsh Turnin was enabled for Programming Lab #1: due at 9pm today,
More informationAdvanced CUDA Optimization 1. Introduction
Advanced CUDA Optimization 1. Introduction Thomas Bradley Agenda CUDA Review Review of CUDA Architecture Programming & Memory Models Programming Environment Execution Performance Optimization Guidelines
More informationComputing and energy performance
Equipe I M S Equipe Projet INRIA AlGorille Computing and energy performance optimization i i of a multi algorithms li l i PDE solver on CPU and GPU clusters Stéphane Vialle, Sylvain Contassot Vivier, Thomas
More informationCSE 160 Lecture 24. Graphical Processing Units
CSE 160 Lecture 24 Graphical Processing Units Announcements Next week we meet in 1202 on Monday 3/11 only On Weds 3/13 we have a 2 hour session Usual class time at the Rady school final exam review SDSC
More informationCUDA Architecture & Programming Model
CUDA Architecture & Programming Model Course on Multi-core Architectures & Programming Oliver Taubmann May 9, 2012 Outline Introduction Architecture Generation Fermi A Brief Look Back At Tesla What s New
More informationhigh performance medical reconstruction using stream programming paradigms
high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming
More informationAn Extension of the StarSs Programming Model for Platforms with Multiple GPUs
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs Eduard Ayguadé 2 Rosa M. Badia 2 Francisco Igual 1 Jesús Labarta 2 Rafael Mayo 1 Enrique S. Quintana-Ortí 1 1 Departamento
More informationParallel and Distributed Computing
Parallel and Distributed Computing NUMA; OpenCL; MapReduce José Monteiro MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer Science and Engineering
More information