Exploring the features of OpenCL 2.0
1 Exploring the features of OpenCL 2.0
Saoni Mukherjee, Xiang Gong, Leiming Yu, Carter McCardwell, Yash Ukidave, Tuan Dao, Fanny Paravecino, David Kaeli
Northeastern University
2 Outline
- Introduction and evolution of OpenCL
- OpenCL 2.0: new features
- Applications used to explore these features
- Results and analysis
3 OpenCL
- Programming and runtime framework
- Executes applications across heterogeneous platforms
- First version, OpenCL 1.0, was released in 2009
- OpenCL 1.0: basic programming model
- OpenCL 1.1/1.2: memory management and fine-grained control
- OpenCL 2.0: support for emerging hardware capabilities and improved programmability
4 Features
- Shared Virtual Memory
- Dynamic Parallelism
- Generic Address Space
- Image Support
- Android Installable Client Driver Extension
6 Bigger picture
- Goal: a benchmark and micro-benchmark suite of OpenCL 2.0 applications
- Features that are interesting in HSA and OpenCL 2.0
HPC
1. Spectral Clustering
2. Connected Component Labeling
3. Graph-based Segmentation
4. Periodic Green's Function
5. Feature Selection and Outlier Detection
6. 2-D Finite Difference Time-Domain
Mobile/Embedded
1. N-channel IIR Filtering
2. Multi-channel Noise Filter using FIR Filtering
3. Speech Recognition using Hidden Markov Models
4. AES encryption/decryption
5. Convolutional Neural Network
6. Shallow Water Simulation
7. Color Histogramming
Big Data
1. Rating System using MapReduce
2. K-means Clustering
3. PageRank
4. Bayesian Estimation for Adaptive Spam Filtering/Learning
5. Gene Sequencing
8 Exploring the benefits of OpenCL 2.0
Cybersecurity: The Advanced Encryption Standard (AES)
- Adopted by the US government for encryption
- Input: plain text with a 256-bit key; output: cipher text
- Blocks run concurrently
- Our results show that key expansion is faster on the CPU than on the GPU
- The 14 rounds of AES-256 are performed on the GPU
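The figures on this slide follow from the standard AES parameters, which can be sketched in a few lines. This is illustrative arithmetic only, not the authors' implementation: the function names are hypothetical, and the assumption that every 16-byte block can be processed independently (as in ECB or counter mode) is ours, since the slides do not state the cipher mode.

```python
# Illustrative arithmetic only; all function names here are hypothetical.
AES_BLOCK_BYTES = 16  # AES always operates on 128-bit blocks


def aes_rounds(key_bits):
    """Number of rounds Nr = Nk + 6, where Nk is the key length in 32-bit words."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits")
    return key_bits // 32 + 6


def expanded_key_bytes(key_bits):
    """Key expansion yields one 16-byte round key per round plus the initial one."""
    return AES_BLOCK_BYTES * (aes_rounds(key_bits) + 1)


def independent_blocks(file_bytes):
    """Blocks to encrypt (rounded up), assuming each block is independent."""
    return (file_bytes + AES_BLOCK_BYTES - 1) // AES_BLOCK_BYTES
```

For a 256-bit key this gives 14 rounds and a 240-byte expanded key. The key schedule is small and sequential, which is consistent with running it on the CPU, while the per-block rounds offer one unit of parallel work per block.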
9 Exploring the benefits of OpenCL 2.0
Signal Processing: Finite Impulse Response (FIR) Filtering
- Impulse response of finite duration
- Input: x[1..n] and b[1..N]; output: f[x]
- Number of taps: N = 1024
- Synthesized audio stream input
- Uses weighted reduction, a very common parallel operation
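The weighted reduction mentioned above is the textbook FIR sum: each output sample is a dot product of the tap coefficients with the most recent input samples. A minimal serial sketch follows, assuming zero-valued samples before the start of the stream; `fir_filter` is a hypothetical name, and the benchmark's actual kernel is not shown in the slides.

```python
def fir_filter(x, b):
    """Compute y[n] = sum_k b[k] * x[n - k], treating x[n] as 0 for n < 0."""
    y = []
    for n in range(len(x)):          # each n maps naturally to one work-item
        acc = 0.0
        for k in range(len(b)):      # weighted reduction over the taps
            if n - k >= 0:
                acc += b[k] * x[n - k]
        y.append(acc)
    return y
```

Every output sample is independent of the others, so the outer loop parallelizes directly; only the inner sum is a reduction.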
10 Exploring the benefits of OpenCL 2.0
Signal Processing: Infinite Impulse Response (IIR) Filter
- Requires less processing power than FIR for the same design
- Decomposed into multiple parallel 2nd-order (real and complex) IIR sections for performance
- N_1: number of real poles
- N_2: number of complex poles
- Number of channels = 64
- FIR coefficient: c_0 = 3.0
- Synthesized audio stream input
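The parallel decomposition can be sketched as a set of independent low-order sections whose outputs are summed with a direct term scaled by c_0. This is a generic parallel-form IIR under assumed conventions, not the authors' kernel: the coefficient layout `(b0, b1, a1, a2)` and both function names are hypothetical.

```python
def second_order_section(x, b0, b1, a1, a2):
    """y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1] - a2*y[n-2], zero initial state."""
    y = []
    x1 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 - a1 * y1 - a2 * y2
        y.append(yn)
        x1, y2, y1 = xn, y1, yn   # shift the delay line
    return y


def parallel_iir(x, c0, sections):
    """Sum the direct path c0*x[n] with the outputs of independent sections."""
    out = [c0 * xn for xn in x]
    for coeffs in sections:       # sections do not depend on each other
        for n, yn in enumerate(second_order_section(x, *coeffs)):
            out[n] += yn
    return out
```

Within one section the recurrence is sequential in time, so the parallelism comes from running the sections (and the 64 channels) side by side and then reducing their outputs.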
11 Exploring the benefits of OpenCL 2.0
Statistical Modeling: Hidden Markov Models
- Probabilistic meaning of hidden states without prior knowledge
- Targeting isolated word recognition
- Matrix form used for coalescing and computational efficiency
- Uses operations including matrix multiplication, matrix-vector operations, and parallel reduction
- Uses data- and thread-level parallelism
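To show how the matrix-vector and reduction operations listed above arise in an HMM, here is a minimal sketch of the standard forward algorithm. It is not taken from the benchmark: the slides do not say which HMM routines are used, and the names `pi`, `A`, `B`, and `forward_likelihood` are assumptions following common textbook notation.

```python
def forward_likelihood(pi, A, B, observations):
    """Probability of an observation sequence under an HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][o] : probability that state j emits symbol o
    """
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        # Vector-matrix product (alpha * A), then element-wise emission scaling.
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
            for j in range(n)
        ]
    return sum(alpha)  # final sum over states is a reduction
```

Each entry of the new `alpha` depends only on the previous vector, so the per-state updates within a time step are independent of one another.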
12 Ongoing OpenCL 2.0 Evaluation
- Baseline: OpenCL 1.2
- GPU model: AMD Radeon R9 290X (reported in paper)
- Current use: AMD A K Radeon R7, Kaveri APU
- GPU Architecture:
- Compute cores: 12 (4 CPU & 8 GPU)
- Global memory: 512 MB
- Max clock frequency: 720 MHz
- GPU driver: (VM)
13 AES Results
[Figure: execution time (sec) vs. unencrypted file size (MB), OpenCL 1.2 vs. OpenCL 2.0]
- Optimizations explored: SVM (✓), Dynamic Parallelism (✗)
- Input files contain excerpts of a book
- Input sizes are varied from 1 MB to 1,000 MB with a constant 256-bit key
- Small benefits from SVM, which grow with input file size
- Child kernel is memory intensive, inhibiting dynamic parallelism
14 FIR Results
[Figure: execution time (sec) vs. block size, OpenCL 1.2 vs. OpenCL 2.0]
- Optimizations explored: SVM (✓)
- FIR is a streaming application with different block sizes
- Results show that the same kernel runs faster in OpenCL 2.0
- Consistent benefits from SVM, which grow with input block size
15 IIR Results
[Figure: execution time (msec) vs. block size, OpenCL 1.2 vs. OpenCL 2.0]
- Optimizations explored: SVM (✓), Workgroup function (✗)
- Interesting feature: parallel reduction
- Workgroup function is useful for reduction, but did not work well
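For context, OpenCL 2.0 adds built-in work-group functions such as `work_group_reduce_add`, which replace the local-memory tree reduction that OpenCL 1.2 kernels typically write by hand. The sketch below emulates that hand-written pattern serially in Python to show what the built-in stands in for. It is an illustration under our own assumptions (a power-of-two work-group size, the hypothetical name `tree_reduce_add`), not the benchmark's kernel.

```python
def tree_reduce_add(values):
    """Serially emulate a local-memory tree reduction over one work-group."""
    local = list(values)              # stands in for a __local buffer
    size = len(local)
    if size == 0 or size & (size - 1):
        raise ValueError("work-group size must be a power of two")
    stride = size // 2
    while stride > 0:
        for lid in range(stride):     # on a GPU these work-items run in parallel
            local[lid] += local[lid + stride]
        # A work-group barrier would separate the steps on a real device.
        stride //= 2
    return local[0]
```

The loop runs log2(size) steps, halving the number of active work-items each time.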
16 Exploring Workgroup Function further in IIR
[Figure: execution time (sec) vs. block size, HSA 1.0 final + OpenCL1.2 vs. HSA 1.0 final + OpenCL]
- Workgroup function is useful for reduction, but did not work well in OpenCL 2.0
- It works better in HSAIL on HSA, but not as well as reduction
17 Hidden Markov Model Results
[Figure: execution time (sec) vs. number of hidden states, OpenCL 1.2 vs. OpenCL 2.0]
- Optimizations explored: SVM (✓), Dynamic Parallelism (✓)
- Updating expected values for each hidden state is an independent operation, which makes it a perfect fit for Dynamic Parallelism!
18 K-means Results
Data Mining: K-means algorithm
[Figure: two panels of execution time (sec) vs. number of objects, OpenCL 1.2 vs. OpenCL 2.0]
- Optimizations explored: SVM (✓)
- Well-known clustering algorithm
- K-means with different numbers of objects, 34 features, 5 clusters
- Input file contains features and attributes
- Consistent benefits from SVM
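As a reminder of what each K-means iteration computes, here is a minimal sketch of one assignment-and-update step of the standard algorithm (Lloyd's iteration). It is not the benchmark code: `kmeans_step` is a hypothetical name, and keeping an empty cluster's centroid unchanged is our own assumption.

```python
def kmeans_step(points, centroids):
    """One K-means iteration: assign each point, then recompute centroids."""
    dims = len(centroids[0])
    assignments = []
    for p in points:                  # each point is independent (data parallel)
        dists = [sum((p[d] - c[d]) ** 2 for d in range(dims)) for c in centroids]
        assignments.append(dists.index(min(dists)))
    new_centroids = []
    for j in range(len(centroids)):
        members = [p for p, a in zip(points, assignments) if a == j]
        if members:
            new_centroids.append(
                [sum(m[d] for m in members) / len(members) for d in range(dims)]
            )
        else:
            new_centroids.append(list(centroids[j]))  # keep an empty cluster as is
    return assignments, new_centroids
```

The assignment phase touches every object and feature on each iteration, and the centroids are exchanged between host and device every round, which is the kind of traffic that shared virtual memory is meant to simplify.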
19 Shallow Water Simulation Results
Physics simulation: Shallow Water Engine
[Figure: two panels of execution time (sec) vs. number of objects, OpenCL 1.2 vs. OpenCL 2.0]
- Optimizations explored: SVM (✓)
- Depicts complex behavior of fluids; wave modeling for interactive systems
- Predicts matters of practical interest, e.g. internal tides in the Strait of Gibraltar
- Mathematically and computationally intense, so expensive to do in real time
20 Summary
- OpenCL 2.0 introduced new features
- We have explored the benefits of using them with some benchmarks from a variety of domains
- SVM provides consistent benefits
- Exploring issues with utilizing the work-group function
- The benchmark suite will be released Summer 2015
21 Northeastern University Computer Architecture Research (NUCAR) Group
22 THANK YOU TO OUR SPONSORS