Accelerating String Matching Using Multi-threaded Algorithm
Transcription
1 Accelerating String Matching Using Multi-threaded Algorithm on GPU
Cheng-Hung Lin*, Sheng-Yu Tsai**, Chen-Hsiung Liu**, Shih-Chieh Chang**, Jyuo-Min Shyu**
*National Taiwan Normal University, Taiwan
**National Tsing Hua University, Taiwan
2 Introduction
- Network Intrusion Detection Systems (NIDS) are widely used to detect network attacks.
- The pattern matching engine dominates the performance of an NIDS.
- Traditional pattern matching approaches on a uniprocessor are too slow for today's networks.
- Hardware approaches for accelerating pattern matching:
  - Logic-based
  - Memory-based
  - Multiprocessor-based
3 GPU for Pattern Matching
- Parallel computation on a GPU is suitable for accelerating pattern matching.
- Input: AAAAAAAAAAAAAAAAAAAAAAAB
- 1 thread scanning the whole input: 24 cycles
- 4 segments scanned by 4 threads (Thread #1 to Thread #4): 6 cycles
4 Boundary Problem
- A pattern that occurs across the boundary of adjacent segments cannot be detected.
- This produces false negative results.
- Example (false negative): AAAAAAAAAAAAAAAAAABBBBBB, split into segments for Thread #1 to Thread #4
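The boundary problem can be reproduced with a small sketch. It is illustrative only: the function names are hypothetical, the loop over segments stands in for parallel threads, and the input is assumed to be a run of 18 A's followed by 6 B's, shaped like the example on the slide.

```python
def search_segment(text, pattern, start, end):
    """Return start offsets of `pattern` lying entirely inside text[start:end]."""
    hits = []
    for i in range(start, end - len(pattern) + 1):
        if text[i:i + len(pattern)] == pattern:
            hits.append(i)
    return hits


def segmented_search(text, pattern, num_threads):
    """Each simulated thread scans only its own segment (no overlap)."""
    seg_len = len(text) // num_threads  # assumes the length divides evenly
    hits = []
    for t in range(num_threads):
        hits += search_segment(text, pattern, t * seg_len, (t + 1) * seg_len)
    return hits


TEXT = "A" * 18 + "B" * 6  # assumed shape of the slide's 24-byte example

print(search_segment(TEXT, "AB", 0, len(TEXT)))  # one thread: [17]
print(segmented_search(TEXT, "AB", 4))           # four segments: []
```

The single occurrence of "AB" starts at offset 17, the last byte of the third segment, so no segment contains it in full and the segmented search reports nothing.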
5 Overlapped Computation
- To resolve the boundary problem, each thread scans across segment boundaries.
- Problem: the overhead of overlapped computation reduces throughput.
- Example: on AAAAAAAAAAAAAAAAAABBBBBB with Thread #1 to Thread #4, Thread #3 can identify "AB" by scanning across the boundary.
6 Aho-Corasick Algorithm
- The Aho-Corasick (AC) algorithm is widely used for pattern matching because it matches multiple patterns in a single pass.
- It compiles multiple patterns into a composite state machine.
- Patterns: (1) AB (2) ABG (3) BD (4) F
[Figure: AC state machine for the four patterns]
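As a rough sketch of how the four patterns could be compiled into one state machine, the code below builds only the valid (goto) transitions as a trie. The state numbers are whatever this construction happens to assign and are not meant to match the numbering in the slide's figure; `build_trie` is a hypothetical helper name.

```python
def build_trie(patterns):
    """Compile patterns into goto transitions plus per-state outputs."""
    goto = [{}]       # goto[state][char] -> next state; state 0 is the root
    output = [set()]  # output[state] -> patterns that end at this state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(p)
    return goto, output


goto, output = build_trie(["AB", "ABG", "BD", "F"])
print(len(goto))  # 7 states, including the root
print(goto[0])    # {'A': 1, 'B': 4, 'F': 6}
```

Shared prefixes share states: "AB" and "ABG" follow the same path for their first two characters, which is what lets a single pass over the input track all patterns at once.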
7 Aho-Corasick Algorithm (cont.)
- The Aho-Corasick (AC) state machine is composed of two kinds of transitions:
  - Solid lines represent valid transitions.
  - Dotted lines represent failure transitions.
- A failure transition backtracks the state machine to recognize patterns that start at different locations.
[Figure: AC state machine traversed on the input string "A B D", with location 1 and location 2 marked]
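The role of failure transitions can be illustrated with a textbook-style Aho-Corasick sketch. This is a generic formulation rather than the authors' implementation; function names are hypothetical and reported offsets are 0-based.

```python
from collections import deque


def build_ac(patterns):
    """Build goto transitions, failure links, and outputs for the patterns."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # Breadth-first pass: depth-1 states fail to the root; deeper states
    # reuse the failure link of their parent.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, nxt in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def ac_search(text, patterns):
    """Single pass over text; returns (start offset, pattern) pairs."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]  # failure transition: fall back to a shorter prefix
        s = goto[s].get(ch, 0)
        for p in sorted(out[s]):
            hits.append((i - len(p) + 1, p))
    return hits


print(ac_search("ABD", ["AB", "ABG", "BD", "F"]))  # [(0, 'AB'), (1, 'BD')]
```

On "ABD" the machine reaches the state for "AB", has no valid transition on "D", follows the failure link to the state for "B", and then matches "BD". The second match starts one byte after the first, which is the behavior the slide describes.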
8 Problems of AC on GPU
- Direct implementation of AC on a GPU:
  - To resolve the boundary problem, each thread has a lower-bound constraint on its scanning length.
  - Constraint = segment length + overlapped length
  - Overlapped length = length of the longest pattern - 1
  - This incurs the overhead of overlapped computation.
- Example input: A A A A A A A A A A A A A A A A A A B B B B B B
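The two formulas above can be turned into a small sketch. For brevity it checks each pattern naively inside every thread's window instead of running an AC machine, and it assumes the input length divides evenly by the number of threads; the names are hypothetical.

```python
def overlapped_search(text, patterns, num_threads):
    """Segmented search in which each thread scans past its segment end."""
    seg_len = len(text) // num_threads           # assumes an even split
    overlap = max(len(p) for p in patterns) - 1  # longest pattern - 1
    hits, scanned = [], 0
    for t in range(num_threads):                 # stands in for parallel threads
        start = t * seg_len
        end = min(start + seg_len + overlap, len(text))
        window = text[start:end]                 # segment plus overlapped bytes
        scanned += len(window)
        # Report only matches that start inside the thread's own segment,
        # so a match in the overlapped region is not reported twice.
        for offset in range(seg_len):
            for p in patterns:
                if window.startswith(p, offset):
                    hits.append((start + offset, p))
    return hits, scanned


TEXT = "A" * 18 + "B" * 6  # assumed shape of the slide's example input
hits, scanned = overlapped_search(TEXT, ["AB", "ABG", "BD", "F"], 4)
print(hits)     # [(17, 'AB')]
print(scanned)  # 30 bytes read for a 24-byte input
```

With a longest pattern of 3 bytes, every thread except the last reads 2 extra bytes, so 30 bytes are scanned for a 24-byte input. That difference is the overlapped-computation overhead, and it grows with the length of the longest pattern.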
9 Problems of AC on GPU (cont.)
10 Failureless-AC State Machine
- AC state machine:
  - A failure transition backtracks the state machine to recognize patterns that start at different locations.
[Figure: AC state machine on the input string "A B D", with location 1 and location 2 marked]
- Failureless-AC state machine:
  - Failure transitions are removed.
  - Traversal terminates when there is no valid transition.
  - Only patterns starting at location 1 are recognized.
[Figure: Failureless-AC state machine on the input string "A B D", with location 1 and the stop point marked]
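A minimal sketch of a Failureless-AC traversal, assuming the state machine is stored as a trie of valid transitions only. Function names are hypothetical and offsets are 0-based, so "location 1" on the slide corresponds to `start=0` here.

```python
def build_trie(patterns):
    """Valid transitions only; there are no failure links."""
    goto, output = [{}], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(p)
    return goto, output


def failureless_match(text, start, goto, output):
    """Report patterns that begin exactly at `start`; stop on the first miss."""
    s, hits = 0, []
    for i in range(start, len(text)):
        nxt = goto[s].get(text[i])
        if nxt is None:  # no valid transition: terminate, never backtrack
            break
        s = nxt
        hits += [(start, p) for p in sorted(output[s])]
    return hits


goto, output = build_trie(["AB", "ABG", "BD", "F"])
print(failureless_match("ABD", 0, goto, output))  # [(0, 'AB')]
print(failureless_match("ABD", 1, goto, output))  # [(1, 'BD')]
```

A traversal started at the first byte finds "AB" and then stops at "D"; finding "BD" requires a separate traversal started at the second byte.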
11 Parallel Failureless-AC Algorithm
- Parallel Failureless-AC (PFAC) algorithm: allocate a thread to each byte of the input to traverse the Failureless-AC state machine.
- Example input: X X X X X X X X X A B D X X X X X X X X X X
12 Mechanism of PFAC
[Figure: Thread #n and Thread #n+1 each traversing the Failureless-AC state machine on the input X X X X A B D X X X]
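The per-byte thread allocation can be emulated on a CPU with an ordinary loop over thread ids. The sketch below assumes a flat transition table indexed by state and byte value, a layout chosen here for illustration rather than taken from the slides; `TRAP`, `build_table`, `pfac_thread`, and `pfac` are hypothetical names.

```python
TRAP = -1  # marker for "no valid transition" (assumed name)


def build_table(patterns):
    """Flat table: table[state][byte] -> next state, or TRAP."""
    table = [[TRAP] * 256]
    matched = [None]  # matched[state] -> pattern ending at that state, if any
    for p in patterns:
        s = 0
        for b in p:
            if table[s][b] == TRAP:
                table.append([TRAP] * 256)
                matched.append(None)
                table[s][b] = len(table) - 1
            s = table[s][b]
        matched[s] = p
    return table, matched


def pfac_thread(tid, data, table, matched):
    """Work of one thread: traverse from byte `tid` until no valid transition."""
    s, hits = 0, []
    for pos in range(tid, len(data)):
        s = table[s][data[pos]]
        if s == TRAP:
            break
        if matched[s] is not None:
            hits.append((tid, matched[s]))
    return hits


def pfac(data, patterns):
    table, matched = build_table(patterns)
    hits = []
    for tid in range(len(data)):  # on a GPU these iterations run concurrently
        hits += pfac_thread(tid, data, table, matched)
    return hits


DATA = b"X" * 9 + b"ABD" + b"X" * 10
print(pfac(DATA, [b"AB", b"ABG", b"BD", b"F"]))  # [(9, b'AB'), (10, b'BD')]
```

The thread that starts on "A" reports "AB" and the thread that starts on "B" reports "BD". Because every byte has its own thread, there are no segments and therefore no segment boundaries at which a pattern could be missed.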
13 Reducing Overlapped Computation
- Direct implementation of the AC algorithm:
  - Each thread has a lower-bound constraint on its scanning length.
  - Overlapped computation (overlapped length = 3)
- PFAC algorithm:
  - No boundary problem.
  - Each thread has a variable scanning length.
  - Most threads terminate early.
  - Reduces overlapped computation.
- Example input: C C C C C C C C B C C C C C C
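To make the variable scanning length concrete, the sketch below counts how many bytes each PFAC thread reads before it terminates. The pattern set is borrowed from the earlier Aho-Corasick slide as an assumption, since this slide does not list its patterns, and the function names are hypothetical.

```python
def build_trie(patterns):
    """Nested-dict trie of valid transitions (no failure links)."""
    root = {}
    for p in patterns:
        node = root
        for ch in p:
            node = node.setdefault(ch, {})
    return root


def pfac_scan_lengths(text, patterns):
    """Bytes read by each per-byte thread before it stops."""
    root = build_trie(patterns)
    lengths = []
    for start in range(len(text)):
        node, read = root, 0
        for ch in text[start:]:
            read += 1
            node = node.get(ch)
            if node is None:  # no valid transition: the thread terminates
                break
        lengths.append(read)
    return lengths


TEXT = "C" * 8 + "B" + "C" * 6
print(pfac_scan_lengths(TEXT, ["AB", "ABG", "BD", "F"]))
# [1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1]
```

Every thread that starts on "C" stops after reading a single byte, and only the thread that starts on "B" reads a second byte before stopping. Under these assumptions no thread is forced to scan a fixed segment plus an overlap region.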
14 Experimental Environments
- CPU: Intel Core i7 CPU GHz, 4 cores, 12 GB DDR3 memory
- GPU: NVIDIA GeForce GTX 1.4 GHz, 480 cores, 1536 MB DDR5 memory
- Patterns: string patterns of Snort V2.4; 1,998 rules containing 41,997 characters; 27,754 states in total
- Input: normal-case and worst-case DEFCON packets
15 Experimental Results
Table 1: Throughput of normal case inputs
Columns: Input size (MB) | AC_CPU, 1 thread (Gbps) | AC_OMP, 8 threads (Gbps) | AC_Pthread, 8 threads (Gbps) | PFAC, multi-threads (Gbps) | Speedup to fastest
[Table values not recoverable]
16 Comparisons of Normal Case
[Figure: comparison of AC_CPU (1 thread), AC_OMP (8 threads), AC_Pthread (8 threads), and PFAC (multi-threads) across input sizes up to 192 MB]
17 Experimental Results
Table 2: Throughput of worst case inputs
Columns: Input size (MB) | AC_CPU, 1 thread (Gbps) | AC_OMP, 8 threads (Gbps) | AC_Pthread, 8 threads (Gbps) | PFAC, multi-threads (Gbps) | Speedup to fastest
[Table values not recoverable]
18 Comparisons of Worst Case
[Figure: comparison of AC_CPU (1 thread), AC_OMP (8 threads), AC_Pthread (8 threads), and PFAC (multi-threads) across input sizes up to 192 MB]
19 Comparisons
Columns: Approach | Character number of rule set | Memory (KB) | Throughput (Gbps) | Memory efficiency | Notes
Approaches: PFAC; Huang et al. [10], Modified WM; Schatz et al. [11], Suffix Tree; Vasiliadis et al. [12], DFA; Smith et al. [13], XFA
Notes (GPUs, in row order): NVIDIA GTX 480; NVIDIA 7600 GT; NVIDIA GTX 8800; NVIDIA 9800 GX2; NVIDIA 8800 GTX
[Table values not recoverable]
Memory efficiency = (Throughput x number of characters) / Memory
20 Conclusions
- We have proposed a novel parallel string matching algorithm that is well suited to GPUs and is free from the boundary detection problem.
- The proposed algorithm creates a new state machine with lower complexity and memory usage than the traditional Aho-Corasick state machine.
- The new algorithm achieves a significant speedup over the traditional Aho-Corasick algorithm accelerated by OpenMP on a CPU.
- Compared to other GPU approaches, the new algorithm is 11.6 times faster than the state-of-the-art approach.